BASIC XML SYNTAX
 XML markup describes and provides structure to the content of an XML document
or data packet.
 The tag markup syntax of XML is very similar to HTML (both are based upon SGML), with angle
brackets used to delimit tags.
 All tags begin with a less-than sign (<) and end with a greater-than sign (>).
 Unlike HTML, XML is case-sensitive, including element tags and attribute values. That is:
 <Invoice> ≠ <INVOICE> ≠ <invoice> ≠ <INvoice>
 Characters
 Because XML is intended for worldwide use, characters are not limited to the 7-bit ASCII character set. XML uses most of the characters that are defined in the Unicode character set (kept congruent with ISO/IEC 10646). Two Unicode encoding forms are used as the basis of XML characters: UTF-8 and UTF-16. XML allows the use of almost any character encoding that can be mapped to Unicode (such as EBCDIC, Big5, etc.). Numerous other character encodings can be used with some XML tools, but only UTF-8 and UTF-16 support is required of all XML processors.
 The current Unicode specification can be found at http://www.unicode.org, and ISO/IEC 10646 documentation can be ordered at http://www.iso.ch. The UTF acronym can mean "Unicode Transformation Format" (according to Unicode) or "UCS Transformation Format" (in ISO/IEC or IETF documents). Essentially they mean the same thing, since Unicode and ISO/IEC 10646 are nearly identical.
 UTF-8 is commonly used in North America and Europe, since the first 128 character values map directly to 7-bit US-ASCII (conversely, any 7-bit ASCII string is valid UTF-8). UTF-8 is a multi-byte encoding: RFC 2279 allowed one to six bytes per character, while its successor, RFC 3629, limits this to one to four bytes. This encoding is less popular in Asia, since most Asian characters and ideographs need three bytes in UTF-8, compared with two in UTF-16.
 UTF-8 is described at: http://www.ietf.org/rfc/rfc2279.txt
 The UTF-16 encoding uses 16-bit values for characters, with the full range of 65,536 possible 16-bit values being split into two parts. There are 63,486 values available to represent single 16-bit character values. The other 2,048 values are reserved to provide paired 16-bit code values for an additional 1,048,544 character values. These are called surrogate pairs; they encode the supplementary characters beyond the first 65,536 code points (many emoji, for example).
 UTF-16 is described at: http://www.ietf.org/rfc/rfc2781.txt
 These are relatively new standards, and so much of the world's text isn't yet stored in Unicode. However, Unicode was designed to be a superset of most existing character encodings, and so the conversion of legacy data to Unicode is straightforward. For example, converting ASCII to the UTF-16 form of Unicode merely requires stuffing a zero into the high-order byte of the 16-bit character, and simply preserving the low-order byte as is. Of course, this means that twice the storage space is required, compared to the same text in ASCII. As noted above, 7-bit ASCII doesn't even need conversion to be treated as UTF-8.
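 The byte-level relationships described above can be checked with a short Python sketch. It relies only on the built-in str.encode; the sample strings are arbitrary choices for illustration and do not come from the original slides.

```python
# Sample text chosen for illustration only.
ascii_text = "Invoice"

ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
utf16_bytes = ascii_text.encode("utf-16-be")  # big-endian, no byte-order mark

# 7-bit ASCII is already valid UTF-8: the bytes are identical.
same_as_utf8 = ascii_bytes == utf8_bytes

# ASCII -> UTF-16: put a zero in the high-order byte of each 16-bit unit.
widened = b"".join(bytes([0, b]) for b in ascii_bytes)
matches_utf16 = widened == utf16_bytes

# A CJK ideograph: compare its size in UTF-8 and in UTF-16.
ideograph = "\u4e2d"
ideograph_sizes = (
    len(ideograph.encode("utf-8")),
    len(ideograph.encode("utf-16-be")),
)

# A character beyond U+FFFF is stored in UTF-16 as a surrogate pair,
# that is, two 16-bit code values.
supplementary = "\U0001F600"
surrogate_bytes = len(supplementary.encode("utf-16-be"))

print(same_as_utf8, matches_utf16, ideograph_sizes, surrogate_bytes)
```

 The explicit "utf-16-be" codec is used so that no byte-order mark is prepended, which keeps the comparison with the hand-widened bytes exact.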
SPECIAL MARKUP CHARACTERS
 Five characters have special meaning in XML mark-up:
 < - Less-than sign (left angle bracket)
 > - Greater-than sign (right angle bracket)
 & - Ampersand
 ' - Apostrophe (single quotation mark)
 " - Quotation mark (double quotation mark)
 Use &lt; for <
 Use &gt; for >
 Use &amp; for &
 Use &apos; for '
 Use &quot; for "
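 As a rough illustration, Python's standard xml.sax.saxutils module applies these replacements. Its escape() function always handles &, < and >, and takes the two quote characters through an optional mapping. The sample string is made up for this sketch.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; the two quote characters are opt-in.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'Tom & Jerry\'s "1 < 2" > nothing'   # illustrative text only
escaped = escape(raw, QUOTES)
restored = unescape(escaped, UNQUOTES)

print(escaped)
print(restored == raw)
```

 Escaping the ampersand first matters: otherwise the & introduced by &lt; and the other references would itself be escaped a second time.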
ELEMENTS
 An element is XML's basic container for content - it may contain character data,
other elements, and/or other markup (comments, PIs, entity references, etc.).
Since they represent discrete objects, elements can be thought of as the "nouns"
of XML.
 Elements are delimited with a start-tag and an end-tag. If an element has no
content, it is known as an empty element, and may be represented with either a
start-tag/end-tag pair or using an abbreviation: the empty-element tag. Unlike
the looser syntax of HTML and SGML, the end-tag cannot be omitted, except
when using an empty-element tag.
 All three types of tags are shown in this example:
 <html> <!-- start-tag -->
 <img src="logo.png" /> <!-- empty-element tag -->
 </html> <!-- end-tag -->
 Each of these tags consists of the element type name (this must be a valid XML name) enclosed within a pair of angle brackets (< >). Let's look at XML tags in more detail.
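 The rules for empty elements and end-tags can be tried out with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree and the img example from this section; it assumes nothing beyond the standard library.

```python
import xml.etree.ElementTree as ET

# An empty element written two ways: empty-element tag vs. start-tag/end-tag pair.
short_form = ET.fromstring('<html><img src="logo.png" /></html>')
long_form = ET.fromstring('<html><img src="logo.png"></img></html>')

img_short, img_long = short_form[0], long_form[0]
equivalent = (
    img_short.tag == img_long.tag
    and img_short.attrib == img_long.attrib
    and not img_short.text
    and not img_long.text
)

# Omitting the end-tag, as HTML allows for <img>, is not well-formed XML.
try:
    ET.fromstring('<html><img src="logo.png"></html>')
    omitted_end_tag_accepted = True
except ET.ParseError:
    omitted_end_tag_accepted = False

print(equivalent, omitted_end_tag_accepted)
```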
TAGS
 The opening delimiter of an element is called the start-tag. Start-tags consist of an element type name, and perhaps some attributes (which we'll look at later in this chapter), enclosed within a pair of angle brackets.
 We can think of start-tags as "opening" a container, which is then "closed" with an end-tag. End-tags consist of a forward slash (/) followed by an element type name, enclosed within the usual angle brackets.
 The name in an end-tag must match the element name in the corresponding start-tag. Everything between the start-tag and the end-tag of an element is contained within that element. The following are legal pairs of start- and end-tags:
 <Invoice> ... </Invoice>
 <INVOICE> ... </INVOICE>
 <INVOICE > ... </INVOICE >
 <Wrox:Invoice> ... </Wrox:Invoice>
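 The matching rule can be exercised with a small well-formedness check. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed is hypothetical. The prefixed Wrox:Invoice pair is left out because this parser is namespace-aware and would additionally require the Wrox prefix to be declared.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


legal = [
    "<Invoice>...</Invoice>",
    "<INVOICE>...</INVOICE>",
    "<INVOICE >...</INVOICE >",  # whitespace before the closing bracket is allowed
]
illegal = [
    "<Invoice>...</INVOICE>",    # names differ in case, so the tags do not match
    "<Invoice>...</invoice>",
]

results = {doc: is_well_formed(doc) for doc in legal + illegal}
for doc, ok in results.items():
    print(doc, "->", "well formed" if ok else "rejected")
```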
EMPTY-ELEMENT TAGS
 Empty elements are those that have no content, though there may be associated attributes. Let's say that we wanted to explicitly indicate certain points within our XML data (see the next section). We could just add a start- and end-tag pair without any text between, for example:
 <point></point>
 The same empty element can be abbreviated using an empty-element tag:
 <point/>
THE STRUCTURE OF XML DATA
 All XML data must conform to both syntax requirements and a simple container structure. Such data is known as well formed (see relevant section later in this chapter for more details). All well-formed XML documents consist of one to three parts:
 An optional prolog, which may contain important information about the rest of the data.
 The body, which consists of one or more elements in the form of a hierarchical tree.
 An optional "miscellaneous" epilog that follows the element tree.
 These parts, and the unfamiliar syntax in the following illustration, will be described in greater detail later in this chapter.
 Prolog
 <?xml version="1.0"?>
 <!-- Comments and/or PIs allowed here -->
 <!DOCTYPE textfile SYSTEM "http://www.mySite.com/MyDTDs/Textfile.dtd">
 <!-- Comments and/or PIs allowed here -->
 <textfile>
 <line>A Simple Example</line>
 <line>by Yours Truly</line>
 <line>This is the 3rd line of a simple 5-line text file.</line>
 <line>..the middle line..</line>
 <line>And lastly, a final line of text.</line>
 <EOF/>
 </textfile>
 The body sub-tree always has a single root node called the document element (sometimes referred to as the root element). If not, the data is not well-formed XML!
 Any well-formed XML document must be a simple hierarchical tree with a single root node, called the "document root". This document tree contains a secondary tree of elements, with its own singular root node, called the "document element".
 The document root of each XML document is also the main point of attachment for the document's description using a DTD or Schema (see Chapters 5 and 6 for more about these). A Processing Instruction (PI - more about these later) is often used to attach a stylesheet as well (see Chapter 9).
 Since well-formed XML data has a tree structure, it can be modeled and manipulated as a tree. A standard model for this approach is the W3C Document Object Model (DOM), which will be discussed in Chapter 11.
 Now let's look at the body of the XML document in greater depth.
 The Document Element
 This element is the parent of all other elements in the tree, and thus it may not be contained in any other element. Because the document root and the document element are not the same thing, it is better not to refer to the document element as the "root element" (even though it is the root of the element sub-tree).
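 The difference between the document root and the document element shows up directly in a DOM. The following is a sketch using Python's standard xml.dom.minidom, with a shortened version of the textfile example; the two comments are added here purely to give the document root some children other than the document element.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError

source = (
    '<?xml version="1.0"?>'
    "<!-- comment before the document element -->"
    "<textfile><line>A Simple Example</line><EOF/></textfile>"
    "<!-- comment after the document element -->"
)
doc = minidom.parseString(source)

# The Document node is the document root; it owns everything at top level.
top_level_types = [child.nodeType for child in doc.childNodes]

# The document element is the single element child of the document root.
document_element = doc.documentElement

# A second top-level element means the data is not well formed.
try:
    minidom.parseString("<textfile/><textfile/>")
    two_roots_accepted = True
except ExpatError:
    two_roots_accepted = False

print(doc.nodeType == Node.DOCUMENT_NODE, document_element.tagName, two_roots_accepted)
```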
 String Literals
 String literals are used for the values of attributes, internal entities, and external identifiers. All string literals in XML are enclosed by delimiter pairs, using either an apostrophe (') or a quotation mark ("). The one restriction upon these literals is that the character used for the delimiters may not appear within the literal: if an apostrophe appears in the literal, the quotation mark delimiter must be used, and vice versa. When both characters are needed, the one matching the delimiter is written as an entity reference (&apos; or &quot;), as in the last two examples:
 "string"
 'string'
 "..Jack's cow said &quot;moo&quot;"
 '..Jack&apos;s cow said "moo"'
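 To see that the two delimiter styles yield the same value, the last two literals can be parsed as attribute values. This is a minimal sketch; the element name cow and attribute name says are invented for the example. It also shows xml.sax.saxutils.quoteattr, a standard-library helper that chooses a delimiter for a given string.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

text = '..Jack\'s cow said "moo"'

# Hypothetical element and attribute names, used only to hold the literal.
double_quoted = ET.fromstring('<cow says="..Jack\'s cow said &quot;moo&quot;"/>')
single_quoted = ET.fromstring("<cow says='..Jack&apos;s cow said \"moo\"'/>")

same_value = double_quoted.get("says") == single_quoted.get("says") == text

# quoteattr() wraps a string in a delimiter that does not clash with it.
literal = quoteattr("Jack's cow")

print(same_value, literal)
```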
ATTRIBUTES
 If elements are the "nouns" of XML, then attributes are its "adjectives".
 Often there is some information about an element that we wish to attach to it, as opposed to including it as a string inside the element or one of its children. This can be done using attributes, each of which consists of a name-value pair. Both start-tags and empty-element tags may include attributes within the tag. Attribute values must always be string literals, so either of the two delimiters may be used around the value.
ELEMENTS VS. ATTRIBUTES
 The decision to use an element versus an attribute is not a simple one. Much discussion and argument has occurred about this topic on both the XML-L and XML-DEV lists. Some argue that attributes should never be used: that they add unnecessary processing complexity, and that anything that can be represented as an attribute would be better contained within a child element. Others extol the advantage of being able to validate attribute values and assign default values using a DTD. Experiments with generic data compression (such as gzip, zlib, or LZW) have shown that, despite superficial appearances, neither form has an inherent advantage for data storage or transmission.
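 For a concrete feel of the two styles, here is the same data modelled once with attributes and once with child elements, and read back with Python's xml.etree.ElementTree. The Invoice fields (number, currency) are hypothetical and chosen only for illustration; the sketch does not settle which style is preferable.

```python
import xml.etree.ElementTree as ET

# The same invoice data modelled two ways (field names are hypothetical).
as_attributes = ET.fromstring('<Invoice number="1234" currency="USD"/>')
as_elements = ET.fromstring(
    "<Invoice><number>1234</number><currency>USD</currency></Invoice>"
)

# Attributes are read by name from the element itself...
from_attributes = {name: as_attributes.get(name) for name in ("number", "currency")}

# ...while child elements are visited as nodes of the tree.
from_elements = {child.tag: child.text for child in as_elements}

print(from_attributes == from_elements)
```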
CHARACTER DATA
 Character data is plain text that contains no element tags or other markup, except perhaps character and entity references.
 Remember too that, because XML is intended for worldwide use, text means Unicode, not just ASCII (see the "Characters" section earlier in this chapter).
 The ampersand (&) and less-than (<) characters are used as XML's opening delimiters, and thus may never appear in their literal form (except in CDATA sections, which are discussed later). If these characters are needed within character data, they must be escaped using the entity references &amp; or &lt;. It is not necessary to escape the other markup characters (like >), but they may be escaped (using &gt; in this case), if only for the sake of consistency within the character data. The one exception is the sequence ]]>, which may not appear literally in character data, so its > must be escaped.
 These escape sequences are part of the set of five such strings defined by the XML specification, and implemented in all compliant XML parsers.
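 A parser makes the difference between the mandatory and the optional escapes visible. The sketch below uses Python's xml.etree.ElementTree; the helper parse_text and the note element are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET


def parse_text(document):
    """Return the element's character data, or None if the parser rejects it."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


bare_ampersand = parse_text("<note>AT&T</note>")        # literal & is rejected
bare_less_than = parse_text("<note>1 < 2</note>")       # literal < is rejected
bare_greater_than = parse_text("<note>2 > 1</note>")    # literal > is accepted
escaped = parse_text("<note>AT&amp;T: 1 &lt; 2 &gt; 0</note>")

print(bare_ampersand, bare_less_than, bare_greater_than, escaped)
```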
WHITESPACE
 Whitespace is an important linguistic concept for both human and computer languages. Only four characters are treated as whitespace in XML data:
 Horizontal tab (U+0009)
 Line feed (U+000A)
 Carriage return (U+000D)
 Space (U+0020)
 XML's rule for handling whitespace is very simple: all whitespace characters within the content are preserved by the parser and passed unmodified to the application (except that each CR or CR-LF line ending is normalized to a single LF), while whitespace within element tags and attribute values may be removed. This is unlike the rampant removal of whitespace carried out in HTML browsers.
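 These whitespace rules can be observed with Python's xml.etree.ElementTree, which is built on the Expat parser. This is a sketch under that assumption; the attribute names indent and note are invented for the example.

```python
import xml.etree.ElementTree as ET

# Whitespace in content is preserved; a CR-LF pair reaches the application as LF.
content = ET.fromstring("<line>  two  spaces\r\nnext\tline </line>")
preserved = content.text

# Whitespace inside tags (around names, "=" and brackets) is not significant.
tag_spacing = ET.fromstring('<line   indent = "2"   ></line   >')
indent = tag_spacing.get("indent")

# A literal line break inside an attribute value is normalized to a space.
attribute = ET.fromstring('<line note="a\nb"/>')
note = attribute.get("note")

print(repr(preserved), repr(indent), repr(note))
```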
SPECIAL-PURPOSE MARKUP
 We've already discussed just about every aspect of XML syntax that is necessary to create well-formed XML data (elements, attributes, and character/entity references). There are three additional syntactic constructs that deviate from the familiar syntax of tags (<tagname>) or entity references (&ref;). These are:
 Comments
 Processing Instructions (PIs)
 CDATA sections
COMMENTS
 It is often useful to insert notes, or comments, into a document. These comments might provide a revision log, historical notes, or any other sort of meta-data that would be meaningful to the creator and editors of a document (serving to enhance its human readability), but aren't truly part of the document's content. Comments may appear anywhere in a document outside of other markup (that is, you can't put a comment in the middle of a start- or end-tag).
 The basic syntax of an XML comment is:
 <!--...comment text...-->
PROCESSING INSTRUCTIONS (PIS)
 XML, like SGML, is a descriptive markup language, and so it does not presume to try to explain how to actually process an element or its contents. This is a powerful advantage in that it provides presentation flexibility, and OS- and application-independence. However, there are times when it is desirable to pass processing hints (or perhaps some script code) to the application along with the document. The Processing Instruction (PI) is the mechanism that XML provides for this purpose.
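 A PI is written as <?target data?>, where the target names the application the hint is meant for. The sketch below reads PIs back with Python's xml.dom.minidom. The xml-stylesheet target is the common convention for attaching a stylesheet; the file name style.xsl, the spellcheck target, and the helper collect_pis are assumptions made for illustration.

```python
from xml.dom import minidom, Node

source = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'   # hypothetical stylesheet
    "<textfile><?spellcheck lang=en?><line>A Simple Example</line></textfile>"
)
doc = minidom.parseString(source)


def collect_pis(node):
    """Walk the tree and return (target, data) for every processing instruction."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            found.extend(collect_pis(child))
    return found


pis = collect_pis(doc)
for target, data in pis:
    print(target, "|", data)
```

 The parser hands the data part over as an opaque string; interpreting it is left entirely to the application named by the target.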
CDATA SECTIONS
 CDATA sections are a method of including text that contains characters that would otherwise be interpreted as markup. This feature is primarily useful to authors who wish to include examples of XML markup in their documents (like the examples in this book). This is probably the only good reason to include CDATA sections in a document, since almost all advantages of XML are lost when using these sections.
 The basic syntax of a CDATA section is:
 <![CDATA[...]]>
 For example, this CDATA section:
 <![CDATA[&Warn; - &Disclaimer; &lt;&copy; 2001 &USCG; &amp; &USN; &gt; ]]>
 carries the same text as the escaped character data in this element:
 <example>&amp;Warn; - &amp;Disclaimer; &amp;lt;&amp;copy; 2001 &amp;USCG; &amp;amp; &amp;USN; &amp;gt; </example>
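 The equivalence of the CDATA form and the escaped form can be confirmed by parsing both. This sketch uses Python's xml.etree.ElementTree and wraps the CDATA section in the same example element so the two texts can be compared directly.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<example><![CDATA[&Warn; - &Disclaimer; &lt;&copy; 2001 "
    "&USCG; &amp; &USN; &gt; ]]></example>"
)
escaped = ET.fromstring(
    "<example>&amp;Warn; - &amp;Disclaimer; &amp;lt;&amp;copy; 2001 "
    "&amp;USCG; &amp;amp; &amp;USN; &amp;gt; </example>"
)

# Inside CDATA nothing is interpreted, so "&lt;" stays as four literal characters.
same_text = with_cdata.text == escaped.text

print(same_text)
print(with_cdata.text)
```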
DOCUMENT STRUCTURE
 Prolog
 The prolog is the appetizer - used to signal the beginning of XML data. It describes the data's
character encoding, and provides some other configuration hints to the XML parser and
application.
 XML Declaration
 All XML documents should begin with an XML Declaration. This declaration is not required in most XML documents, but it serves to explicitly identify the data as XML, and does permit some optimizations when processing the document. If the XML data uses an encoding other than UTF-8 or UTF-16, then an XML Declaration with the correct encoding must be used.
 If this declaration is included, then the string literal "<?xml " must be the very first six characters of the document - no preceding whitespace or embedded comments are allowed.
 While this declaration looks exactly like a processing instruction, strictly speaking it is not a PI (it is a unique declaration defined by the XML 1.0 REC). Nevertheless, the XML Declaration uses PI-like delimiters and an attribute-like parameter syntax that is similar to the one used in element tags (either " or ' may be used to delimit the value strings). For example:
 <?xml version="1.0" encoding='utf-8' standalone="yes"?>
 <?xml version='1.0' encoding='utf-8'?>
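 The effect of the encoding parameter, and of the "very first characters" rule, can be sketched with Python's xml.etree.ElementTree. The name element and its content are made up for the example; the documents are passed as bytes so that the parser itself has to work out the encoding.

```python
import xml.etree.ElementTree as ET

# ISO-8859-1 bytes, correctly announced in the XML Declaration.
declared = '<?xml version="1.0" encoding="iso-8859-1"?><name>Zo\u00eb</name>'.encode(
    "iso-8859-1"
)
decoded = ET.fromstring(declared).text

# The same bytes without a declaration are read as UTF-8 and rejected.
try:
    ET.fromstring("<name>Zo\u00eb</name>".encode("iso-8859-1"))
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

# Whitespace in front of the declaration is not allowed.
try:
    ET.fromstring(b' <?xml version="1.0"?><name/>')
    leading_space_accepted = True
except ET.ParseError:
    leading_space_accepted = False

print(decoded == "Zo\u00eb", undeclared_accepted, leading_space_accepted)
```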
DOCUMENT TYPE DECLARATION
 This should not be confused with the DTD (remember: Document Type Definition)! Rather, the Document Type Declaration can refer to an external DTD and/or contain part of the DTD.
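 As a sketch of both possibilities at once, the declaration below refers to the external DTD from the earlier textfile illustration and also carries a small internal subset. The internal <!ELEMENT EOF EMPTY> declaration is an assumption added for the example. Python's xml.dom.minidom is non-validating, so it records the declaration without validating against the external DTD.

```python
from xml.dom import minidom

source = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE textfile SYSTEM "http://www.mySite.com/MyDTDs/Textfile.dtd" ['
    "  <!ELEMENT EOF EMPTY>"   # internal subset: part of the DTD, kept in the document
    "]>"
    "<textfile><EOF/></textfile>"
)
doc = minidom.parseString(source)
doctype = doc.doctype

# The declaration names the document element and points at the external DTD.
print(doctype.name)
print(doctype.systemId)
```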
 Body
 This is, of course, the main course of the XML data, which we've discussed at length in terms of its components: elements, attributes, character data, etc. It is worth reiterating that the body may contain comments, PIs, and/or whitespace characters interleaved with elements and character data. The elements must comprise a hierarchical tree, with a single root node.
EPILOG
 The XML epilog is the dessert with potentially unpleasant consequences! It may include comments, PIs, and/or whitespace. Comments and whitespace don't cause any significant problems. However, it is unclear whether PIs in the epilog should be applied to the elements in the preceding XML data, or a subsequent XML document (if any). This may well be a solution in search of a problem, or it may just be a problem in and of itself. XML does not define any end-of-document indicator, and many applications will use the document element end-tag for this purpose. In this case, the epilog is never read, let alone processed.
 Tim Bray (one of the XML 1.0 REC editors) considers the epilog a "real design error". It is probably inadvisable to use it without a very compelling reason, and the prior knowledge that it will likely not be interoperable with other XML applications.
VALID XML
 Any XML data object is considered valid XML if it is well formed, meets certain further validity constraints, and matches a grammar describing the document's content. Like SGML, XML can provide such a description of document structure in the form of an XML Schema or a DTD.
 The SGML equivalent of a well-formed document is known as tag-valid. The SGML equivalent of a valid document is type-valid.
XML PARSERS
 In addition to specifying the syntax of XML, the W3C described some of the behavior of the lower tier of XML's client architecture (the XML processor or parser).
 Parser Levels
 Two levels of parser ("processor") behavior are defined in the XML 1.0 REC:
 Non-validating - ensures that the data is well-formed XML, but need not resolve
any external resources
 Validating - ensures both well-formedness and validity using a DTD, and must
resolve external resources
 Parser Implementations
 There are two different implementation approaches to processing the XML data:
 Event-driven parser - Processes XML data sequentially, handling components one at a time
 Tree-based parser - Constructs a tree representation of the entire document and provides access to individual nodes in the tree (can be constructed on top of an event-driven parser)
 Much quasi-religious argument has occurred about this dichotomy, but each approach has its merits. Like so many other real-world problems, XML processing may have vastly different requirements, and thus different approaches may be best for different situations.
EVENT-DRIVEN PARSERS
 The event-driven model should be quite familiar to programmers of modern GUI interfaces and operating systems. In this case, the XML parser executes a call-back to the application for each component of the XML data: element (with attributes), character data, processing instructions, notation, or comments. It's up to the application to handle the XML data as it is provided via the call-backs - the XML parser does not maintain the element tree structure, or any of the data after it has been parsed. The event-driven method requires very modest system resources, even for extremely large documents; and because of its simple, low-level access to the structure of the XML data, provides great flexibility in handling the data within the XML application.
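 One widely available event-driven interface is SAX. The sketch below uses Python's standard xml.sax module; the EventLog handler class and the n attribute are hypothetical, introduced only to show the call-backs arriving in document order.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Record each call-back; the parser itself keeps no tree."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.text = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several chunks, so collect the pieces.
        self.text.append(content)


handler = EventLog()
xml.sax.parseString(
    b'<textfile><line n="1">A Simple Example</line><EOF/></textfile>', handler
)

for event in handler.events:
    print(event)
print("".join(handler.text))
```

 Anything the application wants to remember, such as the current nesting of elements, has to be kept by the handler itself, which is the price of the low memory footprint.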
TREE-BASED PARSERS
 One of the most widely used structures in software engineering is the simple hierarchical tree. All well-formed XML data is defined to be such a tree, and thus common and mature algorithms may be used to traverse the nodes of an XML document, search for content, and/or edit the document tree. These tree algorithms have the advantage of years of academic and commercial development.
 XML parsers that use this approach generally conform to the W3C's Document Object Model (DOM). The DOM is a platform- and language-neutral interface that allows manipulation of tree-structured documents. On the other hand, the DOM tree must be built in memory before the document can be manipulated - high-performance virtual memory support is imperative for larger documents! Once the tree is built, an application may access the DOM via a related API.
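 As a small illustration of the tree-based style, the sketch below builds a DOM with Python's standard xml.dom.minidom, searches it, edits it, and serializes the result. The document content is made up for the example.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree before any access.
doc = minidom.parseString(
    "<textfile><line>first</line><line>second</line></textfile>"
)

# Search: collect the text of every <line> element.
lines = doc.getElementsByTagName("line")
texts = [line.firstChild.data for line in lines]

# Edit: append a new <line> to the document element.
new_line = doc.createElement("line")
new_line.appendChild(doc.createTextNode("third"))
doc.documentElement.appendChild(new_line)

# Serialize the modified tree back to markup.
serialized = doc.documentElement.toxml()

print(texts)
print(serialized)
```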

 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 

Basic xml syntax

  • 2.  XML markup describes and provides structure to the content of an XML document or data packet.
  • 3.  The tag markup syntax of XML is very similar to HTML (both are based upon SGML), with angle brackets used to delimit tags.  All tags begin with a less-than sign (<) and end with a greater-than sign (>).  Unlike HTML, XML is case-sensitive, including element tags and attribute values, that is: <Invoice> ≠ <INVOICE> ≠ <invoice> ≠ <INvoice>  Characters  Because XML is intended for worldwide use, characters are not limited to the 7-bit ASCII character set. XML uses most of the characters that are defined in the Unicode character set (kept congruent with ISO/IEC 10646). There are two Unicode formats that are used as the basis of XML characters: UTF-8 and UTF-16. XML allows the use of almost any character encoding that can be mapped to Unicode (such as EBCDIC, Big5, etc.). There are numerous other character encodings that can be used with some XML tools, but UTF-8 and UTF-16 support is required of all XML processors.
  • 4.  The current Unicode specification can be found at: http://www.unicode.org, and ISO/IEC 10646 documentation can be ordered at http://www.iso.ch.  The UTF acronym can mean "Unicode Transformation Format" (according to Unicode), or "UCS Transformation Format" (in IEC or IETF documents) - essentially they mean the same thing, since Unicode and ISO/IEC 10646 are nearly identical.  UTF-8 is commonly used in North America and Europe, since the first 128 character values map directly to 7-bit US-ASCII (conversely, any 7-bit ASCII string is valid UTF-8). UTF-8 is a multi-byte encoding, with character values represented in one to four bytes (RFC 2279 originally allowed up to six; RFC 3629 later restricted UTF-8 to four). This encoding has been less popular in Asia, since most Asian characters and ideographs require three bytes in UTF-8, compared with two in UTF-16.
  • 5.  UTF-8 is described at: http://www.ietf.org/rfc/rfc2279.txt (since superseded by RFC 3629).  The UTF-16 encoding uses 16-bit values for characters, with the full range of 65,536 possible 16-bit values being split into two parts. There are 63,486 values available to represent single 16-bit character values (65,536 less the 2,048 surrogate values and the two noncharacters U+FFFE and U+FFFF). The 2,048 surrogate values are reserved to provide paired 16-bit code values for an additional 1,048,544 character values. These are called surrogate pairs; none were in use when these standards were written, but they now encode supplementary characters such as emoji and less common ideographs.  UTF-16 is described at: http://www.ietf.org/rfc/rfc2781.txt  When these standards were new, much of the world's text was not yet stored in Unicode. However, Unicode was designed to be a superset of most existing character encodings, and so the conversion of legacy data to Unicode is straightforward. For example, converting ASCII to the UTF-16 form of Unicode merely requires stuffing a zero into the high-order byte of the 16-bit character, and simply preserving the low-order byte as is. Of course, this means that twice the storage space is required, compared to the same text in ASCII. As noted above, 7-bit ASCII doesn't even need conversion to be treated as the UTF-8 encoding.
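The size trade-offs between UTF-8 and UTF-16 described above can be checked with a few lines of Python. This is only an illustrative sketch: the sample characters and the helper name `encoded_lengths` are assumptions chosen for the demonstration, and big-endian UTF-16 without a byte-order mark is used so the zero high-order byte is easy to see.

```python
# Sample characters: ASCII, Latin-1 range, a CJK ideograph, and a
# supplementary-plane character that needs a UTF-16 surrogate pair.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

def encoded_lengths(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# Converting ASCII to UTF-16 puts a zero high-order byte before each ASCII byte.
ascii_text = "XML"
utf16_bytes = ascii_text.encode("utf-16-be")

# 7-bit ASCII needs no conversion at all to be valid UTF-8.
utf8_bytes = ascii_text.encode("utf-8")
print(utf16_bytes, utf8_bytes)
```

The CJK sample takes three bytes in UTF-8 but two in UTF-16, while the supplementary character takes four bytes in both.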
  • 6. SPECIAL MARKUP CHARACTERS  Five characters have special meaning in XML markup:  < - Less-than sign (left angle bracket)  > - Greater-than sign (right angle bracket)  & - Ampersand  ' - Apostrophe (single quotation mark)  " - Quotation mark (double quotation mark)  Use &lt; for <  Use &gt; for >  Use &amp; for &  Use &apos; for '  Use &quot; for "
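As a rough illustration of the five replacements listed above, Python's standard `xml.sax.saxutils` module can apply and reverse them. The sample string and the variable names are made up for this sketch; `escape()` handles `&`, `<`, and `>` by default, and the two quote characters are supplied through its optional dictionary argument.

```python
from xml.sax.saxutils import escape, unescape

raw = "AT&T says 5 < 6 and 'x' > \"y\""

# escape() replaces &, <, > itself; the extra mapping covers the quotes.
quote_entities = {"'": "&apos;", '"': "&quot;"}
escaped = escape(raw, quote_entities)
print(escaped)

# unescape() reverses the process when given the inverse mapping.
restored = unescape(escaped, {"&apos;": "'", "&quot;": '"'})
print(restored == raw)
```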
  • 7. ELEMENTS  An element is XML's basic container for content - it may contain character data, other elements, and/or other markup (comments, PIs, entity references, etc.). Since they represent discrete objects, elements can be thought of as the "nouns" of XML.  Elements are delimited with a start-tag and an end-tag. If an element has no content, it is known as an empty element, and may be represented with either a start-tag/end-tag pair or using an abbreviation: the empty-element tag. Unlike the looser syntax of HTML and SGML, the end-tag cannot be omitted, except when using an empty-element tag.
  • 8.  All three types of tags are shown in this example:  <html> <!-- start-tag -->  <img src="logo.png" /> <!-- empty-element tag -->  </html> <!-- end-tag -->  Each of these tags consists of the element type name (this must be a valid XML name) enclosed within a pair of angle brackets  (< >). Let's look at XML tags in more detail.
  • 10. TAGS  The opening delimiter of an element is called the start-tag. Start-tags are composed of an element type name, and perhaps some attributes (which we'll look at later in this chapter), enclosed within a pair of angle brackets.  We can think of start-tags as "opening" a container - which is then "closed" with an end-tag. End-tags are composed of a forward slash (/) followed by an element type name, enclosed within the usual angle brackets.  The name in an end-tag must match the element name in a corresponding start-tag. Everything between the start-tag and the end-tag of an element is contained within that element. The following are legal pairs of start- and end-tags:  <Invoice> ... </Invoice>  <INVOICE> ... </INVOICE>  <INVOICE > ... </INVOICE >  <Wrox:Invoice> ... </Wrox:Invoice>
  • 11. EMPTY-ELEMENT TAGS  Empty elements are those that have no content, though there may be associated attributes. Let's say that we wanted to explicitly indicate certain points within our XML data (see the next section). We could just add a start- and end-tag pair without any text between them, for example:  <point></point>
  • 12. THE STRUCTURE OF XML DATA  All XML data must conform to both syntax requirements and a simple container structure. Such data is known as well formed (see relevant section later in this chapter for more details). All well-formed XML documents consist of one to three parts:  An optional prolog, which may contain important information about the rest of the data.  The body, which consists of one or more elements in the form of a hierarchical tree.  An optional "miscellaneous" epilog that follows the element tree.  These parts, and the unfamiliar syntax in the following illustration, will be described in greater detail later in this chapter.  Prolog  <?xml version="1.0"?>
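The matching rule for start- and end-tags can be observed with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample strings are illustrative names, not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

matching = "<Invoice>...</Invoice>"
trailing_space = "<INVOICE >...</INVOICE >"  # whitespace before '>' is allowed
case_mismatch = "<Invoice>...</INVOICE>"     # names differ only in case
missing_end = "<Invoice>..."                  # end-tag omitted

for sample in (matching, trailing_space, case_mismatch, missing_end):
    print(f"{sample!r}: {is_well_formed(sample)}")
```

The last two samples are rejected: XML names are case-sensitive, and an end-tag cannot be left out.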
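A parser treats a start-tag/end-tag pair with nothing between and the abbreviated empty-element tag as the same thing. A small sketch with Python's `xml.etree.ElementTree` follows; the `is_empty` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<point></point>")  # start-tag/end-tag pair
short_form = ET.fromstring("<point/>")        # empty-element tag

def is_empty(element):
    """An element is empty if it has no text and no child elements."""
    return element.text is None and len(element) == 0

print(is_empty(long_form), is_empty(short_form))

# Both forms serialize identically once parsed.
print(ET.tostring(long_form) == ET.tostring(short_form))
```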
  • 13.  <!-- Comments and/or PIs allowed here -->  <!DOCTYPE textfile SYSTEM "http://www.mySite.com/MyDTDs/Textfile.dtd">  <!-- Comments and/or PIs allowed here -->
  • 14.  <textfile>  <line>A Simple Example</line>  <line> by Yours Truly</line>  <line>This is the 3rd line of a simple 5-line text file.</line>  <line>..the middle line..</line>  <line>And lastly, a final line of text.</line>  <EOF/>  </textfile>
  • 17.  The body sub-tree always has a single root node called the document element (sometimes referred to as the root element) - if not, the data is not well-formed XML!  Any well-formed XML document must be a simple hierarchical tree with a single root node, called the "document root". This document tree contains a secondary tree of elements, with its own singular root node, called the "document element".  The document root of each XML document is also the main point of attachment for the document's description using a DTD or Schema (see Chapters 5 and 6 for more about these). A Processing Instruction (PI - more about these later) is often used to attach a stylesheet as well (see Chapter 9).  Since well-formed XML data has a tree structure, it can be modeled and manipulated as a tree. A standard model for this approach is the W3C Document Object Model (DOM), which will be discussed in Chapter 11.  Now let's look at the body of the XML document in greater depth.
  • 18.  The Document Element  This element is the parent of all other elements in the tree, and thus it may not be contained in any other element. Because the document root and the document element are not the same thing, it is better not to refer to the document element as the "root element" (even though it is the root of the element sub-tree).
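The difference between the document root and the document element is visible in a DOM tree. The following sketch uses Python's `xml.dom.minidom`; the sample document and the helper `has_single_root` are assumptions made for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString(
    "<?xml version='1.0'?>"
    "<!-- prolog comment -->"
    "<textfile><line>A Simple Example</line></textfile>"
)

# The Document node is the document root; it holds the comment AND the element tree.
top_level = [node.nodeName for node in doc.childNodes]
print(top_level)

# The document element is the single root of the element sub-tree.
document_element = doc.documentElement
print(document_element.tagName, document_element.parentNode is doc)

def has_single_root(xml_text):
    """Return False when the parser rejects the text, e.g. for a second root element."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False

print(has_single_root("<a/><b/>"))
```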
  • 19.
  • 20.  String Literals  String literals are used for the values of attributes, internal entities, and external identifiers. All string literals in XML are  enclosed by delimiter pairs, using either an apostrophe (') or a quotation mark ("). The one restriction upon these literals is that  the character used for the delimiters may not appear within the literal - if an apostrophe appears in the literal, the quotation mark  delimiter must be used, and vice versa.
  • 21.  "string"  'string'  "..Jack's cow said &quot;moo&quot;"  '..Jack&apos;s cow said "moo"'
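The delimiter rule for string literals is what Python's `xml.sax.saxutils.quoteattr` implements: it picks whichever delimiter does not occur in the value, and falls back to `&quot;` when both occur. The sample values below are loosely based on the examples above.

```python
from xml.sax.saxutils import quoteattr

only_apostrophe = quoteattr("Jack's cow")            # wrapped in quotation marks
only_quotes = quoteattr('said "moo"')                # wrapped in apostrophes
both = quoteattr("Jack's cow said \"moo\"")          # inner quotation marks escaped

for literal in (only_apostrophe, only_quotes, both):
    print(literal)
```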
  • 22. ATTRIBUTES  If elements are the "nouns" of XML, then attributes are its "adjectives".  Often there is some information about an element that we wish to attach to it, as opposed to including it as a string inside the element, or one of its children. This can be done using attributes, each of which is composed of a name-value pair. Both start-tags and empty-element tags may include attributes within the tag. Attribute values must always be string literals, so the attribute value can use either of the two delimiters:
  • 23. ELEMENTS VS. ATTRIBUTES  The decision to use an element versus an attribute is not a simple one. Much discussion and argument has occurred about this topic on both the XML-L and XML-DEV lists. Some argue that attributes should never be used - that they add unnecessary processing complexity, and that anything that can be represented as an attribute would be better contained within a child element. Others extol the advantage of being able to validate attribute values and assign default values using a DTD. Experiments with generic data compression (such as gzip, zlib, or LZW) have shown that, despite superficial appearances, neither form has an inherent advantage for data storage or transmission.
  • 24. CHARACTER DATA  Character data is plain text that contains no element tags or other markup, except perhaps character and entity references.  Remember too that because XML is intended for worldwide use, text means Unicode, not just ASCII (see the "Characters" section earlier in this chapter).  The ampersand (&) and less-than (<) characters are used as XML's opening delimiters, and thus may never appear in their literal form (except in CDATA sections, which are discussed later). If these characters are needed within character data, they must be escaped using the entity references &amp; or &lt;. It is not necessary to escape the other markup characters (like >), but they may be escaped (using &gt; in this case), if only for the sake of consistency within the character data.  These escape sequences are part of the set of five such strings defined by the XML specification, and implemented in all compliant XML parsers.
  • 25. WHITESPACE  Whitespace is an important linguistic concept for both human and computer languages. Only four characters are treated as whitespace in XML data: tab (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020).  XML's rule for handling whitespace is very simple: all whitespace characters (except for the CR character) within the content are preserved by the parser and passed unmodified to the application, while whitespace within element tags and attribute values may be removed. This is unlike the rampant removal of whitespace carried out in HTML browsers.
  • 26. SPECIAL-PURPOSE MARKUP  We've already discussed just about every aspect of XML syntax that is necessary to create well-formed XML data (elements, attributes, and character/entity references). There are three additional syntactic constructs that deviate from the familiar syntax of tags (<tagname>) or entity references (&ref;). These are:  Comments  Processing Instructions (PIs)  CDATA sections
  • 27. COMMENTS  It is often useful to insert notes, or comments, into a document. These comments might provide a revision log, historical notes, or any other sort of meta-data that would be meaningful to the creator and editors of a document (serving to enhance its human readability), but aren't truly part of the document's content. Comments may appear anywhere in a document outside of other markup (that is, you can't put a comment in the middle of a start- or end-tag).  The basic syntax of an XML comment is:  <!--...comment text...-->
  • 28. PROCESSING INSTRUCTIONS (PIS)  XML, like SGML, is a descriptive markup language, and so it does not presume to try to explain how to actually process an element or its contents. This is a powerful advantage in that it provides presentation flexibility, and OS- and application-independence.  However, there are times when it is desirable to pass processing hints (or perhaps some script code) to the application along with the document. The Processing Instruction (PI) is the mechanism that XML provides for this purpose.
  • 29. CDATA SECTIONS  CDATA sections are a method of including text that contains characters that would otherwise be interpreted as markup. This feature is primarily useful to authors who wish to include examples of XML markup in their documents (like the examples in this book). This is probably the only good reason to include CDATA sections in a document, since almost all advantages of XML are lost when using these sections.
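To illustrate that either delimiter may be used for an attribute value, here is a minimal sketch with Python's `xml.etree.ElementTree`. The `alt` attribute and its value are invented for the example; the `src` value echoes the earlier `<img>` example.

```python
import xml.etree.ElementTree as ET

# One attribute uses quotation marks, the other apostrophes.
img = ET.fromstring("""<img src="logo.png" alt='Company "logo"' />""")

# The parser strips the delimiters and returns plain name-value pairs.
attributes = dict(img.attrib)
print(attributes)
```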
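The whitespace rule can be seen in a small experiment, sketched here with Python's `xml.etree.ElementTree` (which is built on the expat parser). The element name, attribute name, and sample text are arbitrary choices for the illustration.

```python
import xml.etree.ElementTree as ET

# Extra spaces inside the tags, a tab inside the attribute value,
# and a CR+LF line ending inside the content.
source = "<line  id = 'a\tb' >  two  spaces\r\nnext line </line >"
element = ET.fromstring(source)

# Content whitespace is preserved, except that CR+LF is normalized to LF.
content = element.text
print(repr(content))

# Whitespace inside the tag is discarded; the tab in the value becomes a space.
attribute_value = element.get("id")
print(repr(attribute_value))
```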
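As one possible illustration, a parser hands a PI to the application as a target plus an uninterpreted data string. The sketch below uses Python's `xml.dom.minidom`; the `xml-stylesheet` PI and the file name `style.xsl` are assumptions picked because stylesheet attachment is a common use.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<doc/>"
)

# Collect the processing instructions attached to the document root.
pis = [
    node for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

for pi in pis:
    # The parser does not interpret the data; that is left to the application.
    print(pi.target, "->", pi.data)
```

Note that the XML declaration on the first line is not reported as a PI.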
  • 30.  The basic syntax of a CDATA section is:  <![CDATA[...]]>  <![CDATA[&Warn; - &Disclaimer; &lt;&copy; 2001 &USCG; &amp; &USN; &gt; ]]>  <example>&amp;Warn; - &amp;Disclaimer; &amp;lt;&amp;copy; 2001 &amp;USCG; &amp;amp; &amp;USN;  &amp;gt;  </example>
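The equivalence between a CDATA section and the same text written with escapes can be checked with a parser. This is a simplified sketch using Python's `xml.etree.ElementTree`; the sample text is shortened from the example above and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, & and < are ordinary characters.
cdata_form = ET.fromstring("<example><![CDATA[&Warn; - <b> & ]]></example>")

# Outside CDATA, the same characters must be escaped.
escaped_form = ET.fromstring("<example>&amp;Warn; - &lt;b&gt; &amp; </example>")

# The application receives identical character data either way.
print(repr(cdata_form.text))
print(cdata_form.text == escaped_form.text)
```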
  • 31. DOCUMENT STRUCTURE  Prolog  The prolog is the appetizer - used to signal the beginning of XML data. It describes the data's character encoding, and provides some other configuration hints to the XML parser and application.  XML Declaration  All XML documents should begin with an XML Declaration. This declaration is not required in most XML documents, but it serves to explicitly identify the data as XML, and does permit some optimizations when processing the document. If the XML data uses an encoding other than UTF-8 or UTF-16, then an XML Declaration with the correct encoding must be used.  If this declaration is included, then the string literal "<?xml " must be the very first six characters of the document - no preceding whitespace or embedded comments are allowed.
  • 32.  While this declaration looks exactly like a processing instruction, strictly speaking it is not a PI (it is a unique declaration defined by the XML 1.0 REC). Nevertheless, the XML Declaration uses PI-like delimiters and an attribute-like parameter syntax that is similar to the one used in element tags (either " or ' may be used to delimit the value strings). For example:  <?xml version="1.0" encoding='utf-8' standalone="yes"?>  <?xml version='1.0' encoding='utf-8'?>
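The rules for the XML Declaration can be exercised with a short experiment, sketched here with Python's `xml.etree.ElementTree`. The helper `parses` and the sample byte strings are hypothetical names for the demonstration; ISO-8859-1 stands in for "an encoding other than UTF-8 or UTF-16".

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

good = b"<?xml version='1.0' encoding='utf-8'?><doc/>"
leading_space = b" <?xml version='1.0' encoding='utf-8'?><doc/>"

# Latin-1 bytes need a declaration that names the encoding.
declared = "<?xml version='1.0' encoding='iso-8859-1'?><doc>caf\u00e9</doc>".encode("iso-8859-1")
undeclared = "<doc>caf\u00e9</doc>".encode("iso-8859-1")

for label, data in [("good", good), ("leading space", leading_space),
                    ("declared latin-1", declared), ("undeclared latin-1", undeclared)]:
    print(label, parses(data))
```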
  • 33. DOCUMENT TYPE DECLARATION  This should not be confused with the DTD (Remember: Document Type Definition)! Rather, the Document Type Declaration can refer to an external DTD and/or contain part of the DTD.  Body  This is, of course, the main course of the XML data, which we've discussed at length in terms of its components: elements, attributes, character data, etc. It is worth reiterating that the body may contain comments, PIs, and/or whitespace characters interleaved with elements and character data. The elements must comprise a hierarchical tree, with a single root node.
  • 34. EPILOG  The XML epilog is the dessert with potentially unpleasant consequences! It may include comments, PIs, and/or whitespace. Comments and whitespace don't cause any significant problems. However, it is unclear whether PIs in the epilog should be applied to the elements in the preceding XML data, or a subsequent XML document (if any). This may well be a solution in search of a problem, or it may just be a problem in and of itself. XML does not define any end-of-document indicator, and many applications will use the document element end-tag for this purpose. In this case, the epilog is never read, let alone processed.  This is a "real design error" as considered by Tim Bray (one of the XML 1.0 REC editors). It is probably inadvisable to use it without a very compelling reason - and the prior knowledge that it will likely not be interoperable with other XML applications.
  • 35. VALID XML  Any XML data object is considered valid XML if it is well formed, meets certain further validity constraints, and matches a grammar describing the document's content. Like SGML, XML can provide such a description of document structure in the form of an XML Schema or a DTD.  The SGML equivalent of a well-formed document is known as tag-valid. The SGML equivalent of a valid document is type-valid.
  • 36. XML PARSERS  In addition to specifying the syntax of XML, the W3C described some of the behavior of the lower tier of XML's client architecture (the XML processor or parser).  Parser Levels  Two levels of parser ("processor") behavior are defined in the XML 1.0 REC:  Non-validating - ensures that the data is well-formed XML, but need not resolve any external resources  Validating - ensures both well-formedness and validity using a DTD, and must resolve external resources
  • 37.  Parser Implementations  There are two different implementation approaches to processing the XML data:  Event-driven parser - Processes XML data sequentially, handling components one at a time  Tree-based parser - Constructs a tree representation of the entire document and provides access to individual nodes in the tree (can be constructed on top of an event-driven parser)  Much quasi-religious argument has occurred about this dichotomy, but each approach has its merits. Like so many other real-world problems, XML processing may have vastly different requirements, and thus different approaches may be best for different situations.
  • 38. EVENT-DRIVEN PARSERS  The event-driven model should be quite familiar to programmers of modern GUIs and operating systems. In this case, the XML parser executes a call-back to the application for each component of the XML data: element (with attributes), character data, processing instructions, notation, or comments. It's up to the application to handle the XML data as it is provided via the call-backs - the XML parser does not maintain the element tree structure, or any of the data after it has been parsed. The event-driven method requires very modest system resources, even for extremely large documents; and because of its simple, low-level access to the structure of the XML data, provides great flexibility in handling the data within the XML application.
  • 39. TREE-BASED PARSERS  One of the most widely used structures in software engineering is the simple hierarchical tree. All well-formed XML data is defined to be such a tree, and thus common and mature algorithms may be used to traverse the nodes of an XML document, search for content, and/or edit the document tree. These tree algorithms have the advantage of years of academic and commercial development.  XML parsers that use this approach generally conform to the W3C's Document Object Model (DOM). The DOM is a platform- and language-neutral interface that allows manipulation of tree-structured documents. On the other hand, the DOM tree must be built in memory before the document can be manipulated - high-performance virtual memory support is imperative for larger documents! Once the tree is built, an application may access the DOM via a related API.
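A minimal sketch of the call-back model, using Python's standard `xml.sax` module as one example of an event-driven parser. The `EventLogger` class and the sample document are illustrative; a real application would act on each event rather than just recording it.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each call-back the parser makes, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Character data may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(
    b"<textfile><line n='1'>A Simple Example</line><EOF/></textfile>",
    handler,
)

for event in handler.events:
    print(event)
```

The parser keeps no tree: whatever the application wants to remember, it must store itself, as the `events` list does here.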
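For comparison, here is a small tree-based sketch using Python's `xml.dom.minidom`, a lightweight DOM implementation in the standard library. The sample document is made up; the point is that the whole tree is in memory, so it can be searched and edited before being written back out.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree first.
doc = minidom.parseString(
    "<textfile><line>first</line><line>second</line></textfile>"
)

# Search the tree for content.
lines = doc.getElementsByTagName("line")
texts = [line.firstChild.data for line in lines]
print(texts)

# Edit the tree: append a new <line> element to the document element.
new_line = doc.createElement("line")
new_line.appendChild(doc.createTextNode("third"))
doc.documentElement.appendChild(new_line)

# Serialize the modified tree back to markup.
serialized = doc.documentElement.toxml()
print(serialized)
```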