UNIT I INTRODUCTION TO XML
XML document structure – Well formed and valid
documents – Namespaces – DTD – XML Schema
– X-Files.
XML Document Structure
Including all sections of an XML document
helps produce a well-structured document
XML Document Structure
• An XML document consists of a number of discrete components
• Not every section of an XML document is required,
– But including them helps produce a well-structured document
• A well-structured XML document can
– Be transported easily between systems and devices
Major portions of an XML document
• The major portions of an XML document include the following:
– The XML declaration
– The Document Type Declaration (DOCTYPE)
– The element data
– The attribute data
– The character data or XML content
XML Declaration
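• The XML declaration identifies the document as XML and states
– The XML version, the character encoding, and whether the document is standalone.
• An XML document can optionally have an XML declaration
– If present, it must be the very first thing in the document; nothing, not even whitespace or a comment, may precede it
• The XML declaration uses the same syntax as a processing instruction:
<?xml ...?>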
Components of XML Declaration
Component            Meaning
<?xml                Opens the XML declaration
version="xxx"        Specifies the version of XML being used; required whenever the declaration is present
standalone="xxx"     "yes" means the document does not depend on external markup declarations;
                     "no" (the default) means it may
encoding="xxx"       Indicates the character encoding that the document uses.
                     Without it, a parser assumes UTF-8 (or UTF-16 when a byte order mark is present)
?>                   Closes the XML declaration
Example :
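A minimal sketch of how the declaration fields reach an application, assuming Python's standard-library `xml.dom.minidom` parser. The sample document and the names `SAMPLE`, `declaration`, and `accepted_when_not_first` are hypothetical and not part of the original slides.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: the declaration is the very first thing in the document.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><shirt/>'

doc = minidom.parseString(SAMPLE)

# minidom exposes the three declaration fields on the Document node.
declaration = {
    "version": doc.version,
    "encoding": doc.encoding,
    "standalone": doc.standalone,
}
print(declaration)

# A single space in front of the declaration makes the document ill-formed.
try:
    minidom.parseString(b" " + SAMPLE)
    accepted_when_not_first = True
except ExpatError:
    accepted_when_not_first = False
print(accepted_when_not_first)
```

The second half illustrates the rule that nothing may precede the declaration: the parser rejects the document instead of silently skipping the leading whitespace.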
Document Type Declaration (DOCTYPE)
• DOCTYPE
– Names the root element of the XML content, and
– Provides a means to check the document’s validity,
• Either by including a Document Type Definition (DTD) or by linking to one.
• DOCTYPE is optional in XML
• Valid XML documents must declare the document type with which they
comply
General Form of DOCTYPE
• General forms of the Document Type Declaration
<!DOCTYPE NAME SYSTEM "file">
<!DOCTYPE NAME [ ]>
<!DOCTYPE NAME SYSTEM "file" [ ]>
• The first form
– Uses only an externally defined DTD subset.
• The second form
– Uses only an internal subset, defined within the document.
• The last form
– Places an internal DTD subset between the square brackets
while also making use of an external subset.
Example on DOCTYPE
• Example of the first form
<!DOCTYPE shirt SYSTEM "shirt.dtd">
– The root (first) element in the document will be the <shirt> element
– The DTD is saved in a file named shirt.dtd
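As a rough illustration of what a parser sees in such a declaration, the sketch below reads the DOCTYPE with Python's standard-library `xml.dom.minidom`. It assumes a one-element sample document; the parser does not fetch `shirt.dtd`, it only records the name and system identifier.

```python
from xml.dom import minidom

# Hypothetical sample: external DTD reference followed by the root element.
SAMPLE = '<!DOCTYPE shirt SYSTEM "shirt.dtd"><shirt/>'

doc = minidom.parseString(SAMPLE)

doctype_name = doc.doctype.name           # name given in the DOCTYPE
system_id = doc.doctype.systemId          # file named after SYSTEM
root_name = doc.documentElement.tagName   # actual root element

print(doctype_name, system_id, root_name)
```

The DOCTYPE name and the root element name agree here, which is what a valid document requires.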
Components of DOCTYPE
Markup and Content
• XML documents are composed of markup and content.
• In general, six kinds of markup can occur in an XML document:
– elements,
– entity references,
– comments,
– processing instructions,
– marked sections, and
– Document Type Declarations.
Elements
• XML elements are
– Either a matched pair of start and end tags or a single “self-closing” tag.
• For example,
– A shirt element begins with <shirt> and ends with </shirt>.
• When an element does not come as a pair,
– The element name is followed by a forward slash, as in <shirt/>.
• Such “unmatched” elements are known as empty elements
• Elements can be nested within other elements to any depth, provided each one closes inside its parent
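The nesting and empty-element rules can be sketched with Python's standard-library `xml.etree.ElementTree`. The `size` and `color` child elements and the helper `is_well_formed` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# A shirt element containing one empty element and one matched pair.
root = ET.fromstring("<shirt><size/><color>blue</color></shirt>")

child_tags = [child.tag for child in root]
size = root.find("size")
size_is_empty = size.text is None and len(size) == 0


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Overlapping tags are not proper nesting, so the parser rejects them.
overlapping_ok = is_well_formed("<shirt><color>blue</shirt></color>")

print(child_tags, size_is_empty, overlapping_ok)
```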
Attributes
• Within elements, additional information can be communicated to XML processors
– That modifies the nature of the encapsulated content.
• Attributes are name/value pairs contained within the start tag
– They specify text strings that qualify the element.
• Example:
<price currency="USD">…</price>
<on_sale start_date="10-15-2001"/>
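A small sketch of reading attributes with `xml.etree.ElementTree`. The wrapping `<shirt>` element and the price value `19.99` are assumptions added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element and price value around the two example elements.
SAMPLE = (
    '<shirt>'
    '<price currency="USD">19.99</price>'
    '<on_sale start_date="10-15-2001"/>'
    '</shirt>'
)

root = ET.fromstring(SAMPLE)
price = root.find("price")
on_sale = root.find("on_sale")

price_attributes = price.attrib          # all name/value pairs as a dict
start_date = on_sale.get("start_date")   # one attribute by name
end_date = on_sale.get("end_date")       # missing attributes yield None

print(price_attributes, start_date, end_date)
```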
Entity References
• Some characters, such as < and &, have a special meaning in XML and cannot appear literally in content.
• Entity references tell XML-processing applications
– That the text string that follows is to be replaced with a different literal value.
• Entity references are delimited by
– An ampersand at the beginning and
– A semicolon at the end.
• Ex : Inserting a > sign in our text
<descript> The following says 8 is greater than 5 </descript>
<equation>8 &gt; 5</equation>
Entity reference   Character
&lt;               <
&gt;               >
&amp;              &
&quot;             "
&apos;             '
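The replacement of entity references can be observed from both directions. This sketch assumes Python's standard library: `xml.etree.ElementTree` resolves the references while parsing, and `xml.sax.saxutils.escape` produces them when writing text into XML. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the reference &gt; is replaced by the literal character.
parsed_text = ET.fromstring("<equation>8 &gt; 5</equation>").text

# Writing: escape() turns &, < and > into entity references.
raw = "4 < 5 & 8 > 5"
escaped = escape(raw)

# Round trip: the escaped text parses back to the original string.
round_trip = ET.fromstring(f"<equation>{escaped}</equation>").text

print(parsed_text)
print(escaped)
print(round_trip)
```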
Comments
• Comments can be placed anywhere in a document outside other markup, and
– They are not considered to be part of the textual content of an XML document.
• The character sequence <!-- begins a comment and --> ends it.
• Between these two delimiters,
– Any text at all can be written, including valid XML markup.
• The only restriction is that
– Comment delimiters cannot be used inside a comment; neither can the literal string --.
• Example :
<!-- The element below describes an elephant I once owned... -->
<animal>Elephant</animal>
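A short sketch, assuming Python's standard-library `xml.dom.minidom`, showing that a comment is kept as its own node rather than as element text, and that a double hyphen inside a comment is rejected. The comment wording in `SAMPLE` is illustrative.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

SAMPLE = "<!-- The element below describes an elephant --><animal>Elephant</animal>"

doc = minidom.parseString(SAMPLE)
first = doc.firstChild

is_comment = first.nodeType == first.COMMENT_NODE
comment_text = first.data
element_text = doc.documentElement.firstChild.data

# The literal string -- is not allowed inside a comment.
try:
    minidom.parseString("<!-- a -- b --><animal/>")
    double_hyphen_ok = True
except ExpatError:
    double_hyphen_ok = False

print(is_comment, repr(comment_text), element_text, double_hyphen_ok)
```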
Processing Instructions (PIs)
• PIs are not a textual part of an XML document
– But tell applications how the content should be processed.
• Unlike comments, PIs must be passed along to the application by XML processors.
• Processing instructions have the following form:
<?instruction options?>
• The instruction name is called the PI target
– It is a special identifier that the processing application is intended to understand.
• Any information after the target is optional
• Example: <?send-message "process complete"?>
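To illustrate how a PI is handed to an application, this sketch parses the example with Python's standard-library `xml.dom.minidom` and reads the target and the remaining data. The `<job/>` root element is a hypothetical addition needed to make a complete document.

```python
from xml.dom import minidom

# Hypothetical document: the PI from the example followed by a root element.
SAMPLE = '<?send-message "process complete"?><job/>'

doc = minidom.parseString(SAMPLE)
pi = doc.firstChild

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
pi_target = pi.target   # the instruction name
pi_data = pi.data       # everything after the target, passed through unparsed

print(is_pi, pi_target, pi_data)
```

The parser does not interpret the data; deciding what `send-message` means is left entirely to the application.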
Marked CDATA Sections
• Some documents contain large amounts of text
– That an XML processor should not interpret as markup, only pass on to an application.
• Such text is placed in character data (CDATA) sections.
• Within an XML document, a CDATA section instructs the parser
– To ignore all markup characters except the ]]> that ends the section.
• This allows a section of XML code to be “escaped”
– So that it doesn’t inadvertently disrupt XML processing.
• CDATA sections follow this general form:
<![CDATA[content]]>
Marked CDATA Sections
• All content contained in the CDATA section is
– Passed as string literals directly to the application without interpretation
• Example:
<object_code>
<![CDATA[
  function master(poltice integer) {
    if poltice<=3 then {
      Mas=poltice+IntToString(FindElement("<chicken>"));
    }
  }
]]>
</object_code>
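The effect of a CDATA section can be checked with Python's standard-library `xml.etree.ElementTree`. The code-like payload in this sketch is a hypothetical string chosen because it contains `<`, `>` and `&`.

```python
import xml.etree.ElementTree as ET

PAYLOAD = 'if (a < b && found("<chicken>")) { run(); }'

# Inside CDATA the markup characters are passed through untouched.
with_cdata = f"<object_code><![CDATA[{PAYLOAD}]]></object_code>"
cdata_text = ET.fromstring(with_cdata).text

# The same payload without CDATA is not well-formed XML.
try:
    ET.fromstring(f"<object_code>{PAYLOAD}</object_code>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False

print(cdata_text)
print(plain_ok)
```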
Document Type Definitions (DTD)
• Don’t confuse the DOCTYPE with the DTD.
• A DOCTYPE and a DTD serve different, although related, purposes.
– The DOCTYPE identifies and names the XML content
– The DTD defines the rules against which that content is validated.
• A DTD describes the specific form of XML
– That is allowable in an XML document.
• DTDs and XML Schema are the means for defining the validity constraints
on XML documents
XML Content
• XML content can consist of any data, including binary data encoded as text (for example, Base64),
– As long as it doesn’t violate rules that would confuse the content with valid XML
markup.
• XML content can contain any characters the XML specification allows,
– Including Unicode and international characters.
• XML content can be as long as necessary
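Raw bytes such as NUL are not legal XML characters, so binary data is usually carried in an encoded text form. The sketch below assumes Base64 as that encoding and uses Python's standard-library `xml.etree.ElementTree`. The element names `label` and `thumbnail` and the sample bytes are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

raw = bytes([0, 255, 16, 128])   # arbitrary binary payload
label = "Größe: 大"               # Unicode text needs no special treatment

root = ET.Element("shirt")
ET.SubElement(root, "label").text = label
ET.SubElement(root, "thumbnail").text = base64.b64encode(raw).decode("ascii")

# Serialize, then parse again to confirm both values survive the round trip.
serialized = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(serialized)

decoded = base64.b64decode(parsed.find("thumbnail").text)
parsed_label = parsed.find("label").text

print(serialized)
```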
XML document with an internal DTD
• A DTD defines the structure and the legal elements and attributes of an XML
document.
• An application can use a DTD to verify that XML data is valid.
• If the DTD is declared inside the XML file,
– It must be wrapped inside the <!DOCTYPE> declaration.
• The Document Type Declaration (DOCTYPE) gives a name to the XML
content
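A minimal sketch of a document with an internal DTD, parsed with Python's standard library. One caveat: these parsers are non-validating, so they read the internal subset (for instance to expand entities) but do not enforce the element rules; checking validity would need a validating parser, which is outside this sketch. The `brand` entity and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Internal DTD wrapped inside the DOCTYPE declaration.
SAMPLE = """<!DOCTYPE shirt [
  <!ELEMENT shirt (#PCDATA)>
  <!ENTITY brand "Acme">
]>
<shirt>&brand; tee</shirt>"""

doctype_name = minidom.parseString(SAMPLE).doctype.name
shirt_text = ET.fromstring(SAMPLE).text   # entity from the DTD is expanded

# This document breaks the DTD (shirt may hold only text), yet it is still
# well formed, so a non-validating parser accepts it.
INVALID = """<!DOCTYPE shirt [
  <!ELEMENT shirt (#PCDATA)>
]>
<shirt><color/></shirt>"""

parsed_despite_dtd = ET.fromstring(INVALID).find("color") is not None

print(doctype_name, shirt_text, parsed_despite_dtd)
```

The last check shows the difference between well formed and valid: the second document parses, but a validating parser would reject it against its own DTD.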