XML
What is XML?
 A markup language like HTML
 Stands for eXtensible Markup Language
 Designed to store and transport data, not to display it
 Tags are not predefined – the author must define the tags and the
document structure
 Uses a DTD or XML Schema to describe the data
 XML documents, as well as DTDs and XML Schemas, are designed to
be self-descriptive
 XML is a W3C Recommendation
Difference with HTML?
 XML designed to structure, store and transport data
 Focus on what data is
 HTML designed to display data – focus on how data looks
i.e. HTML is about displaying information, while XML is about
describing info
 HTML uses predefined tags such as <h1>, <p>, <div>, while in XML the
author must define both the tags and the document
structure
 XML is just information wrapped in tags, not designed to DO
anything
XML Example
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend!</body>
</note>
 XML is quite self-descriptive, but does not DO anything !
 Need a separate piece of software to send, receive, store, or display it
 Tags <note>, <to>, <from> etc. “invented” by author of document, not
defined by any XML standard
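As a minimal sketch of the "separate piece of software" mentioned above, the following Python snippet reads the note with the standard-library xml.etree.ElementTree module. The function name summarize_note and the summary format are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend!</body>
</note>"""

def summarize_note(xml_text):
    """Parse a <note> document and return a one-line summary (hypothetical helper)."""
    note = ET.fromstring(xml_text)
    # findtext returns the text of the first child element with the given tag
    return "{} -> {}: {}".format(
        note.findtext("from"), note.findtext("to"), note.findtext("body")
    )

print(summarize_note(NOTE_XML))
```

The XML itself carries only the data; what to do with it is decided entirely by the program that reads it.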
XML is Free & Extensible
 Most XML applications will work as expected even if new data is
added (or removed)
 For example, a newer version of note.xml with added <date> and
<hour> elements, and a removed <heading>
<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Amit</to>
<from>Bobby</from>
<body>Meet me during this weekend!</body>
</note>
 Older versions of applications will still work using new note.xml
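The claim above holds when an application looks elements up by name and tolerates missing ones. A small sketch under that assumption (read_note and the "(no heading)" fallback are hypothetical, chosen only for illustration):

```python
import xml.etree.ElementTree as ET

OLD_NOTE = (
    "<note><to>Amit</to><from>Bobby</from>"
    "<heading>Reminder</heading><body>Meet me during this weekend!</body></note>"
)
NEW_NOTE = (
    "<note><date>2015-09-01</date><hour>08:30</hour><to>Amit</to>"
    "<from>Bobby</from><body>Meet me during this weekend!</body></note>"
)

def read_note(xml_text):
    """Read the fields an 'older' application knows about."""
    note = ET.fromstring(xml_text)
    return {
        "to": note.findtext("to"),
        "from": note.findtext("from"),
        # the default covers the <heading> that the newer version removed
        "heading": note.findtext("heading", default="(no heading)"),
        "body": note.findtext("body"),
    }

print(read_note(OLD_NOTE))
print(read_note(NEW_NOTE))  # added <date> and <hour> are simply ignored
```

An application that reads children by position, or that requires <heading> to be present, would not be this forgiving.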
Segregate data from presentation
 With HTML, actual data is stored inside HTML along with
presentation / styling elements
 With XML, data is stored in separate files, without any
information on how to display it
 Same data can be displayed differently under different
scenarios
 HTML can be used with XML to display data
 One should not have to edit the HTML file when the XML data
changes
 With a few lines of JavaScript code, one can read an XML file and
update the data content of any HTML page.
 XML can also be stored inside HTML as data islands (a legacy Internet Explorer feature)
Uses of XML
 Simplifies exchanging data among incompatible systems
 Stores data in plain text format - a software & hardware
independent way of storing, transporting, and sharing data
 Makes platform changes easier
 One can upgrade to new operating systems, new applications, or
new browsers, without losing data
 Simplifies data availability
 With XML, data can be available to all kinds of "reading machines"
like people, computers, voice machines, news feeds etc.
Converting data to XML can reduce complexity of exchanging
data between incompatible systems , and create data that can
be read by many applications
Uses of XML
 Can also be used to store data in database files
 Generic applications can be written to store & retrieve data from
data stores and display the same
 Other applications can access XML files as data sources, as if they
were accessing databases
 Can be used to create new languages such as WML
 Wireless Markup Language (WML), part of WAP, is used to mark up
internet applications for handheld devices
 XML syntax is easy to learn and use
Components of XML
 An XML document is a plain text file (commonly UTF-8 encoded), usually with an .xml extension
 Main components
 Elements
 Attributes
 Content
 Comment
Elements
 Basic building block of XML document
 Each element represents a piece of data identified by tag(s)
 Most tags come in pairs: a start tag at the beginning and an end tag
at the end of the data
 One can have a hierarchical structure by nesting elements
 Elements can contain text, attributes, other elements, or a mix of these
 An element that wraps data between a start tag and an end tag is a
container element, while the data it wraps is called its
content
<Univ> Jadavpur </Univ>  "Jadavpur" is the content, <Univ> is the container
 Empty elements do not contain data/content – do not come in pairs
<tagname/> instead of <tagname></tagname>  self-closing tags
Example, <br/>
Attributes
 Provide additional information about elements
 Each attribute has a name and value
 Value could be number, string, URL
 Attribute values must always be enclosed in quotation marks
 Either single or double quotes may be used
<Univ location="kolkata"> Jadavpur </Univ>
The location attribute has the value "kolkata"
 Generally metadata (data about data) should be stored as
attributes, and the data itself should be stored as elements.
Attributes
• If attribute value itself contains double quotes, single quotes used
<person name='George "Shotgun" Ziegler'>
• Or, character entities may be used
<person name="George &quot;Shotgun&quot; Ziegler">
• Some things to consider while defining attributes
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
<note day="10" month="01" year="2008"
to="Tove" from="Jani" heading="Reminder"
body="Meet me during this weekend!"> </note>  avoid: well-formed, but hard to read and extend
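The two quoting styles for an attribute value that itself contains double quotes can be compared with a short Python sketch using the standard-library xml.etree.ElementTree module. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Variant 1: single quotes delimit the value, double quotes appear literally
single_quoted = ET.fromstring("<person name='George \"Shotgun\" Ziegler'/>")

# Variant 2: double quotes delimit the value, inner quotes use &quot;
entity_quoted = ET.fromstring('<person name="George &quot;Shotgun&quot; Ziegler"/>')

print(single_quoted.get("name"))
print(single_quoted.get("name") == entity_quoted.get("name"))
```

Both spellings give the parser the same attribute value, so the choice is a matter of readability.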
Comments
• To be ignored by XML processors, used to add useful notes
• Syntax for writing comments in XML is similar to that of HTML
<!-- This is a comment -->
• Two dashes in the middle of a comment are not allowed
<!-- This is a -- comment --> not allowed
<!-- This is a - - comment --> allowed
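The comment rule can be tried out with a non-validating parser. This is a minimal sketch using Python's xml.etree.ElementTree; the helper name parses and the surrounding <note> element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses("<note><!-- This is a comment --></note>"))
print(parses("<note><!-- This is a -- comment --></note>"))   # double dash inside
print(parses("<note><!-- This is a - - comment --></note>"))  # separated dashes
```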
Entity References
• Some characters have a special meaning in XML
• For example, a "<" inside an element will generate an error
• Because, the parser interprets it as the start of a new element
<message> salary < 1000 </message>  incorrect !
• "<" is replaced with the entity reference "&lt;"
<message> salary &lt; 1000 </message>
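When XML is generated by a program, the replacement is usually done by a helper rather than by hand. A minimal sketch, assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = " salary < 1000 "

# escape() replaces &, < and > with their entity references
escaped = escape(raw_text)
document = "<message>{}</message>".format(escaped)
print(document)

# Parsing turns the entity reference back into the original character
message = ET.fromstring(document)
print(message.text == raw_text)
```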
Entity References
 There are 5 pre-defined entity references in XML:
Entity Reference    Character    Description
&lt;                <            less than
&gt;                >            greater than
&amp;               &            ampersand
&apos;              '            apostrophe
&quot;              "            quotation mark
Element vs. Attributes
• Same information may be represented as element or attribute
Date as attribute
<note date="2008-01-10">
<to>Amit</to>
<from>Bobby</from>
</note>
Date as element
<note>
<date>2008-01-10</date>
<to>Amit</to>
<from>Bobby</from>
</note>
Expanded date element
<note>
<date>
<year>2008</year>
<month>01</month>
<day>10</day>
</date>
<to>Amit</to>
<from>Bobby</from>
</note>
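How a program reads the date differs between the three forms. The sketch below uses Python's xml.etree.ElementTree; the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<note date="2008-01-10"><to>Amit</to><from>Bobby</from></note>'
)
as_element = ET.fromstring(
    "<note><date>2008-01-10</date><to>Amit</to><from>Bobby</from></note>"
)
expanded = ET.fromstring(
    "<note><date><year>2008</year><month>01</month><day>10</day></date>"
    "<to>Amit</to><from>Bobby</from></note>"
)

date_from_attribute = as_attribute.get("date")      # attribute lookup
date_from_element = as_element.findtext("date")     # child element text
date_from_parts = "-".join(                          # nested child elements
    expanded.findtext("date/" + part) for part in ("year", "month", "day")
)

print(date_from_attribute, date_from_element, date_from_parts)
```

The expanded form is the only one in which year, month, and day can be read individually without parsing a string.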
XML Tree
• XML documents are formed as trees of elements
• An XML tree starts at a root element and branches from the
root to child elements
• All elements can have sub elements (child elements)
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
• Siblings are children on the same level i.e. same parent node
• All elements can have text content and attributes
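The tree structure can be made visible by walking it recursively. A minimal sketch with Python's xml.etree.ElementTree; the outline helper and the extra empty <child/> sibling are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

TREE_XML = "<root><child><subchild>.....</subchild></child><child/></root>"

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(TREE_XML)
print("\n".join(outline(root)))
```

The two <child> elements share the parent <root>, so they are siblings.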
XML Syntax
 XML documents must have a root element, which is the parent of
all other nodes
 Optional prolog at the beginning
 All XML Elements must have a closing tag
 XML tags are case sensitive - Opening and closing tags must be
written with the same case
<Message>This is incorrect</message>
 XML Elements must be properly nested
<b><i>This text is bold and italic</b></i>  improper nesting
 XML attributes must be in quotes
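The rules above are what a parser checks for well-formedness. The following sketch feeds the incorrect examples to Python's xml.etree.ElementTree; the is_well_formed helper and the unquoted-attribute sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

samples = {
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<note date=2008-01-10></note>",
    "properly nested": "<b><i>This text is bold and italic</i></b>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```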
Structure of Well Formed XML
 Begins with a declaration that it is an XML file
 Optional definition about the type of XML data and what DTD it
follows (prolog)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• Attr 'version' indicates that the document conforms to
version 1.0 of the XML specification
• Attr 'encoding' specifies the character encoding used, here UTF-8
• Attr 'standalone' indicates whether the document is self-contained
(value yes) or depends on declarations in an external DTD (value no)
 Content marked up using XML tags and comments
 If the syntactical rules are followed – XML is well formed
 If it also adheres to a DTD/Schema – XML is valid
Example of Well Formed XML
 <title>, <author>, <year>, and <price> have text content because they
contain text
 <bookstore> and <book> have element contents, because they
contain elements
 <book> has an attribute (category="children").
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
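 The first line is the prolog
 <bookstore> is the root element, and </bookstore> is the end of the root element
 <book> is a child element of the root element <bookstore>
 <title>, <author>, <year>, and <price> are the 4 child elements of the parent <book>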
Valid XML Document
 Errors in XML document stop XML applications
 XML documents must be validated prior to using them
 A valid XML document must be well formed and also must conform
to a document type definition.
 Two different document type definitions that can be used with XML:
• DTD - The original Document Type Definition
• XML Schema - An XML-based alternative to DTD
Document Type Definition (DTD)
 Defines structure of the content of an XML – allows storing data
consistently
 Defines the rules and the legal elements and attributes for an XML
document
 Specifies elements that can be present, whether optional, their
attributes and arrangements with respect to each other
 Allows users to create DTDs – gives a complete control over the
process of checking that the structure & contents of XML are OK
 With a DTD, independent groups of people can agree on a standard
DTD for interchanging data.
 An application can use a DTD to verify that XML data is valid
 The elements that can be used in a particular XML document can be
defined using an internal or external DTD
Building Blocks of XML
 From DTD point of view, following are the building blocks:
 Elements
 Attributes
 Comment
 PCDATA
 CDATA
PCDATA & CDATA
 PCDATA - Parsed Character Data
 Character data - text found between the start tag and
the end tag of an XML element
 Will be parsed by a parser – the text is examined for
entities and markup
 Should not contain any &, <, or > characters; these need
to be represented by the &amp; &lt; and &gt; entities
 CDATA – Character Data
 Text that will NOT be parsed by a parser
 Tags inside the text will NOT be treated as markup and
entities will not be expanded
Internal DTD
 DTD included as part of the XML document
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE root_element [
....
....
]>
Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend</body>
</note>
Interpretation of DTD
 !DOCTYPE note defines root element of document as note
 !ELEMENT note defines that the note element must contain four
elements: "to, from, heading, body"
 !ELEMENT to defines the to element to be of type #PCDATA
 !ELEMENT from defines the from element to be of type #PCDATA
 !ELEMENT heading defines the heading element to be of type
#PCDATA
 !ELEMENT body defines the body element to be of type #PCDATA
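Python's standard-library parsers do not validate against a DTD, so the sketch below hand-codes the same constraints for illustration only; it is not a DTD validator. The names EXPECTED_CHILDREN and follows_note_model are assumptions.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT note (to,from,heading,body)>
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def follows_note_model(xml_text):
    """Check root name, child order, and that each child holds text only."""
    note = ET.fromstring(xml_text)
    child_tags = [child.tag for child in note]
    # (#PCDATA) children may not contain elements of their own
    text_only = all(len(child) == 0 for child in note)
    return note.tag == "note" and child_tags == EXPECTED_CHILDREN and text_only

valid_note = (
    "<note><to>Amit</to><from>Bobby</from>"
    "<heading>Reminder</heading><body>Meet me during this weekend</body></note>"
)
reordered_note = (
    "<note><from>Bobby</from><to>Amit</to>"
    "<heading>Reminder</heading><body>Meet me during this weekend</body></note>"
)

print(follows_note_model(valid_note))
print(follows_note_model(reordered_note))
```

A validating parser derives these checks from the DTD itself instead of having them written out by hand.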
External DTD
 DTD stored as separate file having the declarations
 Can be applied across multiple XML documents
<!DOCTYPE root_element SYSTEM "dtd_file_location">
or
<!DOCTYPE root_element PUBLIC "public_identifier" "dtd_file_location">
PUBLIC – DTD intended for public use, identified by a public identifier;
the file location is also to be mentioned
SYSTEM – Private DTD identified by its file location, meant to be
accessible by a single user or a group of users
 <!DOCTYPE> definition within XML file contains a reference to
the DTD file
External DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Amit</to>
<from>Bobby</from>
<heading>Reminder</heading>
<body>Meet me during this weekend</body>
</note>
Contents of note.dtd (declarations only, without the <!DOCTYPE note [ ... ]> wrapper):
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
XML Naming Rules
 XML elements must follow certain naming rules
• Element names case-sensitive
• Names must start with a letter or underscore
• Names cannot start with the letters xml (or XML, or Xml, etc)
• Names can contain letters, digits, hyphens, underscores, and periods
• Names cannot contain spaces or tabs
Any name can be used, no words are reserved (except xml)
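The naming rules listed above can be written down as a small checker. This is a simplified sketch restricted to ASCII letters; real XML names also allow many non-ASCII letters and the colon. NAME_PATTERN and is_allowed_name are hypothetical names.

```python
import re

# First character: letter or underscore; then letters, digits, '.', '_' or '-'
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_allowed_name(name):
    """Apply the slide's naming rules to a candidate element name."""
    if name.lower().startswith("xml"):   # xml, XML, Xml, ... are reserved
        return False
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["book_title", "_id", "first-name", "1st", "xmlData", "book title"]:
    print(candidate, is_allowed_name(candidate))
```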
Naming Best Practices
 In a valid XML document, only elements declared in the DTD may be used
 Hence, as soon as a new element is added to the XML, it must also be
declared in the DTD
 XML tags should be easy to remember
 Follow naming rules and best naming practices
 Short and simple names eg. <book_title>, not like <the_title_of_the_book>
 Descriptive names, eg. <person>, <firstname> etc.
 Consistent naming styles eg. all lowercase, camel case etc.
 Avoid colons and hyphens; semicolons are not allowed in names at all
DTD – Elements & Types
 Declared with an ELEMENT declaration using the following structure
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
 Empty elements like <br/>
<!ELEMENT element-name EMPTY> eg. <!ELEMENT br EMPTY>
 Elements with parsed character data (container elements):
<!ELEMENT element-name (#PCDATA)> eg. <!ELEMENT from (#PCDATA)>
 Elements declared with the category keyword ANY, can contain any
combination of parsable data declared elsewhere in DTD
(unrestricted elements)
<!ELEMENT element-name ANY> eg. <!ELEMENT note ANY>
DTD – Elements with Children (Sequences)
 Elements with one or more children are declared with the name of
the children elements inside parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
Example:
<!ELEMENT note (to, from, heading, body)>
 When children are declared in a sequence separated by commas, the
children must appear in the same sequence in the document
 Individual declaration of child elements must follow
Declaration of Number of Occurrences
 Only one occurrence of an element:
<!ELEMENT element-name (child-name)>
eg. <!ELEMENT note (message)>
Child element "message" must occur once, and only once inside the "note" element
 Minimum one occurrence of an element:
<!ELEMENT element-name (child-name+)>
eg. <!ELEMENT note (message+)>
Element "message" must occur one or more times inside the "note" element
Declaration of Number of Occurrences
 Zero or more occurrences of an element:
<!ELEMENT element-name (child-name*)>
eg. <!ELEMENT note (message*)>
Child element "message" can occur zero or more times inside the "note" element
 Zero or one occurrence of an element:
<!ELEMENT element-name (child-name?)>
eg. <!ELEMENT note (message?)>
Element "message" can occur zero or one time inside the "note" element
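The four occurrence indicators (none, +, *, ?) boil down to a rule on how often the child appears. A minimal sketch that only counts occurrences, assuming Python's xml.etree.ElementTree; ALLOWED_COUNTS and occurrence_ok are illustrative names, and the check ignores any other children.

```python
import xml.etree.ElementTree as ET

# Occurrence indicator -> rule on the number of matching children
ALLOWED_COUNTS = {
    "": lambda n: n == 1,    # (message)   exactly once
    "+": lambda n: n >= 1,   # (message+)  one or more
    "*": lambda n: n >= 0,   # (message*)  zero or more
    "?": lambda n: n <= 1,   # (message?)  zero or one
}

def occurrence_ok(xml_text, child_name, indicator):
    """Count direct children named child_name and apply the indicator's rule."""
    parent = ET.fromstring(xml_text)
    count = len(parent.findall(child_name))
    return ALLOWED_COUNTS[indicator](count)

no_message = "<note/>"
two_messages = "<note><message>a</message><message>b</message></note>"

for indicator in ["", "+", "*", "?"]:
    print(repr(indicator),
          occurrence_ok(no_message, "message", indicator),
          occurrence_ok(two_messages, "message", indicator))
```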
Declaration of Either/Or and Mixed Content
 Either/or Content Declaration:
<!ELEMENT note (to,from,header,(message|body))>
"note" element must contain a "to" element, a "from" element, a "header"
element, and either a "message" or a "body" element
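Sequences, choices, and occurrence indicators in a content model behave much like a regular expression over the child element names. The sketch below makes that analogy concrete for element-only models; it does not handle #PCDATA, EMPTY, or ANY, and it is not how a real validating parser is required to work. model_to_regex and children_match are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(content_model):
    """Translate an element-only DTD content model into a regular expression."""
    compact = "".join(content_model.split())  # drop whitespace
    # Each element name becomes a token terminated by ';'
    pattern = re.sub(
        r"[A-Za-z_][A-Za-z0-9._-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    # ',' means "followed by", which is plain concatenation in a regex;
    # '|', '+', '*', '?' and parentheses already mean the same thing
    return pattern.replace(",", "")

def children_match(xml_text, content_model):
    """Check the sequence of direct child names against the content model."""
    parent = ET.fromstring(xml_text)
    child_string = "".join(child.tag + ";" for child in parent)
    return re.fullmatch(model_to_regex(content_model), child_string) is not None

MODEL = "(to,from,header,(message|body))"
with_body = "<note><to/><from/><header/><body/></note>"
with_both = "<note><to/><from/><header/><message/><body/></note>"

print(children_match(with_body, MODEL))
print(children_match(with_both, MODEL))
```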
 Mixed Content Declaration:
<!ELEMENT note (#PCDATA|to|from|header|message)*>
"note" element can contain zero or more occurrences of parsed character data,
"to", "from", "header", or "message" elements
DTD - Attributes
 Declared with an ATTLIST declaration using the following structure
<!ATTLIST element-name attribute-name attribute-type attribute-value>
Eg. <!ATTLIST payment type CDATA "cheque">
for XML <payment type="cheque" />
Example 2
<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title year_published CDATA #IMPLIED
                cover (paperback|hardcover) #IMPLIED>
<!ELEMENT author (#PCDATA)>
 Attribute type could be:
 CDATA – value is character data
 (en1 | en2 | en3) - value must be one from an enumerated list
 ID - value is a unique id, must start with a letter or underscore
 IDREF or IDREFS - id of another element or list of such ids
 ENTITY or ENTITIES - value is an entity or list of entities
DTD - Attributes
 Enumerated attribute values
SYNTAX:
<!ATTLIST element-name attribute-name (en1|en2|..)
default-value>
DTD:
<!ATTLIST payment type (check|cash) "cash">
XML:
<payment type="check" />
or
<payment type="cash" />
(default value of type, if not defined is “cash”)
DTD - Attributes
 Attribute value could be:
"value"  default value, used when the attribute is omitted
#REQUIRED  attribute must be included
#FIXED "value"  attribute value is fixed
#IMPLIED  attribute is optional
DTD: <!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">  default value
Conforming element in XML doc: <square width="100" />
DTD <!ATTLIST person number CDATA #REQUIRED>
Valid XML:
<person number="5677" />
Invalid XML:
<person />
DTD - Attributes
 Attribute value could be:
"value"  default value, used when the attribute is omitted
#REQUIRED  attribute must be included
#FIXED "value"  attribute value is fixed
#IMPLIED  attribute is optional
DTD <!ATTLIST sender company CDATA #FIXED "Microsoft">
Valid XML <sender company="Microsoft" />
Invalid XML <sender company="IBM" />
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD <!ATTLIST contact phone CDATA #IMPLIED>
Valid XML:
<contact phone="555-667788" /> and <contact />
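The four kinds of attribute default can be summarised as a small decision procedure. The sketch below hand-codes that logic in Python for illustration; the function check_attribute and its kind labels are assumptions, and a validating parser would read the rules from the ATTLIST declarations instead.

```python
import xml.etree.ElementTree as ET

def check_attribute(xml_text, name, kind, value=None):
    """Return (is_valid, effective_value) for one attribute of the root element.

    kind is "DEFAULT", "REQUIRED", "FIXED" or "IMPLIED"; value is the
    default or fixed value for the kinds that carry one.
    """
    element = ET.fromstring(xml_text)
    actual = element.get(name)  # None when the attribute is absent
    if kind == "REQUIRED":
        return (actual is not None, actual)
    if kind == "FIXED":
        # may be omitted, but if present it must equal the fixed value
        return (actual in (None, value), value)
    if kind == "IMPLIED":
        return (True, actual)
    # plain default: the declared value applies only when the attribute is absent
    return (True, value if actual is None else actual)

print(check_attribute('<person number="5677" />', "number", "REQUIRED"))
print(check_attribute("<person />", "number", "REQUIRED"))
print(check_attribute('<sender company="IBM" />', "company", "FIXED", "Microsoft"))
print(check_attribute("<contact />", "phone", "IMPLIED"))
print(check_attribute("<square />", "width", "DEFAULT", "0"))
```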
DTD - Entities
 Define shortcuts (alias) to special characters – Entity References
 Shorter name used during entry, expands to longer name while using
 Makes entering and managing info easier
 Can be declared as internal or external
SYNTAX FOR INTERNAL ENTITY DECLARATION
<!ENTITY entity-name "entity-value">
DTD: <!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
XML: <author>&writer;&copyright;</author>
Note: An entity has three parts: an ampersand (&), an entity name, and a semicolon (;).
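Internal entities are expanded by the parser before the application sees the text. A minimal sketch with Python's xml.etree.ElementTree, placing the two declarations above in an internal DTD; the DOCUMENT constant is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE author [
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOCUMENT)

# The entity references have been replaced by their declared values
print(author.text)
```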
DTD - Entities
SYNTAX FOR EXTERNAL ENTITY DECLARATION
<!ENTITY entity-name SYSTEM "URI/URL">
DTD:
<!ENTITY writer SYSTEM
"https://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM
"https://www.w3schools.com/entities.dtd">
XML:
<author>&writer;&copyright;</author>
When to use DTD/Schema
 With a DTD, independent groups of people can agree to use a
standard DTD for interchanging data.
 With a DTD, one can verify one's own data, as well as data received
from the outside world
When not to use DTD/Schema
 When experimenting with XML, or working with small XML files,
creating DTDs may be a waste of time
 If developing applications, wait until the specification is stable
before you add a document definition.
Design of DTD
<?xml version="1.0" encoding="utf-8"?>
<BOOK_CATALOG>
<BOOK AccessionNo="1234">
<TITLE> BOOK1 </TITLE>
<AUTHOR Category="Primary">
<NAME>ABC</NAME>
<ADDRESS>Chennai <PIN>600020</PIN></ADDRESS>
</AUTHOR>
<AUTHOR Category="Secondary">
<NAME>ABCD</NAME>
<ADDRESS>Chennai<PIN>600022</PIN></ADDRESS>
</AUTHOR>
<YEARPUBLISHED>1989</YEARPUBLISHED>
</BOOK>
<BOOK AccessionNo="1235">
…..
</BOOK>
</BOOK_CATALOG>
Design of DTD
<!DOCTYPE BOOK_CATALOG [
<!ELEMENT BOOK_CATALOG (BOOK+)>
<!ELEMENT BOOK (TITLE, AUTHOR+, YEARPUBLISHED)>
<!ATTLIST BOOK AccessionNo CDATA #REQUIRED>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT AUTHOR (NAME, ADDRESS)>
<!ATTLIST AUTHOR Category CDATA #REQUIRED>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT ADDRESS (#PCDATA | PIN)*>
<!ELEMENT PIN (#PCDATA)>
<!ELEMENT YEARPUBLISHED (#PCDATA)>
]>

Xml 1

  • 1.
  • 2.
    What is XML? A markup language like HTML  Stands for Xstensible Markup Language  Designed to store data, not display  Tags not predefined – one must define own tags and document structure  Uses DTD/Schema to describe data  XML document, as well as DTD or XML Schema are designed to be self-descriptive  XML is a W3C Recommendation
  • 3.
    Difference with HTML? XML designed to structure, store and transport data  Focus on what data is  HTML designed to display data – focus on how data looks i.e. HTML is about displaying information, while XML is about describing info  HTML uses predefined tags <h1>, <p> <div>, while XML the author must define both the tags and the document structure  XML is just information wrapped in tags, not designed to DO anything
  • 4.
    XML Example <note> <to>Amit</to> <from>Bobby</from> <heading>Reminder</heading> <body>Meet meduring this weekend!</body> </note>  XML is quite self-descriptive, but does not DO anything !  Need a separate piece of software to send, receive, store, or display it  Tags <note>, <to>, <from> etc. “invented” by author of document, not defined by any XML standard
  • 5.
    XML is Free& Extensible  Most XML applications will work as expected even if new data is added (or removed)  For example, a newer version of note.xml with added <date> and <hour> elements, and a removed <heading> <note> <date>2015-09-01</date> <hour>08:30</hour> <to>Amit</to> <from>Bobby</from> <body>Meet me during this weekend!</body> </note>  Older versions of applications will still work using new note.xml
  • 6.
    Seggregate data frompresentation  With HTML, actual data is stored inside HTML along with presentation / styling elements  With XML, data is stored in separate files, without any information on how to display it  Same data can be displayed differently under different scenarios  HTML can be used with XML to display data  One should not have to edit the HTML file when the XML data changes  With a few lines of JavaScript code, one can read an XML file and update the data content of any HTML page.  XML can also be stored inside HTML as data islands
  • 7.
    Uses of XML Simplifies exchanging data among incompatible systems  Stores data in plain text format - a software & hardware independent way of storing, transporting, and sharing data  Makes platform changes easier  One can upgrade to new operating systems, new applications, or new browsers, without losing data  Simplifies data availability  With XML, data can be available to all kinds of "reading machines" like people, computers, voice machines, news feeds etc. Converting data to XML can reduce complexity of exchanging data between incompatible systems , and create data that can be read by many applications
  • 8.
    Uses of XML Can also be used to store data in database files  Generic applications can be written to store & retrieve data from data stores and display the same  Other applications can access XML files as data sources, as if they were accessing databases  Can be used to create new languages like WAP / WML  Wireless Markup Languages used to mark up internet applications for handheld devices . XML Syntax easy to learn and use
  • 9.
    Components of XML XML is an ASCII text file, with .xml extension  Main components  Elements  Attributes  Content  Comment
  • 10.
    Elements  Basic buildingblock of XML document  Each element represents a piece of data identified by tag(s)  Most tags in pair, a start tag at the beginning and an end tag placed at the end of data  One can have a hierarchical structure by nesting elements  Elements can contain text, attributes, other elements, mix  Elements that contain data embedded with start and end tags – container elements, while information represents by elements called content <Univ> Jadavpur </Univ>  Jadavpur content, <Univ> container  Empty elements do not contain data/content – do not come in pairs <tagname/> instead of <tagname></tagname>  self-closing tags Example, <br/>
  • 11.
    Attributes  Provide additionalinformation about elements  Each attribute has a name and value  Value could be number, string, URL  Attribute values must always be enclosed in quotation marks  Either single or double quotes maybe used <Univ location=“kolkata” > Jadavpur </Univ> location attribute has the value “ kolkata”  Generally metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.
  • 12.
    Attributes • If attributevalue itself contains double quotes, single quotes used <person name='George "Shotgun" Ziegler'> • Or, character entities may be used <person name="George &quot;Shotgun&quot; Ziegler"> • Some things to consider while defining attributes • attributes cannot contain multiple values (elements can) • attributes cannot contain tree structures (elements can) • attributes are not easily expandable (for future changes) <note day="10" month="01" year="2008" to="Tove" from="Jani" heading="Reminder" body=“Meet me during this weekend!"> </note>  incorrect !
  • 13.
    Comments • To beignored by XML processors, used to add useful notes • Syntax for writing comments in XML is similar to that of HTML <!-- This is a comment --> • Two dashes in the middle of a comment are not allowed <!-- This is a -- comment --> not allowed <!-- This is a - - comment --> allowed
  • 14.
    Entity References • Somecharacters have a special meaning in XML • For example, a “<“ inside an element will generate an error • Because, the parser interprets it as the start of a new element <message> salary < 1000 </note>  incorrect ! • “<“ replaced with an entity reference “&lt;” <message> salary &lt; 1000 </note>
  • 15.
    Entity Reference Description &lt; <, greaterthan &gt; >, less than &amp; &, ampersand &apos; ‘ apostrophe &quot; “, quotation mark Entity References  There are 5 pre-defined entity references in XML:
  • 16.
    Element vs. Attributes •Same information may be represented as element or attribute Date as attribute <note date="2008-01-10"> <to>Amit</to> <from>Bobby</from> </note> Date as element <note> <date>2008-01-10</date> <to>Amit</to> <from>Bobby</from> </note> Expanded date element <note> <date> <year>2008</year> <month>01</month> <day>10</day> </date> <to>Amit</to> <from>Bobby</from>
  • 17.
    XML Tree • XMLdocuments are formed as trees of elements • An XML tree starts at a root element and branches from the root to child elements • All elements can have sub elements (child elements) <root> <child> <subchild>.....</subchild> </child> </root> • Siblings are children on the same level i.e. same parent node • All elements can have text content and attributes
  • 18.
    XML Syntax  XMLdocuments must have a root element, which is the parent of all other nodes  Optional prolog at the beginning  All XML Elements must have a closing tag  XML tags are case sensitive - Opening and closing tags must be written with the same case <Message>This is incorrect</message>  XML Elements must be properly nested <b><i>This text is bold and italic</b></i>  improper nesting  XML attributes must be in quotes
  • 19.
    Structure of WellFormed XML  Begins with a declaration that it is an XML file  Optional definition about the type of XML data and what DTD it follows (prolog) <?xml version="1.0" encoding="UTF-8" standalone=“Yes”?> • Attr ‘version’ indicates that document conforms to standard version 1.0 specifications of XML • Attr ‘encoding’ specifies character set used as UTF-8 • Attr ‘standalone’ indicates whether browser needs to read internal (value yes) or external DTD (value no)  Content marked up using XML tags and comments  If syntactical rules followed - XML well formed  If adheres to DTD/Scheme – XML valid
  • 20.
    Example of WellFormed XML  <title>, <author>, <year>, and <price> have text content because they contain text  <bookstore> and <book> have element contents, because they contain elements  <book> has an attribute (category="children"). <?xml version="1.0" encoding="UTF-8"?> <bookstore> <book category="children"> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore>  4 child elements of parent “book” “book” child element of root element “bookstore”  prolog  root element  end of root element
  • 21.
    Valid XML Document Errors in XML document stop XML applications  XML documents must be validated prior to using them  A valid XML document must be well formed and also must conform to a document type definition.  Two different document type definitions that can be used with XML: • DTD - The original Document Type Definition • XML Schema - An XML-based alternative to DTD
  • 22.
    Document Type Definition(DTD)  Defines structure of the content of an XML – allows storing data consistently  Defines the rules and the legal elements and attributes for an XML document  Specifies elements that can be present, whether optional, their attributes and arrangements with respect to each other  Allows users to create DTDs – gives a complete control over the process of checking that the structure & contents of XML are OK  With a DTD, independent groups of people can agree on a standard DTD for interchanging data.  An application can use a DTD to verify that XML data is valid  Elements that can be used in a particular XML be defined using internal or external DTD
  • 23.
    Building Blocks ofXML  From DTD point of view, following are the building blocks:  Elements  Attributes  Comment  PCDATA  CDATA
  • 24.
    PCDATA & CDATA PCDATA - Parsed Character Data  Character data - text found between the start tag and the end tag of an XML element  Will be parsed by a parser – examined whether to be treated as entities or mark-ups  Should not contain any &, <, or > characters; these need to be represented by the &amp; &lt; and &gt; entities  CDATA – Character Data  Text that will NOT be parsed by a parser  Tags inside the text will NOT be treated as markup and entities will not be expanded
  • 25.
    Internal DTD  DTDincluded as part of the XML document <?xml version="1.0" standalone="yes"?> <!DOCTYPE root_element [ .... .... ]>
  • 26.
    Internal DTD <?xml version="1.0"?> <!DOCTYPEnote [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Amit</to> <from>Bobby</from> <heading>Reminder</heading> <body>Meet me during this weekend</body> </note>
  • 27.
    Interpretation of DTD !DOCTYPE note defines root element of document as note  !ELEMENT note defines that the note element must contain four elements: "to, from, heading, body"  !ELEMENT to defines the to element to be of type #PCDATA  !ELEMENT from defines the from element to be of type #PCDATA  !ELEMENT heading defines the heading element to be of type #PCDATA  !ELEMENT body defines the body element to be of type #PCDATA
  • 28.
    External DTD  DTDstored as separate file having the declarations  Can be applied across multiple XML documents <!DOCTYPE root_element [PUBLIC/SYSTEM ] “dtd_filename” "dtd_file_location“ > PUBLIC – DTD file on public server, file location to be mentioned SYSTEM – Private DTD identified by the SYSTEM keyword., means accessible by single or group of users  <!DOCTYPE> definition within XML file contains a reference to the DTD file
  • 29.
    External DTD <?xml version="1.0"?> <!DOCTYPEnote SYSTEM "note.dtd"> <note> <to>Amit</to> <from>Bobby</from> <heading>Reminder</heading> <body>Meet me during this weekend</body> </note> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]>
  • 30.
    XML Naming Rules XML elements must follow certain naming rules • Element names case-sensitive • Names must start with a letter or underscore • Names cannot start with the letters xml (or XML, or Xml, etc) • Names can contain letters, digits, hyphens, underscores, and periods • Names cannot contain spaces or tabs Any name can be used, no words are reserved (except xml)
  • 31.
    Naming Best Practices In a valid XML, only those defined in DTD will be processed  Hence, as soon as new element is added in XML, must be defined in DTD also  XML tags should be easy to remember  Follow naming rules and best naming practices  Short and simple names eg. <book_title>, not like <the_title_of_the_book>  Descriptive names, eg. <person>, <firstname> etc.  Consistent naming styles eg. All lowercase , Camel case etc.  Avoid colons, semicolons, dashes
  • 32.
    DTD – Elements& Types  Declared with an ELEMENT declaration using the following structure <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>  Empty elements like <br/> <!ELEMENT element-name EMPTY> eg. <!ELEMENT br EMPTY>  Elements with parsed character data (container elements): <!ELEMENT element-name (#PCDATA)> eg. <!ELEMENT from (#PCDATA)>  Elements declared with the category keyword ANY, can contain any combination of parsable data declared elsewhere in DTD (unrestricted elements) <!ELEMENT element-name ANY> eg. <!ELEMENT note ANY>
  • 33.
    DTD – Elementswith Children (Sequences)  Elements with one or more children are declared with the name of the children elements inside parentheses: <!ELEMENT element-name (child1)> or <!ELEMENT element-name (child1,child2,...)> Example: <!ELEMENT note (to, from, heading, body)>  When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document  Individual declaration of child elements must follow
  • 34.
    Declaration of Number of Occurrences
    • Only one occurrence of an element:
      <!ELEMENT element-name (child-name)>
      e.g. <!ELEMENT note (message)>
      The child element "message" must occur once, and only once, inside the "note" element
    • Minimum one occurrence of an element:
      <!ELEMENT element-name (child-name+)>
      e.g. <!ELEMENT note (message+)>
      The child element "message" must occur one or more times inside the "note" element
  • 35.
    Declaration of Number of Occurrences
    • Zero or more occurrences of an element:
      <!ELEMENT element-name (child-name*)>
      e.g. <!ELEMENT note (message*)>
      The child element "message" can occur zero or more times inside the "note" element
    • Zero or one occurrence of an element:
      <!ELEMENT element-name (child-name?)>
      e.g. <!ELEMENT note (message?)>
      The child element "message" can occur zero times or once inside the "note" element
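    The four occurrence forms (no indicator, +, * and ?) can be mixed in a single content model. The sketch below assumes hypothetical cc and attachment elements alongside the to and message elements used in the slides.

    ```xml
    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!-- to: exactly once; cc: zero or one; message: one or more; attachment: zero or more -->
      <!ELEMENT note (to, cc?, message+, attachment*)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT cc (#PCDATA)>
      <!ELEMENT message (#PCDATA)>
      <!ELEMENT attachment (#PCDATA)>
    ]>
    <note>
      <to>Amit</to>
      <!-- cc omitted: allowed by ? -->
      <message>Meet me during this weekend!</message>
      <message>Bring the reports.</message>
      <!-- no attachment elements: allowed by * -->
    </note>
    ```

    Removing both <message> elements would make the document invalid, because + requires at least one.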
  • 36.
    Declaration of Either/Or and Mixed Content
    • Either/or content declaration:
      <!ELEMENT note (to,from,header,(message|body))>
      The "note" element must contain a "to" element, a "from" element, a "header" element, and either a "message" or a "body" element
    • Mixed content declaration:
      <!ELEMENT note (#PCDATA|to|from|header|message)*>
      The "note" element can contain zero or more occurrences of parsed character data, "to", "from", "header", or "message" elements
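    Both forms can be seen side by side in one small document. This is a sketch under assumptions: the wrapper element notes and the memo and em elements are hypothetical names added so that the either/or model and the mixed model can live in the same DTD.

    ```xml
    <?xml version="1.0"?>
    <!DOCTYPE notes [
      <!ELEMENT notes (note, memo)>
      <!-- either/or: the last child is a message or a body, never both -->
      <!ELEMENT note (to, from, header, (message | body))>
      <!-- mixed content: text interleaved with any number of em elements -->
      <!ELEMENT memo (#PCDATA | em)*>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT header (#PCDATA)>
      <!ELEMENT message (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
      <!ELEMENT em (#PCDATA)>
    ]>
    <notes>
      <note>
        <to>Amit</to>
        <from>Bobby</from>
        <header>Reminder</header>
        <body>Meet me during this weekend!</body>
      </note>
      <memo>Please reply <em>today</em> if you cannot come.</memo>
    </notes>
    ```

    In a mixed content model, #PCDATA must come first and the group must end with *, so the order and number of the child elements cannot be constrained.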
  • 37.
    DTD - Attributes Declared with an ATTLIST declaration using the following structure <!ATTLIST element-name attribute-name attribute-type attribute-value> Eg. <!ATTLIST payment type CDATA "cheque"> for XML <payment type="cheque" /> Example 2 <!ELEMENT book(title, author)> <!ELEMENT title(#PCDATA)> <!ATTLIST title year_published(CDATA) cover(paperback|hardcover) > <!ELEMENT author(#PCDATA)>  Attribute type could be:  CDATA – value is character data  (en1 | en2 | en3) - XML doc must choose one from an enumerated values  ID - value is unique id, should start with alphanumeric  IDREF or IDREFs - id of another element or list of such elements  Entity or entities - value is an entity or list of entities
  • 38.
    DTD - Attributes Enumerated attribute values SYNTAX: <!ATTLIST element-name attribute-name (en1|en2|..) default-value> DTD: <!ATTLIST payment type (check|cash) "cash"> XML: <payment type="check" /> or <payment type="cash" /> (default value of type, if not defined is “cash”)
  • 39.
    DTD - Attributes Attribute value could be: #DEFAULT  default value #REQUIRED  must be included #FIXED  fixed value #IMPLIED  value optional DTD: <!ELEMENT square EMPTY > <!ATTLIST square width CDATA “0”> -- default value Conforming element declaration in XML doc: <square width=“100” /> DTD <!ATTLIST person number CDATA #REQUIRED> Valid XML: <person number="5677" /> Invalid XML: <person />
  • 40.
    DTD - Attributes Attribute value could be: #DEFAULT  default value #REQUIRED  must be included #FIXED  fixed value #IMPLIED  value optional DTD <!ATTLIST sender company CDATA #FIXED "Microsoft"> Valid XML <sender company="Microsoft" /> Invalid XML <sender company=“IBM" /> <!ATTLIST element-name attribute-name attribute-type #IMPLIED> DTD <!ATTLIST contact phone CDATA #IMPLIED> Valid XML: <contact phone="555-667788" /> and <contact />
  • 41.
    DTD - Entities Define shortcuts (alias) to special characters – Entity References  Shorter name used during entry, expands to longer name while using  Makes entering and managing info easier  Can be declared as internal or external SYNTAX FOR INTERNAL ENTITY DECLARATION <!ENTITY entity-name "entity-value"> DTD: <!ENTITY writer "Donald Duck."> <!ENTITY copyright "Copyright W3Schools."> XML: <author>&writer;&copyright;</author> Note: An entity has three parts: an ampersand (&), an entity name, and a semicolon (;).
  • 42.
    DTD - Entities
    SYNTAX FOR EXTERNAL ENTITY DECLARATION
    <!ENTITY entity-name SYSTEM "URI/URL">
    The replacement text of an external entity is the content of the resource that the URI points to.
    DTD:
    <!ENTITY writer SYSTEM "https://www.w3schools.com/entities.dtd">
    <!ENTITY copyright SYSTEM "https://www.w3schools.com/entities.dtd">
    XML:
    <author>&writer;&copyright;</author>
  • 43.
    When to use DTD/Schema
    • With a DTD, independent groups of people can agree to use a standard DTD for interchanging data
    • With a DTD, one can verify one's own data, as well as data received from the outside world
    When not to use DTD/Schema
    • When experimenting with XML, or working with small XML files, creating DTDs may be a waste of time
    • When developing applications, wait until the specification is stable before adding a document definition
  • 44.
    Design of DTD
    <?xml version="1.0" encoding="utf-8"?>
    <BOOK_CATALOG>
      <BOOK AccessionNo="1234">
        <TITLE>BOOK1</TITLE>
        <AUTHOR Category="Primary">
          <NAME>ABC</NAME>
          <ADDRESS>Chennai <PIN>600020</PIN></ADDRESS>
        </AUTHOR>
        <AUTHOR Category="Secondary">
          <NAME>ABCD</NAME>
          <ADDRESS>Chennai <PIN>600022</PIN></ADDRESS>
        </AUTHOR>
        <YEARPUBLISHED>1989</YEARPUBLISHED>
      </BOOK>
      <BOOK AccessionNo="1235">
        …..
      </BOOK>
    </BOOK_CATALOG>
  • 45.
    Design of DTD
    <!DOCTYPE BOOK_CATALOG [
      <!ELEMENT BOOK_CATALOG (BOOK+)>
      <!ELEMENT BOOK (TITLE, AUTHOR+, YEARPUBLISHED)>
      <!ATTLIST BOOK AccessionNo CDATA #REQUIRED>
      <!ELEMENT TITLE (#PCDATA)>
      <!ELEMENT AUTHOR (NAME, ADDRESS)>
      <!ATTLIST AUTHOR Category CDATA #REQUIRED>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT ADDRESS (#PCDATA | PIN)*>
      <!ELEMENT PIN (#PCDATA)>
      <!ELEMENT YEARPUBLISHED (#PCDATA)>
    ]>
    Note: ADDRESS holds text together with a PIN element, so it is declared as mixed content, (#PCDATA | PIN)*; #PCDATA cannot appear in a comma-separated sequence.

Editor's Notes

  • #8 In the real world, computer systems and databases store data in different formats, which makes exchanging data a challenge for developers (e.g. financial and B2B applications).
  • #11 Empty elements can contain attributes
  • #19 XML documents that conform to the syntax rules above are said to be "Well Formed" XML documents. White space is preserved in XML, unlike in HTML
  • #27 The <!DOCTYPE> declaration marks the beginning of the DTD
  • #33 ANY – elements declared elsewhere in DTD
  • #44 software might stop working because of validation errors.