XML, DTD & SCHEMA 
Pradeep Rapolu
MODULE 1: XML OVERVIEW
Agenda 
Introduction to XML 
XML Tree 
XML Syntax Rules 
XML Elements 
XML Attributes 
XML Namespaces 
XML Encoding 
XML with CSS
Introduction to XML 
What is XML? 
• XML is a markup language much like HTML 
• XML was designed to describe data. 
• XML tags are not predefined. 
• XML is a W3C Recommendation 
XML is not a replacement for HTML 
• XML specifies what data is. 
• HTML specifies how data looks. 
• XML doesn’t do anything by itself. 
• Some code has to be written to make use of the XML.
Advantages of XML: 
• XML Separates Data from HTML 
• XML Simplifies Data Sharing 
• XML Simplifies Data Transport 
• XML Simplifies Platform Changes 
• Several Internet languages are written in XML: 
  ◦ XHTML 
  ◦ XML Schema 
  ◦ SVG 
  ◦ WSDL 
  ◦ RSS
XML Tree 
• XML documents form a tree structure 
• XML documents are made up of 
  ◦ Elements 
  ◦ Attributes 
  ◦ Text
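A minimal sketch of such a tree. The bookstore, book, title and price names are hypothetical, chosen only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- bookstore is the root element; each book is a child branch -->
<bookstore>
  <book category="cooking">          <!-- category is an attribute -->
    <title>Everyday Italian</title>  <!-- the title text is a leaf -->
    <price>30.00</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
```

Here bookstore is the parent of both book elements, and title and price are siblings within each book.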
XML Syntax Rules 
• XML Elements Must Have a Closing Tag 
• XML Tags are Case Sensitive 
• XML Elements Must be Properly Nested 
• XML Documents Must Have a Root Element 
• Entity References 
• Comments in XML 
• XML must be well formed 
Well-formed XML: 
<color id="2">green</color> <!-- The color is green --> 
Not well-formed XML (unquoted attribute value, mismatched tag case): 
<color id=2>green</Color>
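The rules above can be seen together in one small document. This is only a sketch, and the palette, note, rule and separator names are made up for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element wraps everything else -->
<palette>
  <!-- Attribute values are quoted; start and end tags match in case -->
  <color id="2">green</color>
  <!-- Elements close in the reverse order they were opened -->
  <note><b><i>bold and italic</i></b></note>
  <!-- &lt; and &amp; are predefined entity references for < and & -->
  <rule>red &lt; green &amp; green &lt; blue</rule>
  <!-- An element without content can use the empty-element form -->
  <separator/>
</palette>
```

The other predefined entity references are &gt;, &apos; and &quot;.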
XML Elements 
• An XML element is everything from its start tag to its end tag, inclusive. 
• An element can contain 
  ◦ other elements 
  ◦ text 
  ◦ attributes 
  ◦ or a mix of all of the above. 
• XML Elements must follow naming rules. 
E.g.: 
<country type="subcontinent">India</country>
XML Attributes 
• Attributes provide additional information about an element. 
• XML Attribute Values Must be Quoted 
• Prefer child elements for data – use attributes mainly to store metadata.
E.g.: 
<file type="gif">computer.gif</file>
XML Namespaces 
• Namespaces prevent name conflicts between elements from different vocabularies 
Syntax: 
xmlns:prefix="URI" 
Default Namespace: 
• Saves from using prefixes in all the child elements 
Syntax: 
xmlns="namespaceURI"
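A short sketch of both forms in one document. The example.com URIs and the element names are hypothetical; a namespace URI only has to be unique and need not point to a real resource:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace applies to root and to unprefixed descendants -->
<root xmlns="http://example.com/inventory"
      xmlns:h="http://example.com/html-tables"
      xmlns:f="http://example.com/furniture">
  <!-- Two different "table" elements, told apart by their prefixes -->
  <h:table>
    <h:tr><h:td>Apples</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>Coffee table</f:name>
  </f:table>
  <!-- No prefix: this element is in the default namespace -->
  <item>Unprefixed element</item>
</root>
```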
XML Encoding 
• XML documents can contain international characters 
Syntax: 
<?xml version="1.0" encoding="UTF-8"?> 
Unicode: 
• Unicode is an industry standard for character encoding of text documents 
• The two Unicode encodings every XML parser must support are: 
  ◦ UTF-8 
  ◦ UTF-16 
• UTF = Unicode Transformation Format (originally UCS Transformation Format). 
• UTF-8 uses 1 byte (8 bits) to represent characters in the ASCII set, and two to four bytes for the rest. 
• UTF-16 uses 2 bytes (16 bits) for most characters, and four bytes for the rest. 
• UTF-8 is the default for documents without encoding information.
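As an illustration, the sketch below declares UTF-8 and writes the same non-ASCII letter twice, once literally and once as a numeric character reference. The greetings and greeting names are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The file must actually be saved in the encoding declared above -->
<greetings>
  <greeting lang="no">Velkommen til Ålesund</greeting>
  <!-- A numeric character reference works in any encoding: &#197; is Å -->
  <greeting lang="no">Velkommen til &#197;lesund</greeting>
</greetings>
```

The letter Å (U+00C5) lies outside ASCII, so it occupies two bytes in UTF-8 and two bytes in UTF-16.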
XML with CSS 
• XML documents can be formatted with CSS (Cascading Style Sheets) 
• Formatting XML with CSS is not the most common method. 
• W3C recommends using XSLT instead.
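For completeness, a stylesheet is attached to an XML document with the xml-stylesheet processing instruction. In this sketch, catalog.css and the catalog element names are assumptions made for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.css is a hypothetical stylesheet in the same folder -->
<?xml-stylesheet type="text/css" href="catalog.css"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
  </cd>
</catalog>
```

The stylesheet would then select elements by name (for example a rule for cd or title), since XML has no predefined tags with built-in presentation.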
Module 2: DTD Overview
Agenda 
Introduction to DTD 
DTD Building Blocks 
DTD Elements 
DTD Attributes 
DTD Entities
Introduction to DTD 
• DTD defines the document structure with a list of legal elements and 
attributes. 
• A well-formed XML document that also conforms to its DTD is valid.
Why DTD? 
• With a DTD, each XML file can carry a description of its own format. 
• To verify if the XML received from outside world is valid 
• To maintain a standard for interchanging data 
DTD Declaration Types: 
1. Internal DTD Declaration 
2. External DTD Declaration
1. Internal DTD Declaration: 
• The DTD is declared inside the XML file 
Syntax: 
<!DOCTYPE root-element [element-declarations]> 
2. External DTD Declaration 
• The DTD is declared in an external file 
• The XML document refers to the external DTD file
Syntax: 
<!DOCTYPE root-element SYSTEM "filename">
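A minimal sketch of the internal form, with the external form noted in a comment. The note vocabulary and the note.dtd file name are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal form: the declarations sit between [ and ] -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- External form would instead read: <!DOCTYPE note SYSTEM "note.dtd"> -->
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>See you this weekend.</body>
</note>
```

With the external form, the four ELEMENT declarations would live in note.dtd and could be shared by many documents.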
DTD Building Blocks 
• From a DTD point of view, all XML documents are made up of the following building 
blocks 
  ◦ Elements 
  ◦ Attributes 
  ◦ Entities 
  ◦ PCDATA 
  ◦ CDATA
DTD Elements 
• In DTD, elements are declared with an ELEMENT declaration. 
Syntax: 
<!ELEMENT element-name category> 
or 
<!ELEMENT element-name (element-content)> 
Element Types: 
• <!ELEMENT element-name EMPTY> 
• <!ELEMENT element-name (#PCDATA)> 
• <!ELEMENT element-name ANY> 
• <!ELEMENT element-name (child1, child2, ...)>
• <!ELEMENT element-name (child-name)> 
• <!ELEMENT element-name (child-name+)> 
• <!ELEMENT element-name (child-name*)> 
• <!ELEMENT element-name (child-name?)> 
• <!ELEMENT element-name (child1, child2, (child3|child4))> 
• <!ELEMENT element-name (#PCDATA|child1|child2|child3|child4)*>
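Several of these content models can be combined in one declaration. The sketch below uses a hypothetical library vocabulary to show sequence, the +, ? occurrence markers, a choice group and EMPTY:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
  <!-- One or more book children -->
  <!ELEMENT library (book+)>
  <!-- Fixed order: title, then at least one author, an optional year,
       then exactly one of paperback or hardcover -->
  <!ELEMENT book (title, author+, year?, (paperback | hardcover))>
  <!ELEMENT title     (#PCDATA)>
  <!ELEMENT author    (#PCDATA)>
  <!ELEMENT year      (#PCDATA)>
  <!-- EMPTY elements carry no content -->
  <!ELEMENT paperback EMPTY>
  <!ELEMENT hardcover EMPTY>
]>
<library>
  <book>
    <title>XML Basics</title>
    <author>A. Writer</author>
    <author>B. Editor</author>
    <paperback/>
  </book>
</library>
```

The book shown is valid without a year because year? makes that child optional.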
DTD Attributes 
• In a DTD, attributes are declared with an ATTLIST declaration.
Syntax: 
<!ATTLIST element-name attribute-name attribute-type attribute-value> 
Attribute Values: 
• <!ATTLIST element-name attribute-name attribute-type default-value> 
• <!ATTLIST element-name attribute-name attribute-type #REQUIRED> 
• <!ATTLIST element-name attribute-name attribute-type #IMPLIED> 
• <!ATTLIST element-name attribute-name attribute-type #FIXED "value"> 
• <!ATTLIST element-name attribute-name (en1|en2|..) default-value>
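The sketch below declares one attribute of each kind on a hypothetical payment element: id is required, method is an enumeration with a default, currency is fixed, and note is optional (implied):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE payment [
  <!ELEMENT payment (#PCDATA)>
  <!ATTLIST payment
    id       ID             #REQUIRED
    method   (cash | card)  "cash"
    currency CDATA          #FIXED "EUR"
    note     CDATA          #IMPLIED>
]>
<!-- method is omitted, so a validating parser supplies the default "cash" -->
<payment id="p1001" currency="EUR">25.00</payment>
```

A single ATTLIST can declare several attributes for the same element, as done here, or each attribute can get its own ATTLIST declaration.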
DTD Entities 
• Entities are like variables 
• Entities can be declared internal or external 
1. Internal Entity Declaration: 
Syntax: 
<!ENTITY entity-name "entity-value"> 
2. External Entity Declaration: 
Syntax: 
<!ENTITY entity-name SYSTEM "URI/URL"> 
Entity reference in XML document: 
<element-name>&entity-name;</element-name>
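A small sketch of internal entities in use. The entity names and their replacement text are invented for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA)>
  <!-- Internal entities: the replacement text is given inline -->
  <!ENTITY writer    "Jane Doe">
  <!ENTITY copyright "Copyright 2024 Example Corp.">
]>
<!-- A parser expands each reference to its replacement text -->
<footer>&writer; &copyright;</footer>
```

An external entity would use SYSTEM and a URI in place of the quoted text. Many parsers disable external entities by default for security reasons, so it is worth checking the parser settings before relying on them.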
Module 3: XML Schema Overview
XML Schema 
• XML schema describes the structure of an XML document. 
• XSD - XML Schema language 
What is an XML Schema? 
• XML Schema defines the legal building blocks of an XML document. 
An XML Schema - 
• defines elements that can appear in a document 
• defines attributes that can appear in a document 
• defines which elements are child elements 
• defines the order of child elements 
• defines the number of child elements 
• defines whether an element is empty or can include text 
• defines data types for elements and attributes 
• defines default and fixed values for elements and attributes
Advantages of XML Schema over DTD 
• XML Schemas are written in XML 
• XML Schemas support data types 
• XML Schemas support namespaces 
XML Schema Syntax: 
• The <schema> element is the root element of every XML Schema 
<?xml version="1.0"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
... 
... 
</xs:schema> 
XML With XSD: 
• An XML document references the XML Schema (XSD document) it should be validated against.
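One common way to make that reference is the schema-instance namespace. In this sketch, note.xsd is a hypothetical schema file with no target namespace, stored next to the document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- note.xsd is a hypothetical schema file with no target namespace -->
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <body>See you this weekend.</body>
</note>
```

When the schema declares a target namespace, xsi:schemaLocation is used instead and pairs the namespace URI with the schema location.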
Agenda 
XML Schema 
XSD Simple Types 
XSD Complex Types 
XSD Complex Types – Indicators 
XSD Complex Types - any & anyAttribute 
XSD Complex Types - Element Substitution 
Writing XML Schema 
XSD Data types
XSD Simple Types 
• The Simple Types in XSD are – 
  ◦ Simple Element 
  ◦ Attribute 
1. Simple Element: 
• A simple element contains only text; it cannot contain other elements or attributes. 
Syntax: 
<xs:element name="element-name" type="element-type"/>
• Simple elements can have default and fixed values 
• XML Schema has a lot of built-in data types. The most common types are: 
  ◦ xs:string 
  ◦ xs:decimal 
  ◦ xs:integer 
  ◦ xs:boolean 
  ◦ xs:date 
  ◦ xs:time
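A brief sketch of simple element declarations, including default and fixed values. The element names and values are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Plain simple elements with built-in types -->
  <xs:element name="lastname" type="xs:string"/>
  <xs:element name="age"      type="xs:integer"/>
  <xs:element name="dateborn" type="xs:date"/>
  <!-- default: used when the element is present but empty -->
  <xs:element name="color"    type="xs:string" default="red"/>
  <!-- fixed: no other value is allowed -->
  <xs:element name="country"  type="xs:string" fixed="India"/>
</xs:schema>
```

A matching instance fragment would be, for example, <age>36</age> or <dateborn>1970-03-27</dateborn>.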
2. Attribute: 
• Simple elements cannot have attributes. 
• The attribute itself is a simple type. 
Syntax: 
<xs:attribute name="attribute-name" type="attribute-type"/>
E.g.: 
<lastname lang="EN">Smith</lastname> <!--Element with Attribute --> 
<xs:attribute name="lang" type="xs:string"/> <!-- Attribute definition --> 
XSD Restrictions/Facets: 
• Restrictions define acceptable values for XML elements or attributes. 
• Restrictions on XML elements are called facets. 
Different Restrictions: 
• Restrictions on Values 
• Restrictions on a Set of Values
• Restrictions on a Series of Values 
• Restrictions on Whitespace Characters 
• Restrictions on Length
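Each kind of restriction corresponds to one or more facets inside xs:restriction. The schema below is a sketch with hypothetical element names and limits:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Restriction on values: an integer from 0 to 120 -->
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Restriction on a set of values: one of three literals -->
  <xs:element name="car">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Audi"/>
        <xs:enumeration value="Golf"/>
        <xs:enumeration value="BMW"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Restriction on a series of values: exactly three uppercase letters -->
  <xs:element name="initials">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[A-Z][A-Z][A-Z]"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Restrictions on whitespace and length: collapse runs of whitespace,
       then require 5 to 8 characters -->
  <xs:element name="password">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
        <xs:minLength value="5"/>
        <xs:maxLength value="8"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```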
XSD Complex Types 
• A complex type element contains other elements and/or attributes. 
• There are four kinds of complex elements - 
  ◦ empty elements 
  ◦ elements that contain only other elements 
  ◦ elements that contain only text 
  ◦ elements that contain both other elements and text 
** Complex types can be extended or restricted 
Empty elements: 
• An empty complex element has no content; it can carry only attributes. 
E.g.: <product prodid="1345" />
** If the complexType element is given a name, and each element's type 
attribute refers to that name, several elements can share the same 
complex type
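A sketch of an empty element declared through a named complex type. The type name prodtype and the second element, sparepart, are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A named complex type with no child content, only an attribute -->
  <xs:complexType name="prodtype">
    <xs:attribute name="prodid" type="xs:positiveInteger"/>
  </xs:complexType>
  <!-- Several elements can reuse the same named type -->
  <xs:element name="product"   type="prodtype"/>
  <xs:element name="sparepart" type="prodtype"/>
</xs:schema>
```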
Elements that contain only other elements: 
• An "elements-only" complex type describes an element whose content 
consists solely of other elements.
E.g.: 
<person> 
  <firstname>John</firstname> 
  <lastname>Smith</lastname> 
</person>
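One possible schema for the person element above, sketched with an anonymous complex type and a sequence:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <!-- firstname must come before lastname -->
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname"  type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```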
Elements that contain only text: 
• A complex text-only element can contain text and attributes. 
E.g.: <shoesize country="france">35</shoesize> 
• This type contains only simple content (text and attributes) 
• In the schema, a simpleContent element wraps the content definition.
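A sketch of how the shoesize element could be declared. Using xs:integer as the base type is an assumption made for this example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shoesize">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The text content is an integer, extended with one attribute -->
        <xs:extension base="xs:integer">
          <xs:attribute name="country" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```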
Elements that contain other elements, attributes and text (Mixed): 
• A mixed complex type element can contain attributes, elements, and text. 
E.g.: 
<letter id="123"> 
  Dear Mr.<name>John Smith</name>. 
  Your order <orderid>1032</orderid> 
  will be shipped on <shipdate>2001-07-13</shipdate>. 
</letter>
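A sketch of a schema for the letter element. Setting mixed="true" is what allows text between the child elements; the chosen data types are assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="letter">
    <!-- mixed="true" permits character data between the children -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="name"     type="xs:string"/>
        <xs:element name="orderid"  type="xs:positiveInteger"/>
        <xs:element name="shipdate" type="xs:date"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```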
XSD Complex Types - Indicators 
• We can control HOW elements are to be used in documents with indicators. 
• There are seven indicators classified into 3 types 
a) Order indicators: 
• Order indicators define the order of the elements. 
  ◦ All: The child elements can appear in any order, but each must occur only once 
  ◦ Choice: Either one child element or another can occur, but not both 
  ◦ Sequence: The child elements must appear in a specific order 
b) Occurrence indicators: 
• Occurrence indicators define the number of times an element can appear 
  ◦ maxOccurs: Maximum number of times an element can occur 
  ◦ minOccurs: Minimum number of times an element can occur
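The order and occurrence indicators can be sketched in one schema. All element names and limits below are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <!-- sequence: children must appear in this order -->
      <xs:sequence>
        <xs:element name="fullname" type="xs:string"/>
        <!-- choice: exactly one of the two contact elements -->
        <xs:choice>
          <xs:element name="email" type="xs:string"/>
          <xs:element name="phone" type="xs:string"/>
        </xs:choice>
        <!-- occurrence: zero to ten child_name elements -->
        <xs:element name="child_name" type="xs:string"
                    minOccurs="0" maxOccurs="10"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- all: street and city may come in either order, each once -->
  <xs:element name="address">
    <xs:complexType>
      <xs:all>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city"   type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

When minOccurs and maxOccurs are left out, both default to 1. maxOccurs="unbounded" removes the upper limit.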
c) Group indicators: 
• Group indicators define related sets of elements or attributes. 
Element Groups: 
• Define related sets of elements 
• Element groups are defined with the group declaration. 
Syntax: 
<xs:group name="groupname"> 
... 
</xs:group> 
Attribute Groups: 
• Define related sets of attributes. 
• Attribute groups are defined with the attributeGroup declaration 
Syntax: 
<xs:attributeGroup name="groupname"> 
... 
</xs:attributeGroup>
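Once declared, a group is pulled into a complex type with the ref attribute. The group and element names in this sketch are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A reusable set of elements -->
  <xs:group name="persongroup">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname"  type="xs:string"/>
    </xs:sequence>
  </xs:group>
  <!-- A reusable set of attributes -->
  <xs:attributeGroup name="personattrgroup">
    <xs:attribute name="id"   type="xs:positiveInteger"/>
    <xs:attribute name="lang" type="xs:language"/>
  </xs:attributeGroup>
  <!-- Both groups are referenced from the person declaration -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="persongroup"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
      <xs:attributeGroup ref="personattrgroup"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```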
XSD Complex Types - any & anyAttribute 
any Element: 
• The <any> element enables us to extend the XML document with elements not 
specified by the schema! 
anyAttribute Element: 
• The <anyAttribute> element enables us to extend the XML document with 
attributes not specified by the schema!
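A sketch of both extension points on a hypothetical person element. The processContents="lax" setting is a choice made for this example; it asks the validator to check extra content only when it can find a declaration for it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname"  type="xs:string"/>
        <!-- Extension point: any further elements, zero or more -->
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xs:sequence>
      <!-- Extension point: any further attributes -->
      <xs:anyAttribute processContents="lax"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, a person element may carry additional children after lastname and additional attributes without becoming invalid.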
XSD Complex Types - Element Substitution 
• With element substitution, one element can stand in for another in 
instance documents 
• The substitutionGroup attribute names the head element that an element can substitute for.
• Substitution can be blocked by using attribute block="substitution"
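A sketch of a substitution group and of blocking. The element names are hypothetical; navn stands for a Norwegian alternative to name:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Head element -->
  <xs:element name="name" type="xs:string"/>
  <!-- navn may appear wherever name is allowed -->
  <xs:element name="navn" substitutionGroup="name"/>
  <!-- block="substitution" prevents other elements standing in for id -->
  <xs:element name="id" type="xs:string" block="substitution"/>
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="id"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Both the head element and its substitutes must be declared as global elements, that is, as direct children of xs:schema.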
Writing XML Schema 
• An XML Schema can be designed in the following ways 
  ◦ Hierarchical manner 
  ◦ Divide the Schema 
  ◦ Using Named Types
XSD Data types 
• XSD has the following categories of data types 
  ◦ String 
  ◦ Date 
  ◦ Numeric 
  ◦ Miscellaneous (Boolean, Binary, anyURI)
Reference: 
http://www.w3schools.com