Introduction to DTD
Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 1 / 37
Outline
1 Introduction
2 Elements
3 Attributes
4 DTD Find Errors
5 Putting it All Together
6 Summary
Learning Outcomes
Be able to read and understand a DTD
Be able to construct a DTD for a set of existing XML documents
Be able to validate an XML document against a DTD
Know the limitations of a DTD
Database Focus
All XML technologies are presented from a database perspective!
1 Introduction
Example: Course Catalog XML Document
User Requirements
Make a DTD for the course catalog
Use the DTD to validate our course catalog XML document
Example (Current Courses)
<?xml version="1.0"?>
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
Example: Course Catalog DTD
Example (DTD for Course Catalog)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT coursecatalog (course)*>
<!ELEMENT course (name, semester, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
<!ATTLIST course cid ID #REQUIRED>
Informal Description
A course catalog consists of zero or more courses
A course consists of a name, a semester, and a description, in that order
A course is identified by a required ID attribute (cid)
A (course) name is a string (leaf in XML document)
A semester is a string (leaf in XML document)
A description is a string (leaf in XML document)
Overview
Purpose
Define the document structure
Legal elements and attributes
Serves the same purpose as a CREATE TABLE statement in SQL
Structure and type of data
Integrity constraints!
Inherited from SGML
Is not written in XML
If XML syntax is a requirement, use XML Schema instead
Still very widely used
Because it is much simpler than XML Schema
Note
Many simple errors can be found using a DTD
A necessity if receiving XML documents from external sources
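As a rough illustration of the kind of simple error a DTD can catch, the sketch below places the course catalog DTD inside the document itself (an internal DTD subset) and then breaks two of its rules. The course data is made up for the illustration. A validating parser should report both problems, though the exact messages depend on the tool.

```xml
<?xml version="1.0"?>
<!DOCTYPE coursecatalog [
  <!ELEMENT coursecatalog (course)*>
  <!ELEMENT course (name, semester, desc)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT semester (#PCDATA)>
  <!ELEMENT desc (#PCDATA)>
  <!ATTLIST course cid ID #REQUIRED>
]>
<coursecatalog>
  <course cid="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <!-- Error 1: the required semester element is missing -->
  <!-- Error 2: the cid value P4 is already used above -->
  <course cid="P4">
    <name>DB</name>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
```

The document is still well-formed XML, so a non-validating parser accepts it. Only validation against the DTD reveals the two errors.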
2 Elements
Simplest Element
Example (Element Declaration)
<!ELEMENT name (#PCDATA)>
Example (Allowed Values)
<name>Hello Element</name>
<name/>
<name><![CDATA[ select * from emp where sal > 10]]></name>
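Example (Illegal Values)
<name>< </name>
<name>& </name>
A raw < or & is not well-formed in character data; write &lt; or &amp; (a raw > and &gt; are both allowed)
<name><it>Hello</it></name>
Child element <it> is not allowed in a (#PCDATA) element and is not declared in the DTD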
Note
Root, internal nodes, and leaves in the XML tree representation
Terminals and non-terminals in grammar terminology
Sequences of Child Elements
Example (Element Declaration)
<!ELEMENT course (name, semester, desc)>
Example (Allowed XML Fragment, Why?)
<course>
  <name>OOP</name>
  <semester>7</semester>
  <desc>Introduction to OOP</desc>
</course>
Example (Disallowed XML Fragment, Why?)
<course>
  <semester>7</semester>
  <name>OOP</name>
</course>
Example (Is this allowed?)
<course></course>
Choice Among Child Elements
Example (Element Declaration)
<!ELEMENT circle (x, y, (radius | diameter))>
Example (Allowed XML Fragment)
<circle>
  <x>5</x>
  <y>9</y>
  <diameter>7</diameter>
</circle>
Example (Illegal XML Fragment)
<circle>
  <x>4</x>
  <y>8</y>
  <radius>3.5</radius>
  <diameter>7</diameter>
</circle>
Symbols in a DTD
Symbols
Symbol  Example
*       <!ELEMENT coursecatalog (course)*>
+       <!ELEMENT coursecatalog (course)+>
?       <!ELEMENT coursecatalog (course)?>
,       <!ELEMENT course (name, semester, desc)>
|       <!ELEMENT course (name | semester | desc)>
Note
Symbols are mostly taken from regular expressions
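To see several of these symbols working together, here is a small sketch of a document with an internal DTD. The teacher and url elements, and the example URL, are hypothetical additions that are not part of the course catalog used elsewhere in these slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE coursecatalog [
  <!-- + : at least one course -->
  <!ELEMENT coursecatalog (course)+>
  <!-- , : fixed order;  ? : optional;  * : repeatable;  | : pick one -->
  <!ELEMENT course (name, semester?, teacher*, (desc | url))>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT semester (#PCDATA)>
  <!ELEMENT teacher (#PCDATA)>
  <!ELEMENT desc (#PCDATA)>
  <!ELEMENT url (#PCDATA)>
]>
<coursecatalog>
  <!-- semester omitted, two teachers, desc chosen -->
  <course>
    <name>OOP</name>
    <teacher>Ann</teacher>
    <teacher>Bart</teacher>
    <desc>Object-oriented programming</desc>
  </course>
  <!-- semester present, no teacher, url chosen -->
  <course>
    <name>DB</name>
    <semester>7</semester>
    <url>http://example.org/db</url>
  </course>
</coursecatalog>
```

Both courses match the same content model. Swapping name and semester, or giving a course both a desc and a url, would make the document invalid.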
Mixed Content
Example (Data Centric)
<!ELEMENT coor (x, y)>
Example (Allowed Fragment)
<coor>
  <x>5</x>
  <y>9</y>
</coor>
Example (Mixed Content)
<!ELEMENT coor (#PCDATA | x | y)*>
Example (Allowed Fragment)
<coor>
  This is the coordinate
  (<x>5</x>, <y>9</y>) where
  the treasure is hidden!
</coor>
Note
Data-centric content is very table-like
Mixed content is also called a narrative document
Element Declarations using ANY
Example (Any)
<!ELEMENT coor ANY>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
Example (Allowed Fragments)
<coor/>
<coor>Hello World</coor>
<coor>Hello <x>1</x><x/>World<y>3</y><y>4</y></coor>
<coor>Hello <x>1</x><y>2</y>World<y>3</y><x>4</x></coor>
Example (Illegal Fragments)
<coor><z>1</z></coor>
<coor><x>1</x><y>1</y><z>1</z></coor>
Note
ANY is handy for narrative documents, e.g., HTML
Element Declarations using EMPTY
Example (Empty)
<!ELEMENT coor EMPTY>
Example (Allowed?)
<coor></coor>
<coor/>
<coor>Hello</coor>
<coor><x>Hello</x></coor>
<coor> </coor>
Summary: Elements
Repetition
Symbol   Explanation    Example
?        zero-or-one    <!ELEMENT person (address?)>
*        zero-or-more   <!ELEMENT person (address*)>
+        one-or-more    <!ELEMENT person (address+)>
(none)   exactly once   <!ELEMENT person (address)>
Sequence or Choice
Symbol   Explanation    Example
,        Sequence       <!ELEMENT coor (x, y)>
|        Choice         <!ELEMENT coor (x | y)>
Data Type
Symbol   Explanation    Example
#PCDATA  String         <!ELEMENT name (#PCDATA)>
ANY      Whatever (text and any declared elements)  <!ELEMENT coor ANY>
EMPTY    Empty          <!ELEMENT room EMPTY>
3 Attributes
Attribute Declarations
Example (Circles)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT drawing (circle)*>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle cid ID #REQUIRED
                 name CDATA #IMPLIED>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT radius (#PCDATA)>
<!ATTLIST radius unit (mm|cm|m) "m"> <!-- Enum with default -->
<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit (mm|cm|m) #REQUIRED> <!-- Enum no default -->
Note
Mandatory and optional attributes
One or more attributes
Enumeration with defaults
Example Document
Example (Circles)
<?xml version="1.0" encoding='UTF-8'?>
<!DOCTYPE drawing SYSTEM "circleatt.dtd">
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>4</radius> <!-- default unit -->
  </circle>
  <circle cid='C2'> <!-- name not required -->
    <x>5</x> <y>5</y>
    <radius unit="cm">4</radius> <!-- explicit unit -->
  </circle>
</drawing>
Note
The unique ID value is not an integer
The second circle relies on the name attribute being optional in element circle
Uniqueness, Examples
Example (Circle/Points with IDs)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT drawing (point | circle)*>
<!ELEMENT point (x, y)>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle did ID #REQUIRED>
<!ATTLIST point did ID #REQUIRED>
Example (Circles)
<drawing>
  <circle did='C1'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='P2'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Uniqueness, Errors
Example (Find the error 1!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='C1'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 2!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C11'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Example (Find the error 3!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='2C'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 4!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C1111111111111111111111
    <x>5</x> <y>5</y>
  </point>
</drawing>
Uniqueness
Limitations
Only attribute values can be declared unique, not element values
An ID cannot be an integer, e.g., <circle did='1'> is not allowed
IDs are only unique within a single document
Uniqueness is not guaranteed across multiple documents
Uniqueness applies to a single attribute only (no composite keys)
The combination of x and y coordinates cannot be declared unique
Note
Uniqueness is quite restrictive compared to DBMS technology
XML Schema lifts most limitations on uniqueness
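The composite-key limitation can be illustrated with a small sketch. The document below reuses the point declaration from the earlier slides in an internal DTD; the did and coordinate values are made up. Both points have the same x and y, yet the document is valid, because only the did attribute is checked for uniqueness.

```xml
<?xml version="1.0"?>
<!DOCTYPE drawing [
  <!ELEMENT drawing (point)*>
  <!ELEMENT point (x, y)>
  <!ELEMENT x (#PCDATA)>
  <!ELEMENT y (#PCDATA)>
  <!ATTLIST point did ID #REQUIRED>
]>
<drawing>
  <point did="P1">
    <x>5</x> <y>5</y>
  </point>
  <!-- Same coordinates as P1: a DTD has no way to forbid this -->
  <point did="P2">
    <x>5</x> <y>5</y>
  </point>
</drawing>
```

In a relational schema the same rule would be a UNIQUE constraint over two columns; with a DTD it has to be checked by the application.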
Empty Elements with Attributes
Example (Empty)
<!ELEMENT coor EMPTY>
<!ATTLIST coor cid ID #REQUIRED
               x CDATA #REQUIRED
               y CDATA #REQUIRED
               z CDATA #IMPLIED>
Example (Allowed?)
<coor/>
<coor cid='c1' x='1' y='1' z='1'/>
<coor cid='c2' x='2' y='2'></coor>
<coor cid='c3' x='3' y='3'> </coor>
<coor cid='c4' z='4' y='4' x='4'/>
<coor z='5' y='5' x='5'/>
Is Something Wrong?
Example (Case 1)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID
  x CDATA #REQUIRED>
Example (Case 2!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #IMPLIED
  x CDATA #REQUIRED>
Example (Case 3!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  x CDATA #REQUIRED
  cid ID #REQUIRED>
Example (Case 4)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID '42'
  x CDATA #REQUIRED>
Example (Case 5)
<!ELEMENT coor (EMPTY)>
<!ATTLIST coor
  cid ID #REQUIRED
  x CDATA #REQUIRED>
Example (Case 6)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #REQUIRED
  x ID #REQUIRED>
Summary: Attributes
General Syntax
<!ATTLIST element-name attribute-name type default-declaration>
Often used types
Type         Example
CDATA        <!ATTLIST course id CDATA #IMPLIED>
ID           <!ATTLIST course id ID #REQUIRED>
Enumeration  <!ATTLIST course id (OOP | DB) #REQUIRED>
Defaults
Default      Example
#REQUIRED    <!ATTLIST course id ID #REQUIRED>
#IMPLIED     <!ATTLIST course id CDATA #IMPLIED>
#FIXED       <!ATTLIST course id CDATA #FIXED "1">
A value      <!ATTLIST course id (OOP | DB) "DB">
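A minimal sketch of how the four kinds of defaults behave in a document. The attribute names room, lang, and level, and their values, are hypothetical and chosen only for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE courses [
  <!ELEMENT courses (course)*>
  <!ELEMENT course (#PCDATA)>
  <!ATTLIST course
    cid   ID             #REQUIRED
    room  CDATA          #IMPLIED
    lang  CDATA          #FIXED "en"
    level (basic | adv)  "basic">
]>
<courses>
  <!-- Only cid is given: room is absent, lang is "en", level is "basic" -->
  <course cid="C1">OOP</course>
  <!-- All attributes given; lang may only repeat its fixed value -->
  <course cid="C2" room="0.2.13" lang="en" level="adv">DB</course>
</courses>
```

Omitting cid, writing lang="da", or writing level="expert" would each make the document invalid.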
4 DTD Find Errors
A Buggy DTD
Example (DTD With Five Errors)
<?xml version='1.0'>
<!ELEMENT users user+>
<!ELEMENT user (firstname, lastname>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname>
Two-Minute Exercise
With your neighbor, identify the errors in the DTD
Example (The Corrected DTD)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT users (user)+>
<!ELEMENT user (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
5 Putting it All Together
Uncertain About Content
Example (DTD for Courses with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
Example (XML Document for Courses with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE courses SYSTEM "course.dtd">
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>
A University Example, Setup
Example (DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT university (courses,
                      students,
                      follows)>
<!ELEMENT courses (course)+>
<!ELEMENT course (name)>
<!ATTLIST course cid ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT students (student)+>
<!ELEMENT student (fname)>
<!ATTLIST student sid ID #REQUIRED>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT follows (takes)+>
<!ELEMENT takes EMPTY>
<!ATTLIST takes sid IDREF #REQUIRED>
<!ATTLIST takes cids IDREFS #REQUIRED>
Example (XML Fragment)
<university>
  <courses>
    <course cid='C111'>
      <name>DB</name>
    </course>
    <course cid='C222'>
      <name>OOP</name>
    </course>
  </courses>
  <students>
    <student sid='S11'>
      <fname>Ann</fname>
    </student>
    <student sid='S22'>
      <fname>Bart</fname>
    </student>
    <student sid='S33'>
      <fname>Curt</fname>
    </student>
  </students>
A University Example, Referencing
Example (XML Fragment)
<follows>
  <takes sid='S11' cids='C111 C222'/>
  <takes sid='S22' cids='C222'/>
  <takes sid='S33' cids='C111'/>
</follows>
Note
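An ID cannot start with a digit
sid is a reference to a single ID (IDREF)
cids is a reference to a set of IDs (IDREFS)
No overlap between IDs: course IDs and student IDs share one value space
The separator in an IDREFS value is a space (not a comma)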
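The two fragments of the university example can be assembled into one self-contained document. This is a sketch that moves the DTD into an internal subset so the whole example fits in a single file; the element and attribute names are the ones from the slides, and the data is shortened.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE university [
  <!ELEMENT university (courses, students, follows)>
  <!ELEMENT courses (course)+>
  <!ELEMENT course (name)>
  <!ATTLIST course cid ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT students (student)+>
  <!ELEMENT student (fname)>
  <!ATTLIST student sid ID #REQUIRED>
  <!ELEMENT fname (#PCDATA)>
  <!ELEMENT follows (takes)+>
  <!ELEMENT takes EMPTY>
  <!ATTLIST takes sid IDREF #REQUIRED>
  <!ATTLIST takes cids IDREFS #REQUIRED>
]>
<university>
  <courses>
    <course cid="C111"><name>DB</name></course>
    <course cid="C222"><name>OOP</name></course>
  </courses>
  <students>
    <student sid="S11"><fname>Ann</fname></student>
    <student sid="S22"><fname>Bart</fname></student>
  </students>
  <follows>
    <!-- sid must match some ID in the document; cids is a space-separated list -->
    <takes sid="S11" cids="C111 C222"/>
    <takes sid="S22" cids="C222"/>
  </follows>
</university>
```

An IDREF only has to match some ID in the document. The DTD cannot state that sid must point at a student rather than a course, so sid="C111" would also validate.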
Quiz: IDREFS
Example (University XML)
<university>
  <courses>
    <course cid='C111'>
      <name>DB</name>
    </course>
    <course cid='C222'>
      <name>OOP</name>
    </course>
  </courses>
  <students>
    <student sid='S11'>
      <fname>Ann</fname>
    </student>
    <student sid='S22'>
      <fname>Bart</fname>
    </student>
    <student sid='S33'>
      <fname>Curt</fname>
    </student>
  </students>
Example (Allowed One?)
<follows>
  <takes sid='S11' cids='C111 C222 C111'/>
</follows>
Example (Allowed Two?)
<follows>
  <takes sid='S11' cids='C333 C222 C111'/>
</follows>
Example (Allowed Three?)
<follows>
  <takes sid='S11' cids='C111'/>
  <takes sid='S11' cids='C222'/>
</follows>
Example (Allowed Four?)
<follows>
  <takes sid='S11' cids=''/>
  <takes sid='S22' cids='c111'/>
</follows>
Using an Internal DTD
Example (Courses with Flexible Description, Internal DTD)
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
  <!ELEMENT courses (course)*>
  <!ELEMENT course (name, desc)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>
Note
Benefit: All information in one file
Drawback: DTD is not reused (maintenance nightmare)
6 Summary
Summary: DTD
Limitations
Only very basic data types supported
Only single-column keys (for uniqueness)
Uniqueness only guaranteed within a single document
Very limited support for integrity constraints
Note
DTD is widely used
DTD is being replaced by XML Schema when documents are complex
Combining XML Namespaces with a DTD is problematic
Advice
Never build a new DTD if an existing (standard) DTD can be used!
RDBMS vs. XML
        Query    Schema
SQL     DML      DDL
XML     XQuery   DTD/XML Schema
Summary: DTD versus XML Schema
DTD
Own format
Compact notation
Simple data types
From SGML
Supports entities
No support for namespaces
XML Schema
XML format
Very verbose
Advanced data types
Invented for XML
Does not support entities
Supports namespaces
Advice
Start with a DTD
Move on to XML Schema for later iterations
Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 37 / 37

Introduction to DTD

  • 1.
    Introduction to DTD KristianTorp Department of Computer Science Aalborg University people.cs.aau.dk/˜torp torp@cs.aau.dk November 3, 2015 daisy.aau.dk Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 1 / 37
  • 2.
    Outline 1 Introduction 2 Elements 3Attributes 4 DTD Find Errors 5 Putting it All Together 6 Summary Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 2 / 37
  • 3.
    Learning Outcomes Learning Outcomes Beable to read and understand a DTD Be able to construct a DTD for a set of existing XML documents Be able to validate an XML document against a DTD Know the limitations of a DTD Database Focus All XML technologies are presented from a database perspective! Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 3 / 37
  • 4.
    Outline 1 Introduction 2 Elements 3Attributes 4 DTD Find Errors 5 Putting it All Together 6 Summary Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 4 / 37
  • 5.
    Example: Course CatalogXML Document User Requirements Make a DTD for the course catalog Use the DTD to validate our course catalog XML document Example (Current Courses) <?xml version=” 1.0 ” ?> <coursecatalog> <course cid= ’P4 ’> <name>OOP</name> <semester>3</ semester> <desc>Object−oriented programming</ desc> </ course> <course cid= ’P2 ’> <name>DB</name> <semester>7</ semester> <desc>Databases including SQL</ desc> </ course> </ coursecatalog> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 5 / 37
  • 6.
    Example: Course CatalogDTD Example (DTD for Course Catalog) <?xml version=” 1.0 ” encoding=”UTF−8” ?> <!ELEMENT coursecatalog ( course )∗> <!ELEMENT course (name, semester , desc ) > <!ELEMENT name (#PCDATA)> <!ELEMENT semester (#PCDATA)> <!ELEMENT desc (#PCDATA)> <! ATTLIST course cid ID #REQUIRED> Informal Description A course catalog consists of zero or more of courses A course consists of a name, a semester, and a description It is identified by an ID that is required A (course) name is a string (leaf in XML document) A semester is a string (leaf in XML document) A description is a string (leaf in XML document) Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 6 / 37
  • 7.
    Overview Purpose Define the documentstructure Legal elements and attributes Serves the same purpose as a create table statement in SQL Structure and type of data Integrity constraints! Left over from SGML Is not written in XML If this is a requirement then use XML Schema Still very widely used Because much simpler than XML Schema Note Many simple errors can be found using a DTD A necessity if receiving XML documents from external sources Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 7 / 37
  • 8.
    Outline 1 Introduction 2 Elements 3Attributes 4 DTD Find Errors 5 Putting it All Together 6 Summary Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 8 / 37
  • 9.
    Simplest Entity Example (ElementDeclaration) <!ELEMENT name (#PCDATA)> Example (Allowed Values) <name>Hello Element</name> <name/> <name><![CDATA[ select ∗ from emp where sal > 10]]></name> Example (Illegal Values) <name>> </name> <name>&gt;</name> <name><it>Hello</it></name> Unknown element <it>, must be defined in DTD Note Root, internal-node, and leafs in XML tree representation Terminal and non-terminal in grammar terminology Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 9 / 37
  • 10.
    Sequences of ChildElements Example (Element Declaration) <!ELEMENT course (name, semester, desc)> Example (Allowed XML Fragment, Why?) <course> <name>OOP</name> <semester>7</ semester> <desc>I n t r o d u c t i o n to OOP</ desc> </ course> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 10 / 37
  • 11.
    Sequences of ChildElements Example (Element Declaration) <!ELEMENT course (name, semester, desc)> Example (Allowed XML Fragment, Why?) <course> <name>OOP</name> <semester>7</ semester> <desc>I n t r o d u c t i o n to OOP</ desc> </ course> Example (Disallowed XML Fragment, Why?) <course> <semester>7</ semester> <name>OOP</name> </ course> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 10 / 37
  • 12.
    Sequences of ChildElements Example (Element Declaration) <!ELEMENT course (name, semester, desc)> Example (Allowed XML Fragment, Why?) <course> <name>OOP</name> <semester>7</ semester> <desc>I n t r o d u c t i o n to OOP</ desc> </ course> Example (Disallowed XML Fragment, Why?) <course> <semester>7</ semester> <name>OOP</name> </ course> Example (Is this allowed?) <course></ course> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 10 / 37
  • 13.
    Choice Among ChildElements Example (Element Declaration) <!ELEMENT circle (x, y, (radius | diameter))> Example (Allowed XML Fragment) < c i r c l e> <x>5</ x> <y>9</ y> <diameter>7</ diameter> </ c i r c l e> Example (Illegal XML Fragment) < c i r c l e> <x>4</ x> <y>8</ y> <radius>3.5</ radius> <diameter>7</ diameter> </ c i r c l e> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 11 / 37
  • 14.
    Symbols in aDTD Symbols Symbol Example ∗ <!ELEMENT coursecatalog (course)∗> + <!ELEMENT coursecatalog (course)+> ? <!ELEMENT coursecatalog (course)?> , <!ELEMENT course (name, semester, desc) > | <!ELEMENT course (name | semester | desc) > Note Symbols are mostly taken from regular expressions Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 12 / 37
  • 15.
    Mixed Content Example (DataCentric) <!ELEMENT coor ( x , y )> Example (Allowed Fragment) <coor> <x>5</ x> <y>9</ y> </ coor> Example (Mixed Content) <!ELEMENT coor ( x , y , #PCDATA)∗> Example (Allowed Fragment) <coor> This i s the coordinate (<x>5</ x> , <y>9</ y>) where the treasure i s hidden ! </ coor> Note Data centric very table like Mixed content also called narrative document Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 13 / 37
  • 16.
    Element Declarations usingANY Example (Any) <!ELEMENT coor (ANY)> <!ELEMENT x (#PCDATA) <!ELEMENT y (#PCDATA) Example (Allowed Fragments) <coor/> <coor>Hello World</coor> <coor>Hello <x>1</x><x/>World<y>3</y><y>4</y></coor> <coor>Hello <x>1</x><y>2</y>World<y>3</y><x>4</x></coor> Example (Illegal Fragments) <coor><z>1</z></coor> <coor><x>1</x><y>1<y/><z>1</z></coor> Note ANY handy for narrative documents, e.g., HTML Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 14 / 37
  • 17.
    Element Declarations usingEMPTY Example (Empty) <!ELEMENT coor EMPTY> Example (Allowed?) <coor></coor> <coor/> <coor>Hello</coor> <coor><x>Hello</x></coor> <coor> </coor> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 15 / 37
  • 18.
    Summary: Elements Repetition Symbol ExplanationExample ? zero-or-one <!ELEMENT person (address?)> * zero-or-more <!ELEMENT person (address∗)> + one-or-more <!ELEMENT person (address+)> once <!ELEMENT person (address)> Sequence or Choice Symbol Explanation Example , Sequence <!ELEMENT coor (x, y)> | Choice <!ELEMENT coor (x | y)> Data Type Symbol Explanation Example #PCDATA String <!ELEMENT name (#PCDATA)> ANY What ever <!ELEMENT coor (ANY)> EMPTY Empty <!ELEMENT room EMPTY> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 16 / 37
  • 19.
    Outline 1 Introduction 2 Elements 3Attributes 4 DTD Find Errors 5 Putting it All Together 6 Summary Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 17 / 37
  • 20.
    Attribute Declarations Example (Circles) <?xmlversion= ’ 1.0 ’ encoding= ’ utf −8 ’?> <!ELEMENT drawing ( c i r c l e )∗> <!ELEMENT c i r c l e ( x , y , ( radius | diameter ) )> <! ATTLIST c i r c l e cid ID #REQUIRED name CDATA #IMPLIED > <!ELEMENT x (#PCDATA)> <!ELEMENT y (#PCDATA)> <!ELEMENT radius (#PCDATA)> <! ATTLIST radius u n i t (mm|cm |m) ”m”> <!−− Enum with default −−> <!ELEMENT diameter (#PCDATA)> <! ATTLIST diameter u n i t (mm|cm |m) #REQUIRED> <!−− Enum no default −−> Note Mandatory and optional attributes One or more attributes Enumeration with defaults Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 18 / 37
  • 21.
    Example Document Example (Circles) <?xmlversion=” 1.0 ” encoding= ’UTF−8 ’?> <!DOCTYPE drawing SYSTEM ” c i r c l e a t t . dtd ”> <drawing> < c i r c l e cid= ’C1 ’ name= ’ f o r e s t ’> <x>8</ x> <y>8</ y> <radius>4</ radius> <!−− default u n i t−−> </ c i r c l e> < c i r c l e cid= ’C2 ’> <!−− name not required −−> <x>5</ x> <y>5</ y> <radius u n i t =”cm”>4</ radius> <!−− e x p l i c i t u n i t−−> </ c i r c l e> </ drawing> Note Unique value is not an integer Used that attribute name is optional in element circle Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 19 / 37
  • 22.
    Uniqueness, Examples Example (Circle/Pointswith IDs) <?xml version= ’ 1.0 ’ encoding= ’ utf −8 ’?> <!ELEMENT drawing ( point | c i r c l e )∗> <!ELEMENT point ( x , y )> <!ELEMENT c i r c l e ( x , y , ( radius | diameter ) )> <! ATTLIST c i r c l e did ID #REQUIRED> <! ATTLIST point did ID #REQUIRED> Example (Circles) <drawing> < c i r c l e did= ’C1 ’> <x>8</ x> <y>8</ y> <radius>4</ radius> </ c i r c l e> <point did= ’P2 ’> <x>5</ x> <y>5</ y> </ point> </ drawing> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 20 / 37
  • 23.
    Uniqueness, Errors Example (Findthe error 1!) <drawing> <c i r c l e cid= ’C1 ’ name= ’ f o r e s t ’> <x>8</ x> <y>8</ y> <radius>5</ radius> </ c i r c l e> <c i r c l e cid= ’C1 ’> <x>5</ x> <y>5</ y> <radius u n i t =”cm”>8</ radius> </ c i r c l e> </ drawing> Example (Find the error 2!) <drawing> <c i r c l e did= ’C11 ’> <x>8</ x> <y>8</ y> <radius>4</ radius> </ c i r c l e> <point did= ’C11 ’> <x>5</ x> <y>5</ y> </ point> </ drawing> Example (Find the error 3!) <drawing> <c i r c l e cid= ’C1 ’ name= ’ f o r e s t ’> <x>8</ x> <y>8</ y> <radius>5</ radius> </ c i r c l e> <c i r c l e cid= ’2C ’> <x>5</ x> <y>5</ y> <radius u n i t =”cm”>8</ radius> </ c i r c l e> </ drawing> Example (Find the error 4!) <drawing> <c i r c l e did= ’C11 ’> <x>8</ x> <y>8</ y> <radius>4</ radius> </ c i r c l e> <point did= ’ C1111111111111111111111 <x>5</ x> <y>5</ y> </ point> </ drawing> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 21 / 37
  • 24.
    Uniqueness Limitations Only attribute valuesunique not element values Cannot be a integer, e.g., <circle did=’1’> not allowed Only unique within a single document Uniqueness not guaranteed across multiple documents Only a single attribute uniqueness (no composite keys) Combination of x and y coordinates cannot be declared unique Note Uniqueness quite restrictive compared to DBMS technology XML Schema lifts most limitations on uniqueness Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 22 / 37
  • 25.
    Empty Elements withAttributes Example (Empty) <!ELEMENT coor EMPTY> <! ATTLIST coor cid ID #REQUIRED x CDATA #REQUIRED y CDATA #REQUIRED z CDATA #IMPLIED> Example (Allowed?) <coor/> <coor cid=’c1’ x=’1’ y=’1’ z=’1’/> <coor cid=’c2’ x=’2’ y=’2’></coor> <coor cid=’c3’ x=’3’ y=’3’> </coor> <coor cid=’c4’ z=’4’ y=’4’ x=’4’/> <coor z=’5’ y=’5’ x=’5’/> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 23 / 37
  • 26.
    Is something Wrong? Example(Case 1) <!ELEMENT coor EMPTY> <! ATTLIST coor cid ID x CDATA #REQUIRED> Example (Case 2!) <!ELEMENT coor EMPTY> <! ATTLIST coor cid ID #IMPLIED x CDATA #REQUIRED> Example (Case 3!) <!ELEMENT coor EMPTY> <! ATTLIST coor x CDATA #REQUIRED cid ID #REQUIRED> Example (Case 4) <!ELEMENT coor EMPTY> <! ATTLIST coor cid ID ’ 42 ’ x CDATA #REQUIRED> Example (Case 5) <!ELEMENT coor (EMPTY)> <! ATTLIST coor cid ID #REQUIRED x CDATA #REQUIRED> Example (Case 6) <!ELEMENT coor EMPTY> <! ATTLIST coor cid ID #REQUIRED x ID #REQUIRED> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 24 / 37
  • 27.
    Summary: Attributes General Syntax <!ATTLISTelement−name attribute−name type [DefaultValue]> Often used types Type Example CDATA <!ATTLIST course id CDATA> ID <!ATTLIST course id ID #REQUIRED> Enumeration <!ATTLIST course id (OOP | DB)> Defaults Type Example #REQUIRED <!ATTLIST course id ID #REQUIRED> #IMPLIED <!ATTLIST course id CDATA #IMPLIED> #FIXED <!ATTLIST course id CDATA #FIXED ”1”> A value <!ATTLIST course id (OOP | DB) ”DB”> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 25 / 37
  • 28.
    Outline 1 Introduction 2 Elements 3Attributes 4 DTD Find Errors 5 Putting it All Together 6 Summary Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 26 / 37
  • 29.
    A Buggy DTD Example(DTD With Five Errors) <?xml version= ’ 1.0 ’> <!ELEMENT users user+> <!ELEMENT user ( firstname , lastname> <!ELEMENT firstname (#PCDATA)> <!ELEMENT lastname> Two-Minutes Exercise With your neighbor identify the errors in the DTD Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 27 / 37
  • 30.
    A Buggy DTD Example(DTD With Five Errors) <?xml version= ’ 1.0 ’> <!ELEMENT users user+> <!ELEMENT user ( firstname , lastname> <!ELEMENT firstname (#PCDATA)> <!ELEMENT lastname> Two-Minutes Exercise With your neighbor identify the errors in the DTD Example (The Corrected DTD) <?xml version= ’ 1.0 ’ encoding= ’ utf −8 ’?> <!ELEMENT users ( user )+> <!ELEMENT user ( firstname , lastname )> <!ELEMENT firstname (#PCDATA)> <!ELEMENT lastname (#PCDATA)> Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 27 / 37
  • 32.
    Uncertain About Content
    Example (DTD for Courses with Flexible Description)
        <?xml version="1.0" encoding="UTF-8"?>
        <!ELEMENT courses (course)*>
        <!ELEMENT course (name, desc)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT desc ANY>
    Example (XML Document for Courses with Flexible Description)
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE courses SYSTEM "course.dtd">
        <courses>
          <course>
            <name>OOP</name>
            <desc>
              <name>object-oriented</name>
              <desc>programming</desc>.
            </desc>
          </course>
        </courses>
  • 33.
    A University Example, Setup
    Example (DTD)
        <?xml version="1.0" encoding="UTF-8"?>
        <!ELEMENT university (courses, students, follows)>
        <!ELEMENT courses (course)+>
        <!ELEMENT course (name)>
        <!ATTLIST course cid ID #REQUIRED>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT students (student)+>
        <!ELEMENT student (fname)>
        <!ATTLIST student sid ID #REQUIRED>
        <!ELEMENT fname (#PCDATA)>
        <!ELEMENT follows (takes)+>
        <!ELEMENT takes EMPTY>
        <!ATTLIST takes sid IDREF #REQUIRED>
        <!ATTLIST takes cids IDREFS #REQUIRED>
    Example (XML Fragment)
        <university>
          <courses>
            <course cid='C111'>
              <name>DB</name>
            </course>
            <course cid='C222'>
              <name>OOP</name>
            </course>
          </courses>
          <students>
            <student sid='S11'>
              <fname>Ann</fname>
            </student>
            <student sid='S22'>
              <fname>Bart</fname>
            </student>
            <student sid='S33'>
              <fname>Curt</fname>
            </student>
          </students>
  • 34.
    A University Example, Referencing
    Example (DTD)
        <?xml version="1.0" encoding="UTF-8"?>
        <!ELEMENT university (courses, students, follows)>
        <!ELEMENT courses (course)+>
        <!ELEMENT course (name)>
        <!ATTLIST course cid ID #REQUIRED>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT students (student)+>
        <!ELEMENT student (fname)>
        <!ATTLIST student sid ID #REQUIRED>
        <!ELEMENT fname (#PCDATA)>
        <!ELEMENT follows (takes)+>
        <!ELEMENT takes EMPTY>
        <!ATTLIST takes sid IDREF #REQUIRED>
        <!ATTLIST takes cids IDREFS #REQUIRED>
    Example (XML Fragment)
        <follows>
          <takes sid='S11' cids='C111 C222'/>
          <takes sid='S22' cids='C222'/>
          <takes sid='S33' cids='C111'/>
        </follows>
    Note
    An ID cannot start with a digit
    sid is a reference to a single ID
    cids is a reference to a set of IDs
    No overlap between IDs (all IDs in a document share one value space)
    The separator in cids is a space (not a comma)
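    The ID, IDREF, and IDREFS rules in the note can be imitated by hand. The Python sketch below is an illustration, not a DTD validator: the standard-library ElementTree parser ignores the DTD, so the declared ID attributes are hard-coded in the made-up table ID_ATTRS, and the function name check_references is likewise hypothetical. It checks uniqueness of IDs and dangling references only. It does not check that an ID is a legal XML name, for example that it does not start with a digit.

```python
import xml.etree.ElementTree as ET

# Assumption: mirrors the ATTLIST declarations of the university DTD.
ID_ATTRS = {"course": "cid", "student": "sid"}

UNIVERSITY = """<university>
  <courses>
    <course cid="C111"><name>DB</name></course>
    <course cid="C222"><name>OOP</name></course>
  </courses>
  <students>
    <student sid="S11"><fname>Ann</fname></student>
    <student sid="S22"><fname>Bart</fname></student>
    <student sid="S33"><fname>Curt</fname></student>
  </students>
  <follows>
    <takes sid="S11" cids="C111 C222"/>
    <takes sid="S22" cids="C222"/>
    <takes sid="S33" cids="C111"/>
  </follows>
</university>"""


def check_references(xml_text):
    """Return a list of problems with ID uniqueness and IDREF(S) targets."""
    root = ET.fromstring(xml_text)
    seen_ids = set()
    problems = []
    # All ID values share one value space, whatever element they sit on.
    for elem in root.iter():
        id_attr = ID_ATTRS.get(elem.tag)
        if id_attr is None:
            continue
        value = elem.get(id_attr)
        if value in seen_ids:
            problems.append(f"duplicate ID {value}")
        seen_ids.add(value)
    # sid is a single reference; cids is a space-separated list of them.
    for takes in root.iter("takes"):
        refs = [takes.get("sid", "")] + takes.get("cids", "").split()
        for ref in refs:
            if ref not in seen_ids:
                problems.append(f"dangling reference {ref}")
    return problems


if __name__ == "__main__":
    print(check_references(UNIVERSITY))
```

    A reference to a course that is not declared, such as C333, would be listed as a dangling reference. This corresponds to the validity constraint that every IDREF value must match some ID in the same document.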
  • 35.
    Quiz: IDREFS
    Example (University XML)
        <university>
          <courses>
            <course cid='C111'>
              <name>DB</name>
            </course>
            <course cid='C222'>
              <name>OOP</name>
            </course>
          </courses>
          <students>
            <student sid='S11'>
              <fname>Ann</fname>
            </student>
            <student sid='S22'>
              <fname>Bart</fname>
            </student>
            <student sid='S33'>
              <fname>Curt</fname>
            </student>
          </students>
    Example (Allowed One?)
        <follows>
          <takes sid='S11' cids='C111 C222 C111'/>
        </follows>
    Example (Allowed Two?)
        <follows>
          <takes sid='S11' cids='C333 C222 C111'/>
        </follows>
    Example (Allowed Three?)
        <follows>
          <takes sid='S11' cids='C111'/>
          <takes sid='S11' cids='C222'/>
        </follows>
    Example (Allowed Four?)
        <follows>
          <takes sid='S11' cids=''/>
          <takes sid='S22' cids='c111'/>
        </follows>
  • 36.
    Using an Internal DTD
    Example (DTD for Courses with Flexible Description)
        <?xml version="1.0" standalone="yes"?>
        <!DOCTYPE courses [
          <!ELEMENT courses (course)*>
          <!ELEMENT course (name, desc)>
          <!ELEMENT name (#PCDATA)>
          <!ELEMENT desc ANY>
        ]>
        <courses>
          <course>
            <name>OOP</name>
            <desc>
              <name>object-oriented</name>
              <desc>programming</desc>.
            </desc>
          </course>
        </courses>
    Note
    Benefit: All information in one file
    Drawback: DTD is not reused (maintenance nightmare)
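    The "all information in one file" benefit can be seen from the parser's side. In the Python sketch below, a single pass over the document yields both the element declarations from the internal DTD and the elements actually used. It relies on the standard-library expat parser, which reports declarations but does not validate against them. The function name read_single_file is made up for this illustration.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc><name>object-oriented</name> <desc>programming</desc>.</desc>
  </course>
</courses>"""


def read_single_file(xml_text):
    """Return (declared element names, used element names) from one parse."""
    declared, used = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration in the internal DTD.
    parser.ElementDeclHandler = lambda name, model: declared.append(name)
    # Called once per start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(xml_text, True)
    return declared, used


if __name__ == "__main__":
    declared, used = read_single_file(DOC)
    print("declared:", declared)
    print("used:", used)
```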
  • 38.
    Summary: DTD Limitations
    Only very basic data types supported
    Only single-column keys (for uniqueness)
    Uniqueness only guaranteed within a single document
    Very limited support for integrity constraints
    Note
    DTD is widely used
    DTD is being replaced by XML Schema when documents are complex
    There are problems using XML Namespaces together with DTD
    Advice
    Never build a new DTD if an existing (standard) one can be used!
  • 39.
    RDBMS vs. XML
                Query      Schema
        RDBMS   SQL DML    DDL
        XML     XQuery     DTD/XML Schema
  • 40.
    Summary: DTD versus XML Schema
    DTD
    Own format
    Compact notation
    Simple data types
    From SGML
    Supports entities
    No support for namespaces
    XML Schema
    XML format
    Very verbose
    Advanced data types
    Invented for XML
    Does not support entities
    Supports namespaces
    Advice
    Start with a DTD
    Move on to XML Schema for later iterations