Introduction to DTD

Kristian Torp
Department of Computer Science
Aalborg University
people.cs.aau.dk/~torp
torp@cs.aau.dk
November 3, 2015
daisy.aau.dk

Outline
1 Introduction
2 Elements
3 Attributes
4 DTD Find Errors
5 Putting it All Together
6 Summary

Learning Outcomes
Be able to read and understand a DTD
Be able to construct a DTD for a set of existing XML documents
Be able to validate an XML document against a DTD
Know the limitations of a DTD
Database Focus
All XML technologies are presented from a database perspective!

Example: Course Catalog XML Document
User Requirements
Make a DTD for the course catalog
Use the DTD to validate our course catalog XML document
Example (Current Courses)
<?xml version="1.0"?>
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>

Example: Course Catalog DTD
Example (DTD for Course Catalog)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT coursecatalog (course)*>
<!ELEMENT course (name, semester, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
<!ATTLIST course cid ID #REQUIRED>
Informal Description
A course catalog consists of zero or more courses
A course consists of a name, a semester, and a description
A course is identified by a required ID attribute (cid)
A (course) name is a string (leaf in XML document)
A semester is a string (leaf in XML document)
A description is a string (leaf in XML document)
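
As a sketch of how the two pieces fit together, the document below points at the DTD through a DOCTYPE declaration. The file name coursecatalog.dtd is an assumption (the slides do not name the file), and the semester value for the DB course is illustrative; it is added only so that every course matches (name, semester, desc).

```xml
<?xml version="1.0"?>
<!-- Assumes the DTD above is saved as coursecatalog.dtd next to this file -->
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
  <course cid="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid="P2">
    <name>DB</name>
    <!-- Illustrative value: the DTD requires a semester element here -->
    <semester>4</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
```

A validating parser should accept this document, for example xmllint called with the --valid and --noout options.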

Overview
Purpose
Define the document structure
Legal elements and attributes
Serves the same purpose as a create table statement in SQL
Structure and type of data
Integrity constraints!
Left over from SGML
Is not written in XML
If this is a requirement then use XML Schema
Still very widely used
Because it is much simpler than XML Schema
Note
Many simple errors can be found using a DTD
A necessity if receiving XML documents from external sources

Simplest Element
Example (Element Declaration)
<!ELEMENT name (#PCDATA)>
Example (Allowed Values)
<name>Hello Element</name>
<name/>
<name><![CDATA[select * from emp where sal > 10]]></name>
Example (Illegal Values)
<name>> </name>
<name>></name>
<name><it>Hello</it></name>
Unknown element <it>; it must be declared in the DTD
Note
Root, internal nodes, and leaves in the XML tree representation
Terminal and non-terminal in grammar terminology

Sequences of Child Elements
<!ELEMENT course (name, semester, desc)>
Example (Allowed XML Fragment, Why?)
<course>
  <name>OOP</name>
  <desc>Introduction to OOP</desc>
</course>

<course>
  <name>OOP</name>
</course>
Example (Disallowed XML Fragment, Why?)
<course>
  <name>OOP</name>
</course>

<course>
  <name>OOP</name>
</course>
Example (Disallowed XML Fragment, Why?)
<course>
  <name>OOP</name>
</course>
Example (Is this allowed?)
<course></course>

Choice Among Child Elements
<!ELEMENT circle (x, y, (radius | diameter))>
Example (Allowed XML Fragment)
<circle>
  <x>5</x>
  <y>9</y>
  <diameter>7</diameter>
</circle>
Example (Illegal XML Fragment)
<circle>
  <x>4</x>
  <y>8</y>
  <radius>3.5</radius>
  <diameter>7</diameter>
</circle>

Symbols in a DTD
Symbols
Symbol  Example
*       <!ELEMENT coursecatalog (course)*>
+       <!ELEMENT coursecatalog (course)+>
?       <!ELEMENT coursecatalog (course)?>
,       <!ELEMENT course (name, semester, desc)>
|       <!ELEMENT course (name | semester | desc)>
Note
Symbols are mostly taken from regular expressions

Mixed Content
Example (Data Centric)
<!ELEMENT coor (x, y)>
Example (Allowed Fragment)
<coor>
  <x>5</x>
  <y>9</y>
</coor>
Example (Mixed Content)
<!ELEMENT coor (#PCDATA | x | y)*>
Example (Allowed Fragment)
<coor>
  This is the coordinate
  (<x>5</x>, <y>9</y>) where
  the treasure is hidden!
</coor>
Note
Data-centric documents are very table-like
Mixed-content documents are also called narrative documents
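
The mixed-content case can be checked with a self-contained document. This is a sketch with the DTD embedded as an internal subset; the declarations for x and y are assumed to be plain #PCDATA, as elsewhere in the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE coor [
  <!-- Mixed content: #PCDATA comes first, alternatives are separated by |,
       and the group ends with * -->
  <!ELEMENT coor (#PCDATA | x | y)*>
  <!ELEMENT x (#PCDATA)>
  <!ELEMENT y (#PCDATA)>
]>
<coor>
  This is the coordinate
  (<x>5</x>, <y>9</y>) where
  the treasure is hidden!
</coor>
```

The price of mixed content is that a DTD can no longer constrain the order or the number of the child elements: a coor with three y elements and no x is equally valid under this declaration.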

Element Declarations using ANY
Example (Any)
<!ELEMENT coor ANY>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
Example (Allowed Fragments)
<coor/>
<coor>Hello World</coor>
<coor>Hello <x>1</x><x/>World<y>3</y><y>4</y></coor>
<coor>Hello <x>1</x><y>2</y>World<y>3</y><x>4</x></coor>
Example (Illegal Fragments)
<coor><z>1</z></coor>
<coor><x>1</x><y>1<y/><z>1</z></coor>
Note
ANY is handy for narrative documents, e.g., HTML

Element Declarations using EMPTY
Example (Empty)
<!ELEMENT coor EMPTY>
Example (Allowed?)
<coor></coor>
<coor/>
<coor>Hello</coor>
<coor><x>Hello</x></coor>
<coor> </coor>

Summary: Elements
Repetition
Symbol   Explanation   Example
?        zero-or-one   <!ELEMENT person (address?)>
*        zero-or-more  <!ELEMENT person (address*)>
+        one-or-more   <!ELEMENT person (address+)>
(none)   once          <!ELEMENT person (address)>
Sequence or Choice
,        Sequence      <!ELEMENT coor (x, y)>
|        Choice        <!ELEMENT coor (x | y)>
Data Type
#PCDATA  String        <!ELEMENT name (#PCDATA)>
ANY      Whatever      <!ELEMENT coor ANY>
EMPTY    Empty         <!ELEMENT room EMPTY>

Attribute Declarations
Example (Circles)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT drawing (circle)*>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle cid ID #REQUIRED
                 name CDATA #IMPLIED>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT radius (#PCDATA)>
<!ATTLIST radius unit (mm|cm|m) "m"> <!-- Enum with default -->
<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit (mm|cm|m) #REQUIRED> <!-- Enum no default -->
Note
Mandatory and optional attributes
One or more attributes
Enumeration with defaults

Example Document
Example (Circles)
<?xml version="1.0" encoding='UTF-8'?>
<!DOCTYPE drawing SYSTEM "circleatt.dtd">
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>4</radius> <!-- default unit -->
  </circle>
  <circle cid='C2'> <!-- name not required -->
    <x>5</x> <y>5</y>
    <radius unit="cm">4</radius> <!-- explicit unit -->
  </circle>
</drawing>
Note
The unique value (cid) is not an integer
The second circle relies on the name attribute being optional

Uniqueness, Examples
Example (Circle/Points with IDs)
<!ELEMENT drawing (point | circle)*>
<!ELEMENT point (x, y)>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle did ID #REQUIRED>
<!ATTLIST point did ID #REQUIRED>
Example (Circles)
<drawing>
  <circle did='C1'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='P2'>
    <x>5</x> <y>5</y>
  </point>
</drawing>

Uniqueness, Errors
Example (Find the error 1!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='C1'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 2!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C11'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Example (Find the error 3!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='2C'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 4!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C1111111111111111111111
    <x>5</x> <y>5</y>
  </point>
</drawing>

Uniqueness
Limitations
Only attribute values can be declared unique, not element values
An ID cannot be an integer, e.g., <circle did='1'> is not allowed
IDs are only unique within a single document
Uniqueness is not guaranteed across multiple documents
Uniqueness covers a single attribute only (no composite keys)
The combination of x and y coordinates cannot be declared unique
Note
Uniqueness is quite restrictive compared to DBMS technology
XML Schema lifts most limitations on uniqueness
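
A small sketch of how these limitations play out in practice. The point and circle elements are simplified to EMPTY for brevity (an assumption, not the declarations used earlier), and the p/c prefixes are one possible workaround for numeric keys.

```xml
<?xml version="1.0"?>
<!DOCTYPE drawing [
  <!ELEMENT drawing (point | circle)*>
  <!ELEMENT point  EMPTY>
  <!ELEMENT circle EMPTY>
  <!ATTLIST point  did ID #REQUIRED>
  <!ATTLIST circle did ID #REQUIRED>
]>
<drawing>
  <!-- Numeric keys 1 and 2, prefixed with a letter so they are legal XML names -->
  <point did="p1"/>
  <point did="p2"/>
  <!-- did="p1" would be rejected here: all ID values in a document share one pool,
       regardless of element type or attribute name -->
  <circle did="c1"/>
</drawing>
```

Prefixing by element type keeps the values legal and avoids clashes between points and circles, at the cost of encoding the type into the key.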

Empty Elements with Attributes
Example (Empty)
<!ELEMENT coor EMPTY>
<!ATTLIST coor cid ID #REQUIRED
               x CDATA #REQUIRED
               y CDATA #REQUIRED
               z CDATA #IMPLIED>
Example (Allowed?)
<coor/>
<coor cid='c1' x='1' y='1' z='1'/>
<coor cid='c2' x='2' y='2'></coor>
<coor cid='c3' x='3' y='3'> </coor>
<coor cid='c4' z='4' y='4' x='4'/>
<coor z='5' y='5' x='5'/>

Is something Wrong?
Example (Case 1)
<!ATTLIST coor
  cid ID
  x CDATA #REQUIRED>
Example (Case 2!)
<!ATTLIST coor
  cid ID #IMPLIED
  x CDATA #REQUIRED>
Example (Case 3!)
<!ATTLIST coor
  x CDATA #REQUIRED
  cid ID #REQUIRED>
Example (Case 4)
<!ATTLIST coor
  cid ID '42'
  x CDATA #REQUIRED>
Example (Case 5)
<!ELEMENT coor (EMPTY)>
<!ATTLIST coor
  cid ID #REQUIRED
  x CDATA #REQUIRED>
Example (Case 6)
<!ATTLIST coor
  cid ID #REQUIRED
  x ID #REQUIRED>

Summary: Attributes
General Syntax
<!ATTLIST element-name attribute-name type default-declaration>
Often used types
Type         Example
CDATA        <!ATTLIST course id CDATA #IMPLIED>
ID           <!ATTLIST course id ID #REQUIRED>
Enumeration  <!ATTLIST course id (OOP | DB) #REQUIRED>
Defaults
Default      Example
#REQUIRED    <!ATTLIST course id ID #REQUIRED>
#IMPLIED     <!ATTLIST course id CDATA #IMPLIED>
#FIXED       <!ATTLIST course id CDATA #FIXED "1">
A value      <!ATTLIST course id (OOP | DB) "DB">
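
The four kinds of default declaration can be tried out side by side. The following is a sketch only: the attribute names teacher, lang, and ects are hypothetical and not part of the course catalog used in the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE courses [
  <!ELEMENT courses (course)*>
  <!ELEMENT course  (#PCDATA)>
  <!ATTLIST course
    cid     ID        #REQUIRED
    teacher CDATA     #IMPLIED
    lang    (da | en) "en"
    ects    CDATA     #FIXED "5">
]>
<courses>
  <course cid="P4" teacher="Ann" lang="da">OOP</course>
  <!-- teacher omitted (#IMPLIED); lang falls back to "en"; ects is always "5" -->
  <course cid="P2">DB</course>
  <!-- A #FIXED attribute may be written out, but only with the fixed value -->
  <course cid="P7" ects="5">XML</course>
</courses>
```

Writing ects="10" or lang="de" on any course would make the document invalid, while leaving out cid would violate #REQUIRED.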

A Buggy DTD
Example (DTD With Five Errors)
<?xml version='1.0'>
<!ELEMENT users user+>
<!ELEMENT user (firstname, lastname>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname>
Two-Minutes Exercise
With your neighbor identify the errors in the DTD

Example (The Corrected DTD)
<!ELEMENT users (user)+>
<!ELEMENT user (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>

Uncertain About Content
Example (DTD for Courses with Flexible Description)
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT desc ANY>
<!DOCTYPE courses SYSTEM "course.dtd">
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>

A University Example, Setup
Example (DTD)
<!ELEMENT university (courses,
                      students,
                      follows)>
<!ELEMENT courses (course)+>
<!ELEMENT course (name)>
<!ATTLIST course cid ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT students (student)+>
<!ELEMENT student (fname)>
<!ATTLIST student sid ID #REQUIRED>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT follows (takes)+>
<!ELEMENT takes EMPTY>
<!ATTLIST takes sid IDREF #REQUIRED>
<!ATTLIST takes cids IDREFS #REQUIRED>
Example (XML Fragment)
<university>
  <courses>
    <course cid='C111'>
      <name>DB</name>
    </course>
      <name>OOP</name>
    </course>
  </courses>
  <students>
    <student sid='S11'>
      <fname>Ann</fname>
    </student>
      <fname>Bart</fname>
    </student>
      <fname>Curt</fname>
    </student>
  </students>

A University Example, Referencing
Example (XML Fragment)
<follows>
  <takes sid='S11' cids='C111 C222'/>
  <takes sid='S22' cids='C222'/>
</follows>
Note
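An ID cannot start with a digit
sid is a single ID reference (IDREF)
cids is a set of ID references (IDREFS)
IDs must not overlap: course and student IDs share one value space
Values in cids are separated by spaces (not commas)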
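
Putting the setup and the referencing parts together gives a complete document that can be validated on its own. This is a sketch under stated assumptions: the ATTLIST for course and the declaration of name are added so that the cid attributes validate, and the pairing of C222 with OOP and S22 with Bart is assumed from the references in the follows fragment.

```xml
<?xml version="1.0"?>
<!DOCTYPE university [
  <!ELEMENT university (courses, students, follows)>
  <!ELEMENT courses  (course)+>
  <!ELEMENT course   (name)>
  <!ATTLIST course   cid ID #REQUIRED>   <!-- assumed declaration -->
  <!ELEMENT name     (#PCDATA)>          <!-- assumed declaration -->
  <!ELEMENT students (student)+>
  <!ELEMENT student  (fname)>
  <!ATTLIST student  sid ID #REQUIRED>
  <!ELEMENT fname    (#PCDATA)>
  <!ELEMENT follows  (takes)+>
  <!ELEMENT takes    EMPTY>
  <!ATTLIST takes    sid  IDREF  #REQUIRED
                     cids IDREFS #REQUIRED>
]>
<university>
  <courses>
    <course cid="C111"><name>DB</name></course>
    <course cid="C222"><name>OOP</name></course>
  </courses>
  <students>
    <student sid="S11"><fname>Ann</fname></student>
    <student sid="S22"><fname>Bart</fname></student>
  </students>
  <follows>
    <!-- Each IDREF/IDREFS token must match some ID value in this document -->
    <takes sid="S11" cids="C111 C222"/>
    <takes sid="S22" cids="C222"/>
  </follows>
</university>
```

Note that IDREF only checks that the referenced ID exists somewhere in the document. Unlike a foreign key, it cannot state that sid must point at a student, so sid="C111" would also validate.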

Quiz: IDREFS
Example (University XML)
<university>
  <courses>
      <name>DB</name>
    </course>
      <name>OOP</name>
    </course>
  </courses>
  <students>
      <fname>Ann</fname>
    </student>
      <fname>Bart</fname>
    </student>
      <fname>Curt</fname>
    </student>
  </students>
Example (Allowed One?)
<follows>
  <takes sid='S11' cids='C111 C222 C111'/>
</follows>
Example (Allowed Two?)
<follows>
  <takes sid='S11' cids='C333 C222 C111'/>
</follows>
Example (Allowed Three?)
<follows>
</follows>
Example (Allowed Four?)
<follows>
  <takes sid='S11' cids=''/>
  <takes sid='S22' cids='c111'/>
</follows>

Using an Internal DTD
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
  <!ELEMENT courses (course)*>
  <!ELEMENT course (name, desc)>
  <!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc>
      <name>object-oriented</name>
      <desc>programming</desc>.
    </desc>
  </course>
</courses>
Note
Benefit: All information in one file
Drawback: DTD is not reused (maintenance nightmare)
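
A middle ground is to keep the shared declarations in an external DTD and add only document-specific ones in the internal subset. The sketch below assumes that course.dtd declares courses, course, name, and desc; the level attribute is a hypothetical local addition.

```xml
<?xml version="1.0"?>
<!-- Shared declarations are read from course.dtd; the internal subset adds a local one -->
<!DOCTYPE courses SYSTEM "course.dtd" [
  <!ATTLIST course level (bachelor | master) "bachelor">
]>
<courses>
  <course level="master">
    <name>OOP</name>
    <desc>Object-oriented programming</desc>
  </course>
</courses>
```

This keeps the reusable part in one place while still allowing small per-document extensions.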

Summary: DTD
Limitations
Only very basic data types supported
Only single-column keys (for uniqueness)
Uniqueness only guaranteed within a single document
Very limited support for integrity constraints
Note
DTD is widely used
DTD is being replaced by XML Schema when documents are complex
Using XML namespaces together with a DTD is problematic
Advice
Never build a new DTD if an existing (standard) can be used!

RDBMS vs. XML
      Query   Schema
SQL   DML     DDL
XML   XQuery  DTD/XML Schema

Summary: DTD versus XML Schema
DTD
Own format
Compact notation
Simple data types
From SGML
Supports entities
No support for namespaces
XML Schema
XML format
Very verbose
Advanced data types
Invented for XML
Does not support entities
Supports namespaces
Advice
Start with a DTD
Move on to XML Schema for later iterations
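
To make the comparison concrete, here is a sketch of what the seven-line course catalog DTD might look like as an XML Schema. This is one possible translation, not the only one; typing semester as xs:positiveInteger is an assumption that illustrates the richer data types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="coursecatalog">
    <xs:complexType>
      <xs:sequence>
        <!-- (course)* in the DTD -->
        <xs:element name="course" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <!-- (name, semester, desc) in the DTD -->
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <!-- A DTD can only say #PCDATA; here the value must be a positive integer -->
              <xs:element name="semester" type="xs:positiveInteger"/>
              <xs:element name="desc" type="xs:string"/>
            </xs:sequence>
            <!-- cid ID #REQUIRED in the DTD -->
            <xs:attribute name="cid" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The schema is roughly three times as long as the DTD, but it is itself XML and can reject a semester such as "three", which the DTD cannot.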
