Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. DTD(Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs) 1
  2. 2. Motivation• A DTD adds syntactical requirements in addition to the well-formed requirement• It helps in eliminating errors when creating or editing XML documents• It clarifies the intended semantics• It simplifies the processing of XML documents 2
  3. 3. An Example• In an address book, where can a phone number appear? – Under <person>, under <name> or under both?• If we have to check for all possibilities, processing takes longer and it may not be clear to whom a phone belongs 3
  4. 4. Document Type Definitions• Document Type Definitions (DTDs) impose structure on XML documents• There is some relationship between a DTD and a schema, but it is not close – hence the need for additional “typing” systems (XML schemas)• The DTD is a syntactic specification 4
  5. 5. Example: An Address Book<person> <name> Homer Simpson </name> Exactly one name <greet> Dr. H. Simpson </greet> At most one greeting <addr>1234 Springwater Road </addr> As many address lines as needed <addr> Springfield USA, 98765 </addr> (in order) <tel> (321) 786 2543 </tel> Mixed telephones <fax> (321) 786 2544 </fax> and faxes <tel> (321) 786 2544 </tel> As many <email> homer@math.springfield.edu </email> as needed</person> 5
  6. 6. Specifying the Structure• name to specify a name element• greet? to specify an optional (0 or 1) greet elements• name, greet? to specify a name followed by an optional greet 6
  7. 7. Specifying the Structure (cont’d)• addr* to specify 0 or more address lines• tel | fax a tel or a fax element• (tel | fax)* 0 or more repeats of tel or fax• email* 0 or more email elements 7
  8. 8. Specifying the Structure (cont’d)• So the whole structure of a person entry is specified by name, greet?, addr*, (tel | fax)*, email*• This is known as a regular expression 8
  9. 9. Element Type Definition• for each element type E, a declaration of the form:• <!ELEMENT E P>• where P is a regular expression, i.e.,• P ::= EMPTY | ANY | #PCDATA | E’ |• P1, P2 | P1 | P2 | P? | P+ | P* – E’: element type – P1 , P2: concatenation – P1 | P2: disjunction – P?: optional – P+: one or more occurrences – P*: the Kleene closure 9
  10. 10. Summary of Regular Expressions• A The tag (i.e., element) A occurs• e1,e2 The expression e1 followed by e2• e* 0 or more occurrences of e• e? Optional: 0 or 1 occurrences• e+ 1 or more occurrences• e1 | e2 either e1 or e2• (e) grouping 10
  11. 11. The Definition of an Element Consists of Exactly One of the Following • A regular expression (as defined earlier) • EMPTY means that the element has no content • ANY means that content can be any mixture of PCDATA and elements defined in the DTD • Mixed content which is defined as described on the next slide • (#PCDATA) 11
  12. 12. The Definition of Mixed Content• Mixed content is described by a repeatable OR group (#PCDATA | element-name | …)* – Inside the group, no regular expressions – just element names – #PCDATA must be first followed by 0 or more element names, separated by | – The group can be repeated 0 or more times 12
  13. 13. An Address-Book XML Document with an Internal DTD<?xml version="1.0" encoding="UTF-8"?> The name of<!DOCTYPE addressbook [ the DTD is <!ELEMENT addressbook (person*)> addressbook <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)> <!ELEMENT name (#PCDATA)> <!ELEMENT greet (#PCDATA)> The syntax <!ELEMENT address (#PCDATA)> of a DTD is <!ELEMENT tel (#PCDATA)> not XML <!ELEMENT fax (#PCDATA)> syntax <!ELEMENT email (#PCDATA)>]> “Internal” means that the DTD and the XML Document are in the same file 13
  14. 14. The Rest of theAddress-Book XML Document<addressbook> <person> <name> Jeff Cohen </name> <greet> Dr. Cohen </greet> <email> jc@penny.com </email> </person></addressbook> 14
  15. 15. Regular Expressions• Each regular expression determines acorresponding finite-state automaton• Let’s start with a simpler example: A double circle name, addr*, email denotes an addr accepting state name email This suggests a simple parsing program 15
  16. 16. Another Examplename,address*,(tel | fax)*,email* address tel email tel name email fax fax email 16
  17. 17. Some Things are Hard to SpecifyEach employee element should contain name,age and ssn elements in some order<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )>Suppose that there were many more fields! 17
  18. 18. Some Things are Hard to Specify (cont’d)<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )> There are n! different orders of n elementsSuppose there were many more fields! It is not even polynomial 18
  19. 19. Specifying Attributes in the DTD<!ELEMENT height (#PCDATA)><!ATTLIST height dimension CDATA #REQUIRED accuracy CDATA #IMPLIED >The dimension attribute is requiredThe accuracy attribute is optionalCDATA is the “type” of the attribute – it means“character data,” and may take any literal stringas a value 19
  20. 20. The Format of an Attribute Definition• <!ATTLIST element-name attr-name attr-type default-value>• The default value is given inside quotes• attribute types: – CDATA – ID, IDREF, IDREFS –… 20
  21. 21. Summary of Attribute Default Values• #REQUIRED means that the attribute must by included in the element• #IMPLIED• #FIXED “value” – The given value (inside quotes) is the only possible one• “value” – The default value of the attribute if none is given 21
  22. 22. Recursive DTDs<DOCTYPE genealogy [ <!ELEMENT genealogy (person*)> Each person <!ELEMENT person ( should have name, a father and a dateOfBirth, mother. This person, -- mother leads to either person )> -- father infinite data or ... a person that]> is a descendentWhat is the problem with this? of herself.A parser does not notice it! 22
  23. 23. Recursive DTDs (cont’d)<DOCTYPE genealogy [ <!ELEMENT genealogy (person*)> If a person only <!ELEMENT person ( has a father, name, how can you dateOfBirth, tell that he has person?, -- mother a father and person? )> -- father does not have ... a mother?]>What is now the problem with this? 23
  24. 24. Using ID and IDREF Attributes <!DOCTYPE family [ <!ELEMENT family (person)*> <!ELEMENT person (name)> <!ELEMENT name (#PCDATA)> <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED>]> 24
  25. 25. IDs and IDREFs• ID attribute: unique within the entire document. – An element can have at most one ID attribute. – No default (fixed default) value is allowed. • #required: a value must be provided • #implied: a value is optional• IDREF attribute: its value must be some other element’s ID value in the document.• IDREFS attribute: its value is a set, each element of the set is the ID value of some other element in the document. <person id=“898” father=“332” mother=“336” children=“982 984 986”> 25
  26. 26. Some Conforming Data<family> <person id=“lisa” mother=“marge” father=“homer”> <name> Lisa Simpson </name> </person> <person id=“bart” mother=“marge” father=“homer”> <name> Bart Simpson </name> </person> <person id=“marge” children=“bart lisa”> <name> Marge Simpson </name> </person> <person id=“homer” children=“bart lisa”> <name> Homer Simpson </name> </person></family> 26
  27. 27. ID References do not Have Types• The attributes mother and father are references to IDs of other elements• However, those are not necessarily person elements!• The mother attribute is not necessarily a reference to a female person 27
  28. 28. An Alternative Specification<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE family [ <!ELEMENT family (person)*> <!ELEMENT person (name, mother?, father?, children?)> <!ATTLIST person id ID #REQUIRED> <!ELEMENT name (#PCDATA)> <!ELEMENT mother EMPTY> <!ATTLIST mother idref IDREF #REQUIRED> <!ELEMENT father EMPTY> <!ATTLIST father idref IDREF #REQUIRED> <!ELEMENT children EMPTY> <!ATTLIST children idrefs IDREFS #REQUIRED>]> 28
  29. 29. The Revised Data<family> <person id="bart"> <person id="marge"> <name> Bart <name> Marge Simpson </name> Simpson </name> <mother idref="marge"/> <children idrefs="bart lisa"/> <father idref="homer"/> </person> </person> <person id="homer"> <person id="lisa"> <name> Homer <name> Lisa Simpson </name> Simpson </name> <children idrefs="bart lisa"/> <mother idref="marge"/> </person> <father idref="homer"/> </person> </family> 29
  30. 30. Consistency of ID and IDREF Attribute Values• If an attribute is declared as ID – The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion) • Even if the two elements have different element names• If an attribute is declared as IDREF – The associated value must exist as the value of some ID attribute (no dangling “pointers”)• Similarly for all the values of an IDREFS attribute• ID, IDREF and IDREFS attributes are not typed 30
  31. 31. Adding a DTD to the Document• A DTD can be internal – The DTD is part of the document file• or external – The DTD and the document are on separate files – An external DTD may reside • In the local file system (where the document is) • In a remote file system 31
  32. 32. Connecting a Document with its DTD• An internal DTD: <?xml version="1.0"?> <!DOCTYPE db [<!ELEMENT ...> … ]> <db> ... </db>• A DTD from the local file system: <!DOCTYPE db SYSTEM "schema.dtd">• A DTD from a remote file system: <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd"> 32
  33. 33. Well-Formed XML Documents• An XML document (with or without a DTD) is well-formed if – Tags are syntactically correct – Every tag has an end tag – Tags are properly nested An XML document must be well formed – There is a root tag – A start tag does not have two occurrences of the same attribute 33
  34. 34. Valid Documents• A well-formed XML document isvalid if it conforms to its DTD, that is, – The document conforms to the regular- expression grammar, – The types of attributes are correct, and – The constraints on references are satisfied 34