VALIDATING AN XML DOCUMENT
Part 2
Agenda
• Validating a Document
• DTD
• Declaring Elements
• Declaring Attributes
VALIDATING A DOCUMENT
You validate documents to make certain necessary elements are never omitted.
For example, each customer order should include a customer name, address,
and phone number.
• Some elements and attributes may be optional, for example an e-mail
address.
• An XML document can be validated using either DTDs (Document Type
Definitions) or schemas.
WELL-FORMED VS. VALID XML DOCUMENTS
• An XML document is well-formed if it contains no syntax errors and fulfills
all of the specifications for XML code as defined by the W3C.
• An XML document is valid if it is well-formed and also satisfies the rules
laid out in the DTD or schema attached to the document.
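To illustrate the difference, here is a minimal sketch. The rules shown are hypothetical and chosen only for illustration: they require every customer to have a name followed by a phone. The document is well-formed (every tag is closed and properly nested), but it would not be valid under these rules because the phone element is missing.

```xml
<?xml version="1.0"?>
<!DOCTYPE customers [
  <!-- Hypothetical rules, for illustration only -->
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<customers>
  <customer>
    <name>Ahmed</name>
    <!-- phone is required by the rules above but absent: well-formed, not valid -->
  </customer>
</customers>
```

A parser that only checks well-formedness would accept this file; a validating parser would report the missing phone element.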
Document Type Definitions (DTD)
• A DTD is a collection of rules or declarations that define the content and
structure of the document.
• A DTD attaches those rules to the document’s content.
• There can only be one DTD per XML document.
Why use a DTD?
A DTD can be used to:
–Ensure all required elements are present in the document
–Prevent undefined elements from being used
–Enforce a specific data structure
–Specify the use of attributes and define their possible values
–Define default values for attributes
DECLARING A DTD
An internal DTD consists of declarations placed in the same file as the document
content. The DOCTYPE declaration for an internal subset is:
<!DOCTYPE root
[
declarations
]>
Where root is the name of the document’s root element and declarations are the
statements that comprise the DTD.
DECLARING A DTD
Example for internal DTD:
<!DOCTYPE customers
[
declarations
]>
DECLARING A DTD
An external DTD is located in a separate file. The DOCTYPE declaration for
an external subset is:
<!DOCTYPE root SYSTEM "uri">
Where root is the name of the document’s root element, and uri is the location
(typically the filename) of the external subset.
Example:
<!DOCTYPE customers SYSTEM "rules.dtd">
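As a sketch of how the two files might fit together: the contents of rules.dtd below are hypothetical (the slides do not show them), and customers.xml is an assumed file name. The external file holds only declarations; it does not contain a DOCTYPE declaration of its own.

```xml
<!-- rules.dtd (hypothetical contents): declarations only -->
<!ELEMENT customers (customer+)>
<!ELEMENT customer (name, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
```

```xml
<?xml version="1.0"?>
<!-- customers.xml (assumed file name), saved in the same folder as rules.dtd -->
<!DOCTYPE customers SYSTEM "rules.dtd">
<customers>
  <customer>
    <name>Ahmed</name>
    <phone>12345</phone>
  </customer>
</customers>
```

Any number of documents can point at the same rules.dtd in this way.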
DECLARING A DTD
If you place the DTD within the document, it is easier to compare the DTD to
the document’s content. However, the real power of XML comes from an
external DTD that can be shared among many documents written by
different authors.
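DECLARING A DTD
If a document contains both an internal and an external subset, the internal
subset takes precedence over the external subset if there is a conflict
between the two. This precedence applies to attribute-list and entity
declarations; declaring the same element type twice is a validity error.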
This way, the external subset would define basic rules for all the documents,
and the internal subset would define those rules specific to each document.
COMBINING AN EXTERNAL AND INTERNAL DTD SUBSET
[Figure not reproduced: a document with an internal DTD]
* Notice the CDATA section used to insert the address.
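The figure this note refers to is not reproduced in this text. The following is only a sketch of what such a document might look like. It assumes a hypothetical rules.dtd that declares customers, customer (as a sequence of name and address) and name, and leaves the address declaration to each document.

```xml
<?xml version="1.0"?>
<!-- Assumption: rules.dtd declares customers, customer (name, address) and name -->
<!DOCTYPE customers SYSTEM "rules.dtd"
[
  <!-- Internal subset: a declaration specific to this document -->
  <!ELEMENT address (#PCDATA)>
]>
<customers>
  <customer>
    <name>Ahmed</name>
    <!-- The CDATA section lets the address contain & without escaping it -->
    <address><![CDATA[14 Bronson St. & 5th Ave.]]></address>
  </customer>
</customers>
```

The external subset supplies the shared rules, and the bracketed internal subset adds the document-specific one.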
DECLARING ELEMENTS
DECLARING DOCUMENT ELEMENTS
Every element used in the document must be declared in the DTD for the
document to be valid.
An element type declaration specifies the name of the element and indicates
what kind of content the element can contain.
DECLARING DOCUMENT ELEMENTS
The element declaration syntax is:
<!ELEMENT element content-model>
Where element is the element name and content-model specifies what type of
content the element contains.
* The element name is case sensitive.
TYPES OF CONTENT
DTDs define five different types of content. The four covered here are:
1– “ANY” elements. No restrictions on the element’s content.
2– “EMPTY” elements. The element cannot store any content.
3– #PCDATA. The element can only contain text.
4– Elements. The element can only contain child elements.
The fifth type, mixed content, allows both text and child elements.
1- “ANY” content
ANY content: The declared element can store any type of content. The syntax is:
<!ELEMENT element ANY>
ex: if I declare “student” as: <!ELEMENT student ANY>
then I can write in my xml file:
<students>
<student> Maha </student> <!-- here the student’s content is #PCDATA -->
<student>
<name> Sara </name>
</student> <!-- here the student’s content is a child element -->
</students>
2– “EMPTY” content
EMPTY content: This is reserved for elements that store no content. The syntax is:
<!ELEMENT element EMPTY>
ex: if I declare “note” as empty: <!ELEMENT note EMPTY>
then I write the following in my xml file:
<note/>
• Attempting to add content to an empty element would result in validating XML
parsers rejecting the document as invalid.
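The fragments below sketch which forms a validating parser would accept for the note element declared above. They are separate fragments, not one document, and the text inside the last one is made up for illustration.

```xml
<!-- Given: <!ELEMENT note EMPTY> -->

<!-- Valid: an empty-element tag -->
<note/>

<!-- Also valid: a start tag immediately followed by an end tag -->
<note></note>

<!-- Not valid: text content inside an EMPTY element -->
<note>Call back later</note>
```

An EMPTY element can still carry attributes if they are declared for it; only element content and text are ruled out.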
3– #PCDATA content
Parsed Character Data content: These elements can only contain text strings. The
syntax is:
<!ELEMENT element (#PCDATA)>
• The keyword #PCDATA stands for “parsed-character data” and is any well-formed
text string.
3– #PCDATA content
• Example:
if I declare “student” as
<!ELEMENT student (#PCDATA)>
then my xml file would contain:
<student> Lama Alharthi </student>
4- Child contents
• Element content: The syntax for declaring that an element contains only a single
child element is:
<!ELEMENT element (child)>
• The syntax for a sequence of child elements is:
<!ELEMENT element (child1, child2, …)>
Where child, child1, child2, etc. are the names of child elements.
• The order of the child elements must match the order defined in the element
declaration.
4- Child contents
• The declaration
<!ELEMENT customer (phone)>
indicates that the customer element can only have one child, named
phone. Under this declaration, the phone element cannot appear more than once
within a customer element.
• For example:
<customer>
<phone> 12345 </phone>
</customer>
4- Child contents
• For example:
<!ELEMENT customer (name, phone, email)>
indicates that each customer element must contain exactly three child elements:
name, phone, and email, in that order.
<customer>
<name> Ahmed </name>
<phone> 567890 </phone>
<email> Ahmed@hotmail.com </email>
</customer>
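For contrast, here is a sketch of two customer elements that a validating parser would reject under the same declaration. They are separate fragments reusing the values from the example above.

```xml
<!-- Given: <!ELEMENT customer (name, phone, email)> -->

<!-- Not valid: the children appear in the wrong order -->
<customer>
  <phone>567890</phone>
  <name>Ahmed</name>
  <email>Ahmed@hotmail.com</email>
</customer>

<!-- Not valid: the email child is missing -->
<customer>
  <name>Ahmed</name>
  <phone>567890</phone>
</customer>
```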
4- Child contents: Choice
A choice is the other way to list child elements. It presents a set of possible child
elements, of which exactly one may appear. The syntax is:
<!ELEMENT element (child1 | child2 | …)>
where child1, child2, etc. are the possible child elements of the parent element.
4- Child contents: Choice
For example,
<!ELEMENT customer (name | company)>
This allows the customer element to contain either the name element or the company element.
However, you cannot have both the company and the name child elements.
<customers>
<customer>
<name> Ahmed </name>
</customer>
</customers>
OR
<customers>
<customer>
<company>SQ</company>
</customer>
</customers>
4- Child contents: Sequence & Choice
<!ELEMENT customer ( (name | company) , phone , email )>
Here we have two options:
<customer>
<name> Ahmed </name>
<phone> 6666 </phone>
<email> Ahmed@hotmail</email>
</customer>
OR
<customer>
<company> sabec </company>
<phone> 4444</phone>
<email> info@sabec </email>
</customer>
Notice: we cannot write the name and company elements together; we must
choose one of them.
What if we need more than one occurrence of the same element?
<!ELEMENT customers (customer, customer)>
<!ELEMENT customers (customer, customer, customer)>
<!ELEMENT customers (customer, customer, customer, customer)>
Writing a separate declaration for every possible count is not practical.
The answer is to use modifying symbols to indicate the number of occurrences of a
child element.
MODIFYING SYMBOLS
Modifying symbols are symbols appended to the content model to indicate the number
of occurrences of each element. There are three modifying symbols:
–a question mark (?) allows zero or one of the item.
–a plus sign (+) allows one or more of the item (at least one).
–an asterisk (*) allows zero or more of the item.
MODIFYING SYMBOLS
For example, <!ELEMENT customers (customer+)> would allow the document to
contain one or more customer elements within the customers element.
Modifying symbols can be applied within sequences or choices.
<!ELEMENT customer ( name, address, phone, email? )>
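As a sketch of what the optional email means in practice, the fragments below show customer elements checked against the declaration above. They are separate fragments that reuse sample values from elsewhere in these slides.

```xml
<!-- Given: <!ELEMENT customer (name, address, phone, email?)> -->

<!-- Valid: email is present once -->
<customer>
  <name>Ahmed</name>
  <address>14 Bronson st.</address>
  <phone>12345</phone>
  <email>Ahmed@hotmail.com</email>
</customer>

<!-- Valid: email is omitted, which ? permits -->
<customer>
  <name>Ahmed</name>
  <address>14 Bronson st.</address>
  <phone>12345</phone>
</customer>

<!-- Not valid: ? allows at most one email -->
<customer>
  <name>Ahmed</name>
  <address>14 Bronson st.</address>
  <phone>12345</phone>
  <email>Ahmed@hotmail.com</email>
  <email>Ahmed@hotmail.com</email>
</customer>
```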
MODIFYING SYMBOLS
They can also modify entire element sequences or choices by placing the character
outside the closing parenthesis of the sequence or choice.
<!ELEMENT order (orderDate, items)+>
<order>
<orderDate> 12/12/09 </orderDate>
<items> bread, juice, milk </items>
</order>
<order>
<orderDate> 12/12/09 </orderDate>
<items> bread, juice, milk </items>
<orderDate> 25/12/09 </orderDate>
<items> eggs, milk, bread </items>
</order>
MODIFYING SYMBOLS
They can also modify an entire choice by placing the character
outside the closing parenthesis of the choice.
<!ELEMENT customer (name | company)+>
+ means: the choice is made one or more times, so at least one name or company
element must appear, and the two can be mixed or repeated. All of the following are valid:
<customer>
<name> Ahmed </name>
</customer>
<customer>
<company> Sabec </company>
</customer>
<customer>
<name> Ahmed </name>
<company> Sabec </company>
</customer>
<customer>
<name> Ahmed </name>
<name> Sara </name>
</customer>
THE STRUCTURE OF KRISTEN’S DOCUMENT
<!DOCTYPE customers
[
<!ELEMENT customers (customer+)>
<!ELEMENT customer (name, address, phone, email?, orders)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT orders (order+)>
<!ELEMENT order ( orderDate, items)>
<!ELEMENT orderDate (#PCDATA)>
<!ELEMENT items (item+)>
<!ELEMENT item (#PCDATA)>
]>
<customers>
<customer>
<name>Ahmed</name>
<address>14 Bronson st.</address>
<phone>12345</phone>
<orders>
<order>
<orderDate>12/12/09</orderDate>
<items>
<item> bread </item>
</items>
</order>
<order>
<orderDate> 22/01/10</orderDate>
<items>
<item> eggs </item>
<item> milk </item>
</items>
</order>
</orders>
</customer>
<customer>
…………….
</customer>
</customers>
DECLARING ATTRIBUTES
• For a document to be valid, all the attributes associated with elements must
also be declared. To enforce attribute properties, you must add an
attribute-list declaration to the document’s DTD.
• The attribute-list declaration:
–Lists the names of all attributes associated with a specific element
–Specifies the datatype of the attribute
–Indicates whether the attribute is required or optional
DECLARING ATTRIBUTES
The syntax to declare a list of attributes is:
<!ATTLIST element attribute1 type1 default1
attribute2 type2 default2
attribute3 type3 default3>
Where element is the name of the element associated with the attributes,
attribute is the name of an attribute, type is the attribute’s data type, and
default indicates whether the attribute is required or implied, and whether it
has a fixed or default value.
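A concrete instance of this syntax might look like the sketch below. The custID and custType attributes are the ones used later in these slides; the element declaration for customer is an assumption added so that the fragment is complete.

```xml
<!-- Assumed element declaration, for illustration -->
<!ELEMENT customer (name, phone)>

<!-- One attribute-list declaration covering two attributes of customer -->
<!ATTLIST customer custID   ID                #REQUIRED
                   custType (home | business) #IMPLIED>
```

A matching start tag in the document would then be, for example, `<customer custID="Cust021" custType="home">`.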
DECLARING ATTRIBUTES
An equivalent syntax uses a separate declaration for each attribute:
<!ATTLIST element attribute1 type1 default1>
<!ATTLIST element attribute2 type2 default2>
<!ATTLIST element attribute3 type3 default3>
Where element is the name of the element associated with the attributes,
attribute is the name of an attribute, type is the attribute’s data type, and
default indicates whether the attribute is required or implied, and whether it
has a fixed or default value.
DECLARING ATTRIBUTES
• Attribute-list declarations can be placed anywhere within the document type
declaration (DTD), although they are easier to work with if they are located
adjacent to the declaration for the element with which they are associated.
ATTRIBUTE TYPES
While all attribute types are text strings, you can control the type of text used
with the attribute. There are three general categories of attribute values:
1. CDATA
2. Enumerated
3. ID
WORKING WITH ATTRIBUTE TYPES:
1- CDATA
• CDATA attributes can contain any character data (text, numbers, symbols).
The reserved characters < and & cannot appear literally in an attribute value
and must be written as &lt; and &amp;.
• The general form of a CDATA type is:
<!ATTLIST element attribute CDATA default >
• For example:
<!ATTLIST item itemPrice CDATA ...>
• Any of the following attribute values are allowed under this declaration:
<item itemPrice="29.95"> ... </item>
<item itemPrice="$29.95"> ... </item>
<item itemPrice="£29.95"> ... </item>
WORKING WITH ATTRIBUTE TYPES:
2- Enumerated Types
• Enumerated types are attributes that are limited to a set of possible values.
• The general form of an enumerated type is:
<!ATTLIST element attribute (value1 | value2 | value3 | …) default>
• For example, the following declaration:
<!ATTLIST customer custType (home | business) ...>
restricts custType to either “home” or “business”.
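The fragments below sketch how a validating parser would treat different custType values. The #IMPLIED default is taken from the attribute defaults slide, and the rejected values are made up for illustration.

```xml
<!-- Given: <!ATTLIST customer custType (home | business) #IMPLIED> -->

<!-- Valid: the value is one of the listed choices -->
<customer custType="home"> ... </customer>
<customer custType="business"> ... </customer>

<!-- Not valid: "school" is not in the list -->
<customer custType="school"> ... </customer>

<!-- Not valid: values are case sensitive, so "Home" does not match "home" -->
<customer custType="Home"> ... </customer>
```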
WORKING WITH ATTRIBUTE TYPES:
3- ID
• The ID type is used with attributes that require unique values. For example, if a
customer ID needs to be unique, you may use the ID token:
<!ATTLIST customer custID ID …>
• This ensures each customer will have a unique ID. An ID value must be a valid
XML name, so it cannot begin with a digit:
<customer custID="Cust123"> …. </customer>
• However, the following elements would not be valid because the same
custID value is used more than once:
<customer custID="Cust021"> ... </customer>
<customer custID="Cust021"> ... </customer>
ATTRIBUTE DEFAULTS
The final part of an attribute declaration is the attribute default. Two of the possible
defaults are:
– #REQUIRED: The attribute must appear with every occurrence of the element.
– #IMPLIED: The attribute is optional.
• Adding the #REQUIRED value to the custID attribute declaration makes the attribute mandatory:
<!ATTLIST customer custID ID #REQUIRED>
• Adding the #IMPLIED value to the custType attribute declaration indicates that use of
this attribute is optional:
<!ATTLIST customer custType (home | business) #IMPLIED>
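The attribute-list syntax described earlier also mentions fixed and default values. As a sketch of those two remaining forms: the country attribute and its value "SA" are hypothetical, and the two declarations are independent alternatives rather than parts of one DTD.

```xml
<!-- A literal default: if custType is omitted, the parser supplies "home" -->
<!ATTLIST customer custType (home | business) "home">

<!-- A fixed value: if country is written in the document, it must equal "SA";
     if it is omitted, the parser supplies "SA" -->
<!ATTLIST customer country CDATA #FIXED "SA">
```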
