3. 3 www.ExigenServices.com
XML
• Is a standart for EXtensible Markup Language
• Is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• XML uses a Document Type Definition (DTD) or an
XML Schema to describe the data
5. 5 www.ExigenServices.com
XML SYNTAX
• XML declaration (standart and encoding)
• Can be comment (is inserted in tags <!-- comment -->)
• Element
• Child elements
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--DOCTYPE note SYSTEM "InternalNote.dtd“-->
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XML Example:
6. 6 www.ExigenServices.com
XML TAGS
• All XML documents must have closing a tag
<element> this is an element </element>
<element1>this is another element </element1>
• XML tags are case sensitive
<!– correct -->
<element> this is an element </element>
<!– wrong -->
<element1>this is another element </Element1>
• All XML documents must be properly nested
<!– correct -->
<b><i> this text is bold and italic </i></b>
<!– wrong -->
<b><i> this text is bold and italic </b></i>
8. 8 www.ExigenServices.com
XML ELEMENTS NAMING
• XML elements have the different content type
<book>
<title>BOOK TITLE</title>
<prod id="33-657"
media="paper"></prod>
<chapter>CHAPTER ONE
<para>paragraph 1</para>
<para>paragraph 2</para>
</chapter>
<chapter>CHAPTER TWO
<para>paragraph 1</para>
</chapter>
</book>
9. 9 www.ExigenServices.com
XML ATTRIBUTES
• Attribute values must always be enclosed in quotes
<person sex="female">
• If the attribute value itself contains double quotes it is necessary
to use single quotes
or <person sex=‘female’>
<gangster name='George "Shotgun" Ziegler'>
• If the attribute value itself contains single quotes it is necessary
to use double quotes
<gangster name=“George ‘Shotgun’ Ziegler”>
10. 10 www.ExigenServices.com
XML ATTRIBUTES
• The following example defines an element called "age" with a restriction.
The value of age cannot be lower than 0 or greater than 120:
<xs:element name="age">
< xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
< /xs:simpleType>
< /xs:element>
11. 11 www.ExigenServices.com
DTD SCHEME
• Attribute’s type and property
<!ATTLIST message
number CDATA #REQUIRED
date CDATA #REQUIRED
from CDATA #FIXED
status CDATA #IMPLIED>
• Elements and attributes should be described in DTD in order they used in
XML
• DTD can be defined in XML in second line
<?xml version="1.0" ?>
<!DOCTYPE mailbox SYSTEM "mailbox.dtd">
12. 12 www.ExigenServices.com
DTD EXAMPLE
<!ELEMENT mailbox (message*)>
<!ELEMENT message (head, body)>
<!ATTLIST message uid CDATA #REQUIRED>
<!ELEMENT head ( from,to+, subject?, CC*,
notify?) >
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT CC (#PCDATA)>
<!ELEMENT notify EMPTY>
<!ELEMENT body (#PCDATA)>
<?xml version="1.0" ?>
<!DOCTYPE mailbox SYSTEM
"mailbox.dtd">
<mailbox>
<message uid="1">
<head>
<from>user1@myhp.edu</from>
<to>user2@myhp.edu</to>
<subject>Re:</subject>
</head>
<body> What's up! </body>
</message>
<message uid="2">
…
</message>
</mailbox>
DTD XML
14. 14 www.ExigenServices.com
XML tools
• XML on-line validators
http://www.w3schools.com/dom/dom_validate.asp
• XML validators
XML ValidatorBuddy
• XML tools:
• Stylus Studio
• XML Notepad
• XML Copy Editor
• Exchanger XML Lite
• XMLPad
• XML Spy
15. 15 www.ExigenServices.com
RESOLUTION
• XML was designed to carry data.
• XML has it’s own syntax
• XML should has a scheme
• XML document contains elements and attributes
• XML document has tags
• XML document is easy to create and validate
16. 16 www.ExigenServices.com
<?xml version="1.0" encoding="WINDOWS-1251"?>
<?xml-stylesheet type='text/xsl' href='ex01-1.xsl'?>
<tutorial>
<title>“Introduction to XML"</title>
<author>Bob Hitch</author>
</toturial>
WRONG OR CORRECT
<?xml version="1.0" encoding="UTF-16"?> will give an error message
Elements can be simple and complex. Simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself. You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern. The syntax for defining a simple element is: < xs:element name="xxx" type="yyy"/> where xxx is the name of the element and yyy is the data type of the element. XML Schema has a lot of built-in data types. The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time Example Here are some XML elements: < lastname>Refsnes</lastname> < age>36</age> < dateborn>1970-03-27</dateborn> And here are the corresponding simple element definitions: < xs:element name="lastname" type="xs:string"/> < xs:element name="age" type="xs:integer"/> < xs:element name="dateborn" type="xs:date"/> Simple elements may have a default value OR a fixed value specified. A default value is automatically assigned to the element when no other value is specified. In the following example the default value is "red": < xs:element name="color" type="xs:string" default="red"/> A fixed value is also automatically assigned to the element, and you cannot specify another value. In the following example the fixed value is "red": < xs:element name="color" type="xs:string" fixed="red"/> A complex element is an XML element that contains other elements and/or attributes. There are four kinds of complex elements: empty elements elements that contain only other elements elements that contain only text elements that contain both other elements and text Examples of Complex Elements A complex XML element, "product", which is empty: < product pid="1345"/> A complex XML element, "employee", which contains only other elements: < employee> < firstname>John</firstname> < lastname>Smith</lastname> < /employee> A complex XML element, "food", which contains only text: < food type="dessert">Ice cream</food> A complex XML element, "description", which contains both elements and text: < description> It happened on <date lang="norwegian">03.03.99</date> .... < /description> How to Define a Complex Element Look at this complex XML element, "employee", which contains only other elements: < employee> < firstname>John</firstname> < lastname>Smith</lastname> < /employee> We can define a complex element in an XML Schema two different ways: 1. The "employee" element can be declared directly by naming the element, like this: < xs:element name="employee"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> < /xs:element> If you use the method described above, only the "employee" element can use the specified complex type. Note that the child elements, "firstname" and "lastname", are surrounded by the <sequence> indicator. This means that the child elements must appear in the same order as they are declared. You will learn more about indicators in the XSD Indicators chapter. 2. The "employee" element can have a type attribute that refers to the name of the complex type to use: < xs:element name="employee" type="personinfo"/> < xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> < /xs:complexType> If you use the method described above, several elements can refer to the same complex type, like this: < xs:element name="employee" type="personinfo"/> < xs:element name="student" type="personinfo"/> < xs:element name="member" type="personinfo"/> < xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> < /xs:complexType> You can also base a complex element on an existing complex element and add some elements, like this: < xs:element name="employee" type="fullpersoninfo"/> < xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> < /xs:complexType> < xs:complexType name="fullpersoninfo"> <xs:complexContent> <xs:extension base="personinfo"> <xs:sequence> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:extension> </xs:complexContent> < /xs:complexType> Complex Types Containing Elements Only An XML element, "person", that contains only other elements: < person> < firstname>John</firstname> < lastname>Smith</lastname> < /person> You can define the "person" element in a schema, like this: < xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> < /xs:element> Notice the <xs:sequence> tag. It means that the elements defined ("firstname" and "lastname") must appear in that order inside a "person" element. Or you can give the complexType element a name, and let the "person" element have a type attribute that refers to the name of the complexType (if you use this method, several elements can refer to the same complex type): < xs:element name="person" type="persontype"/> < xs:complexType name="persontype"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> < /xs:complexType> Complex Types with Mixed Content An XML element, "letter", that contains both text and other elements: < letter> Dear Mr.<name>John Smith</name>. Your order <orderid>1032</orderid> will be shipped on <shipdate>2001-07-13</shipdate>. < /letter> The following schema declares the "letter" element: < xs:element name="letter"> <xs:complexType mixed="true"> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="orderid" type="xs:positiveInteger"/> <xs:element name="shipdate" type="xs:date"/> </xs:sequence> </xs:complexType> < /xs:element> Note: To enable character data to appear between the child-elements of "letter", the mixed attribute must be set to "true". The <xs:sequence> tag means that the elements defined (name, orderid and shipdate) must appear in that order inside a "letter" element. We could also give the complexType element a name, and let the "letter" element have a type attribute that refers to the name of the complexType (if you use this method, several elements can refer to the same complex type): < xs:element name="letter" type="lettertype"/> < xs:complexType name="lettertype" mixed="true"> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="orderid" type="xs:positiveInteger"/> <xs:element name="shipdate" type="xs:date"/> </xs:sequence> < /xs:complexType>
XML elements can have different content type: element content, mixed content, Simple content or empty content or element can have attribute. Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type. Element “title” has content type, element “prod” doesn’t have any type but has an attribute. Element “chapter” has content and mixed type. Element “para” has content type. Element “book” and “chapter”ia a parent element and has child elements. The attribute named id has the value "33-657". The attribute named media has the value "paper".
Example Here is an XML element with an attribute: < lastname lang="EN">Smith</lastname> And here is the corresponding attribute definition: < xs:attribute name="lang" type="xs:string"/> Default and Fixed Values for Attributes Attributes may have a default value OR a fixed value specified. A default value is automatically assigned to the attribute when no other value is specified. In the following example the default value is "EN": < xs:attribute name="lang" type="xs:string" default="EN"/> A fixed value is also automatically assigned to the attribute, and you cannot specify another value. In the following example the fixed value is "EN": < xs:attribute name="lang" type="xs:string" fixed="EN"/> Optional and Required Attributes Attributes are optional by default. To specify that the attribute is required, use the "use" attribute: < xs:attribute name="lang" type="xs:string" use="required"/> Restrictions on Content When an XML element or attribute has a data type defined, it puts restrictions on the element's or attribute's content. If an XML element is of type "xs:date" and contains a string like "Hello World", the element will not validate. With XML Schemas, you can also add your own restrictions to your XML elements and attributes. These restrictions are called facets. You can read more about facets in the next chapter.
To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint. The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Golf, BMW: < xs:element name="car"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> < /xs:simpleType> < /xs:element> The example above could also have been written like this: < xs:element name="car" type="carType"/> < xs:simpleType name="carType"> <xs:restriction base="xs:string"> <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> </xs:restriction> < /xs:simpleType> Note: In this case the type "carType" can be used by other elements because it is not a part of the "car" element. To limit the content of an XML element to define a series of numbers or letters that can be used, we would use the pattern constraint. The example below defines an element called "letter" with a restriction. The only acceptable value is ONE of the LOWERCASE letters from a to z: < xs:element name="letter"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-z]"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example defines an element called "initials" with a restriction. The only acceptable value is THREE of the UPPERCASE letters from a to z: < xs:element name="initials"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[A-Z][A-Z][A-Z]"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example also defines an element called "initials" with a restriction. The only acceptable value is THREE of the LOWERCASE OR UPPERCASE letters from a to z: < xs:element name="initials"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-zA-Z][a-zA-Z][a-zA-Z]"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example defines an element called "choice" with a restriction. The only acceptable value is ONE of the following letters: x, y, OR z: < xs:element name="choice"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[xyz]"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example defines an element called "prodid" with a restriction. The only acceptable value is FIVE digits in a sequence, and each digit must be in a range from 0 to 9: < xs:element name="prodid"> < xs:simpleType> <xs:restriction base="xs:integer"> <xs:pattern value="[0-9][0-9][0-9][0-9][0-9]"/> </xs:restriction> < /xs:simpleType> </xs:element> Other Restrictions on a Series of Values The example below defines an element called "letter" with a restriction. The acceptable value is zero or more occurrences of lowercase letters from a to z: < xs:element name="letter"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="([a-z])*"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example also defines an element called "letter" with a restriction. The acceptable value is one or more pairs of letters, each pair consisting of a lower case letter followed by an upper case letter. For example, "sToP" will be validated by this pattern, but not "Stop" or "STOP" or "stop": < xs:element name="letter"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="([a-z][A-Z])+"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example defines an element called "gender" with a restriction. The only acceptable value is male OR female: < xs:element name="gender"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="male|female"/> </xs:restriction> < /xs:simpleType> </xs:element> The next example defines an element called "password" with a restriction. There must be exactly eight characters in a row and those characters must be lowercase or uppercase letters from a to z, or a number from 0 to 9: < xs:element name="password"> < xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-zA-Z0-9]{8}"/> </xs:restriction> < /xs:simpleType> </xs:element>
The DTD describes a scheme for a particular document markup language using a set of declarations (objects, options, elements and attributes) that describe its class (or type) in terms of syntactic constraints of this document. DTD may also declare constructs that are always needed to determine the structure of the document, but, but may influence the interpretation of certain documents. The attribute named id has the value "33-657". The attribute named media has the value "paper". REQUIRED – attribute should be identified. FIXED - attribute should be fixed. IMPLIED - attribute can be unengaging. There are some default properties: 1.IMPLIED - attribute is optional; 2.REQUIRED - the attribute value must be indicated; 3.FIXED - the value of this attribute is specified as a constant in the DTD and the document can not be changed; 4.nekotoroe specific value, which is used by default.
The attribute named id has the value "33-657". The attribute named media has the value "paper". Declare elements form a list of authorized names of elements in the document and determine the information on the tags (if they are mandatory) and the content model for each element. Different keys words and symbols define the contents of: EMPTY - empty the contents of ANY - any content - Specifies the order | - Separation of alternatives () - Grouping * - Any number of items (zero or more) + - At least one element (one or more) ? - Availability of an optional element (zero or one) If there is no, * +, or? - The item should be only one There are some types of attributes: CDATA (Character set of data) - the attribute value can be any character data ID - the attribute value must be a unique identifier for the element IDREF - the element value is a reference to an element by its ID IDREFS - Same as IDREF, but with the possibility of links is not one identifier, and for several NMTOKEN - an attribute value can be a sequence of characters, in some ways similar to the name (hence the name - name token). This is a string that contains any combination of those characters that are allowed to use names for XML. NMTOKENS - the attribute value is a list of values ENTITY - the value used to refer to an external entity. ENTITIES - allows you to specify a list of external entities, separated by spaces. NOTATION - the attribute value can be one of the previously defined notations NOTATIONS - allows you to specify a list of notations. Listings and NOTATION-listings ENUMERATION - sets a list of possible alternative values. There are some default properties: 1.IMPLIED - attribute is optional; 2.REQUIRED - the attribute value must be indicated; 3.FIXED - the value of this attribute is specified as a constant in the DTD and the document can not be changed; 4.nekotoroe specific value, which is used by default.
What is an XML Schema? The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. An XML Schema: defines elements that can appear in a document defines attributes that can appear in a document defines which elements are child elements defines the order of child elements defines the number of child elements defines whether an element is empty or can include text defines data types for elements and attributes defines default and fixed values for elements and attributes XML Schemas are the Successors of DTDs We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons: XML Schemas are extensible to future additions XML Schemas are richer and more powerful than DTDs XML Schemas are written in XML XML Schemas support data types XML Schemas support namespaces XML Schema is a W3C Recommendation XML Schema became a W3C Recommendation 02. May 2001. You can read more about the XML Schema standard in our W3C tutorial . We believe that very soon XML-schema to be used in Web-based applications instead of the DTD-tables, and here's why: XML-schemas easier to learn than DTD XML-schema can be expanded in the future if needed to add some XML-Schema is richer and more useful than DTD XML-schema written in XML XML-Schema data typing support XML-schema namespace support <xs:element name="to" type="xs:string"/> - is a simple element <xs:element name="note"> - is a complex element XML Schema has a lot of built-in data types. The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time Order indicators are used to define the order of the elements. All Indicator The <all> indicator specifies that the child elements can appear in any order, and that each child element must occur only once: < xs:element name="person"> <xs:complexType> <xs:all> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:all> </xs:complexType> < /xs:element> Note: When using the <all> indicator you can set the <minOccurs> indicator to 0 or 1 and the <maxOccurs> indicator can only be set to 1 (the <minOccurs> and <maxOccurs> are described later). Choice Indicator The <choice> indicator specifies that either one child element or another can occur: < xs:element name="person"> <xs:complexType> <xs:choice> <xs:element name="employee" type="employee"/> <xs:element name="member" type="member"/> </xs:choice> </xs:complexType> < /xs:element> Sequence Indicator The <sequence> indicator specifies that the child elements must appear in a specific order: < xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType> < /xs:element> XML Schemas Support Data Types One of the greatest strength of XML Schemas is the support for data types. With support for data types: It is easier to describe allowable document content It is easier to validate the correctness of data It is easier to work with data from a database It is easier to define data facets (restrictions on data) It is easier to define data patterns (data formats) It is easier to convert data between different data types XML Schemas use XML Syntax Another great strength about XML Schemas is that they are written in XML. Some benefits of that XML Schemas are written in XML: You don't have to learn a new language You can use your XML editor to edit your Schema files You can use your XML parser to parse your Schema files You can manipulate your Schema with the XML DOM You can transform your Schema with XSLT XML Schemas Secure Data Communication When sending data from a sender to a receiver, it is essential that both parts have the same "expectations" about the content. With XML Schemas, the sender can describe the data in a way that the receiver will understand. A date like: "03-11-2004" will, in some countries, be interpreted as 3.November and in other countries as 11.March. However, an XML element with a data type like this: <date type="date">2004-03-11</date> ensures a mutual understanding of the content, because the XML data type "date" requires the format "YYYY-MM-DD". XML Schemas are Extensible XML Schemas are extensible, because they are written in XML. With an extensible Schema definition you can: Reuse your Schema in other Schemas Create your own data types derived from the standard types Reference multiple schemas in the same document Well-Formed is not Enough A well-formed XML document is a document that conforms to the XML syntax rules, like: it must begin with the XML declaration it must have one unique root element start-tags must have matching end-tags elements are case sensitive all elements must be closed all elements must be properly nested all attribute values must be quoted entities must be used for special characters Even if documents are well-formed they can still contain errors, and those errors can have serious consequences. Think of the following situation: you order 5 gross of laser printers, instead of 5 laser printers. With XML Schemas, most of these errors can be caught by your validating software.
XML is Text-based XML is a text-based markup language. One great thing about XML is that XML files can be created and edited using a simple text-editor like Notepad. However, when you start working with XML, you will soon find that it is better to edit XML documents using a professional XML editor. Why Not Notepad? Many web developers use Notepad to edit both HTML and XML documents because Notepad is included with the most common OS and it is simple to use. Personally I often use Notepad for quick editing of simple HTML, CSS, and XML files. But, if you use Notepad for XML editing, you will soon run into problems. Notepad does not know that you are writing XML, so it will not be able to assist you. Why an XML Editor? Today XML is an important technology, and development projects use XML-based technologies like: XML Schema to define XML structures and data types XSLT to transform XML data SOAP to exchange XML data between applications WSDL to describe web services RDF to describe web resources XPath and XQuery to access XML data SMIL to define graphics To be able to write error-free XML documents, you will need an intelligent XML editor! XML Editors Professional XML editors will help you to write error-free XML documents, validate your XML against a DTD or a schema, and force you to stick to a valid XML structure. An XML editor should be able to: Add closing tags to your opening tags automatically Force you to write valid XML Verify your XML against a DTD Verify your XML against a Schema Color code your XML syntax