



Published in: Technology, News & Politics


  1. 1. XML OVERVIEW e-logistics 2009 Eduard Rodés Gubern Port de Barcelona
  2. 2. What is XML? <ul><li>EXtensible Markup Language (XML) is a way to apply structure to documents and data. XML provides a standard, open format and mechanisms for structuring a document so that it can be exchanged and manipulated. </li></ul>
  3. 3. XML History <ul><li>The concept of XML is over 30 years old, beginning in the 1960s. Its origins are in the standardized typesetting codes (GENCODE) used by the publishing industry. </li></ul><ul><li>In the 1970s, Dr. C. F. Goldfarb proposed a method of describing text that was not specific to an application or hardware. He created Generalized Markup Language (GML). The basic tenets of GML were: </li></ul><ul><ul><li>Markup should emphasize the document structure, not format or style </li></ul></ul><ul><ul><li>Simple input syntax for markup using <> and </> tags </li></ul></ul><ul><ul><li>Markup syntax rules should be strictly controlled so that the code can be easily read by humans or software programs. </li></ul></ul><ul><li>Originally the number of document types supported by GML was limited, so adding new tags and document types was relatively simple. By the 1980s, however, these numbers had grown to such an extent that GENCODE and GML proponents formed the ANSI Committee on Computer Languages for the Processing of Text. </li></ul>
  4. 4. XML History <ul><li>In 1986 this committee promulgated Standard Generalized Markup Language (SGML), which standardized the use of <> and </> tags, as well as Document Type Definitions (DTD). As with GENCODE and GML, the primary use of SGML was large-scale publishing. </li></ul><ul><li>As interest in the Internet grew, and the functionality of Internet browsers evolved, the need for a standardized hypertext application increased. In the 1990s HyperText Markup Language (HTML) became that standard, and the World Wide Web Consortium (W3C), founded in 1994, took over its development. HTML is an application of SGML: its tags are defined by an SGML DTD. </li></ul><ul><li>As web communities have grown, so has the need to publish new types of documents. Many of these documents are community specific. Unfortunately, HTML cannot be extended to accommodate new document types, and browsers do not support SGML. These needs prompted the W3C to sponsor the development of an “eXtensible Markup Language.” </li></ul>
  5. 5. XML Design Goals <ul><li>The design goals for XML were set out by the World Wide Web Consortium (W3C) in the XML 1.0 Recommendation, published in February 1998. A synopsis of these design goals is as follows: </li></ul><ul><ul><li>XML shall be straightforwardly usable over the Internet </li></ul></ul><ul><ul><li>XML shall support a wide variety of applications </li></ul></ul><ul><ul><li>XML shall be compatible with SGML </li></ul></ul><ul><ul><li>It shall be easy to write programs which process XML documents </li></ul></ul><ul><ul><li>The number of optional features in XML is to be kept to the absolute minimum, ideally zero </li></ul></ul><ul><ul><li>XML documents should be human-legible and reasonably clear </li></ul></ul><ul><ul><li>The XML design should be prepared quickly </li></ul></ul><ul><ul><li>The design of XML shall be formal and concise </li></ul></ul><ul><ul><li>XML documents should be easy to create </li></ul></ul><ul><ul><li>Terseness in XML markup is of minimal importance </li></ul></ul>
  6. 6. The Basics A markup language is a set of rules. It declares what constitutes markup in a document and defines exactly what that markup means. It also provides a description of document layout and logical structure. There are three types of markup: • Stylistic: how a document is presented (e.g., the HTML tags <I> for italics, <B> for bold, and <U> for underline) • Structural: how the document is to be structured (e.g., the HTML tags <P> for paragraph, <SPAN> for creating ad hoc styles in a document, and <DIV> for grouping structures aligned in the same way) • Semantic: what the content means (e.g., the HTML tags <TITLE> for the page title, <HEAD> for page header information, and <SCRIPT> to indicate JavaScript in a page) In XML the only type of markup that we are concerned with is structural.
  7. 7. Prolog & Document Type Definition <ul><li>XML documents should begin with an XML declaration, which specifies the version </li></ul><ul><ul><li>Optionally, it may also include: </li></ul></ul><ul><ul><ul><li>Encoding (recommended) </li></ul></ul></ul><ul><ul><ul><li>Stand-alone declaration </li></ul></ul></ul><ul><li>The document type declaration (DOCTYPE), which contains or points to the Document Type Definition, typically comes next </li></ul><ul><li><?xml version="1.0" encoding='UTF-8' standalone='no' ?> </li></ul><ul><li><!DOCTYPE root SYSTEM "myDocs.dtd" > </li></ul>
  8. 8. Tags <ul><li>Tags carry the smallest unit of meaning, signifying the structure, format or style of the data. They are always enclosed within angle brackets ‘< >’. </li></ul><ul><li>Tags are case-sensitive. This means that the tags <friend>, <Friend>, <FRIEND> carry different meanings and cannot be used interchangeably. </li></ul><ul><li>All tags must be paired so that they have a start <friend> and an end </friend>. The one exception is an element with no content, which may be written as a single empty-element tag such as <friend/>. Tags combined with data form elements. </li></ul>
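The tag rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an assumption made for this illustration and is not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start and end tags match exactly, including case.
print(is_well_formed("<friend>El Soussy</friend>"))   # True
# <friend> closed by </Friend>: the names differ, so the parser rejects it.
print(is_well_formed("<friend>El Soussy</Friend>"))   # False
# A start tag without an end tag is rejected as well.
print(is_well_formed("<friend>El Soussy"))            # False
# An empty-element tag opens and closes the element in one step.
print(is_well_formed("<friend/>"))                    # True
```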
  9. 9. Tags <ul><li>There are some basic rules for naming XML tags: </li></ul><ul><ul><li>XML is case sensitive </li></ul></ul><ul><ul><li>Element names may start with any letter or an underscore (_) </li></ul></ul><ul><li>After the first character, element names may contain: </li></ul><ul><ul><li>letters </li></ul></ul><ul><ul><li>numbers </li></ul></ul><ul><ul><li>periods (.) </li></ul></ul><ul><ul><li>hyphens (-) </li></ul></ul><ul><ul><li>underscores (_) </li></ul></ul><ul><ul><li>colons (:), although these are best avoided because the colon separates a namespace prefix from the name </li></ul></ul><ul><li>Element names may not contain white space. </li></ul><ul><li>Element names may not start with "XML" or any case variation of these letters. These are reserved by the World Wide Web Consortium (W3C). </li></ul>
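As a rough illustration of the naming rules above, here is a small Python sketch. It is an ASCII-only approximation: the XML specification also allows many non-ASCII letters, which this check ignores. The function name `is_valid_element_name` is hypothetical.

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, periods, hyphens, underscores or colons. ASCII only (assumption).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._:-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a candidate element name against the simplified rules."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names beginning with "xml" in any combination of case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["friend", "_zip", "street.name-2", "2friend", "my friend", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```

The first three candidates pass; `2friend` fails because it starts with a digit, `my friend` because it contains white space, and `XmlData` because it starts with the reserved letters.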
  10. 10. Elements <ul><li>Elements are the building blocks of a document. An element consists of a start-tag, an end-tag and the content between them </li></ul><ul><li><friend>El Soussy</friend> </li></ul><ul><li>Within this single element there may be multiple levels of nested sub-elements, which keep the individual pieces of data in a logical and easy-to-manage structure: </li></ul><ul><li><?xml version="1.0"?> </li></ul><ul><li><friend> </li></ul><ul><li><name>El Soussy</name> </li></ul><ul><li><address> </li></ul><ul><li><street>Palestinian Gardens</street> </li></ul><ul><li><city>Alexandria</city> </li></ul><ul><li><country>EG</country> </li></ul><ul><li><zip>90210</zip> </li></ul><ul><li></address> </li></ul><ul><li></friend> </li></ul><ul><li>This is a well-formed XML document </li></ul>
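To show how the nesting above looks to a program, this short sketch parses the same `friend` document with Python's standard `xml.etree.ElementTree` and walks the element tree. The indentation inside the string is added only for readability.

```python
import xml.etree.ElementTree as ET

FRIEND_XML = """<?xml version="1.0"?>
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>"""

root = ET.fromstring(FRIEND_XML)

print(root.tag)                        # friend
print(root.find("name").text)          # El Soussy
# A path with "/" steps down through nested elements.
print(root.find("address/city").text)  # Alexandria
# Iterating over an element yields its direct children in document order.
print([child.tag for child in root.find("address")])
# ['street', 'city', 'country', 'zip']
```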
  11. 11. Attributes <ul><li>Attributes are used to describe the element </li></ul><ul><ul><li>If elements are akin to nouns, think of attributes as adjectives modifying the noun. </li></ul></ul><ul><ul><li>Can be used to embellish content… </li></ul></ul><ul><ul><li>or to associate added content with an element </li></ul></ul><ul><ul><li>Attributes are written in an element's start tag as the name of the attribute, followed by an equals sign (=) and the value given to that attribute </li></ul></ul><ul><li>An HTML example would be <hr width="50%">. This tag tells the browser to put a horizontal rule on the page, and the width attribute sets the rule to half the available width </li></ul><ul><li>Attribute naming rules are the same as those for element names. In addition, a tag may not have two attributes with the same name. Attribute names beginning with "xml" are reserved </li></ul><ul><li>All attribute values must be quoted. </li></ul><ul><ul><ul><ul><ul><li>EXAMPLE of an element with an attribute </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li><book call_no="PZ3.S8195Gr6"> </li></ul></ul></ul></ul></ul>
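The attribute rules above (quoted values, no duplicate names) can be observed with a parser. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `book` element is written as an empty-element tag so that the fragment is complete, and the helper `parses` is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book call_no="PZ3.S8195Gr6"/>')
print(book.attrib)           # {'call_no': 'PZ3.S8195Gr6'}
print(book.get("call_no"))   # PZ3.S8195Gr6
print(book.get("missing"))   # None, the attribute is not present


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An unquoted attribute value is not well-formed.
print(parses("<book call_no=PZ3/>"))                 # False
# Two attributes with the same name on one tag are not allowed.
print(parses('<book call_no="a" call_no="b"/>'))     # False
# Single quotes are as acceptable as double quotes.
print(parses("<book call_no='PZ3.S8195Gr6'/>"))      # True
```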
  12. 12. Attributes <ul><li>A question often arises as to whether to make something a child element or an attribute </li></ul><ul><li>There's no rule that says you have to do it one way or another </li></ul><ul><li>A good way to decide is to know the function of each </li></ul><ul><li>Element content, generally speaking, is the data itself: what would be displayed to a reader on the screen </li></ul><ul><li>Think of attributes as data about the data; that is, information that is more important to the processing software than to the reader of the data, so it's usually not rendered on the screen </li></ul>
  13. 13. Attributes <ul><li>Declaration </li></ul><ul><ul><li>As with elements, each attribute must be declared in the DTD </li></ul></ul><ul><ul><li>An element's attributes are declared outside the element declaration, in an attribute-list declaration of their own: </li></ul></ul><ul><li><!ATTLIST   elementName   attributeName   type   default > </li></ul><ul><ul><li>elementName is the element containing the attribute, and attributeName is the name of the attribute </li></ul></ul><ul><ul><li>An attribute declared with the CDATA type simply contains character data. This is similar to an element's #PCDATA, except that an attribute value can never contain child elements </li></ul></ul><ul><ul><li>Another common type is an enumerated list of the values that may be used with the attribute. For example, the HTML <hr> tag has an align attribute which may contain only a left, right, or center value. If we were to write an attribute declaration for this tag, its type would be listed as (left|right|center) </li></ul></ul>
  14. 14. Attributes <ul><li>The attribute declaration's default is either: </li></ul><ul><ul><li>a default value, which is used if the attribute isn't explicitly specified when the element is used, or </li></ul></ul><ul><ul><li>a default value keyword, which indicates how the attribute must be used. </li></ul></ul><ul><li>Generally, you use a keyword when you don't have a specific value to set as a default </li></ul><ul><ul><li>There are three possible keywords: </li></ul></ul><ul><li>Keyword: Explanation </li></ul><ul><li>#REQUIRED: The attribute must be specified on the element. </li></ul><ul><li>#IMPLIED: The attribute is optional. </li></ul><ul><li>#FIXED "value": Whether or not the attribute is explicitly specified, the element will have the fixed value, and this value cannot be changed. </li></ul>
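The effect of attribute defaults can be sketched with Python's standard `xml.parsers.expat` module, which reads declarations in the internal DTD and, by default, reports defaulted attributes alongside specified ones. The `page` document below is a made-up example built around the `<hr>` case mentioned earlier. Note that expat is a non-validating parser, so it supplies defaults but does not enforce #REQUIRED or the enumerated value list.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (hr*)>
  <!ELEMENT hr EMPTY>
  <!ATTLIST hr align (left|right|center) "left">
  <!ATTLIST hr width CDATA #IMPLIED>
  <!ATTLIST hr version CDATA #FIXED "1.0">
]>
<page><hr/><hr align="right" width="50%"/></page>"""


def collect_hr_attributes(document: str) -> list:
    """Return the attribute dictionary reported for each <hr> element."""
    found = []

    def on_start(name, attrs):
        if name == "hr":
            found.append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found


# The first <hr/> specifies nothing, so it should receive align and version
# from the declarations; width is #IMPLIED and therefore stays absent.
for attrs in collect_hr_attributes(DOC):
    print(attrs)
```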
  15. 15. Character entities <ul><li>Whenever the XML parser encounters certain characters, such as the < and & symbols, it interprets them as the start of markup. </li></ul><ul><li>To use these symbols in your content text, you have to use their entity references </li></ul><ul><li>In XML, only five character entities have been predefined: </li></ul><ul><ul><li>&gt; > greater than </li></ul></ul><ul><ul><li>&lt; < less than </li></ul></ul><ul><ul><li>&amp; & ampersand </li></ul></ul><ul><ul><li>&apos; ' apostrophe </li></ul></ul><ul><ul><li>&quot; " double quote </li></ul></ul>
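The five predefined entities above can be produced and resolved programmatically. The following sketch uses `escape` from Python's standard `xml.sax.saxutils` to write entity references and `xml.etree.ElementTree` to read them back; the sample text is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && c > d"

# escape() replaces &, < and > by default.
safe = escape(raw)
print(safe)   # if a &lt; b &amp;&amp; c &gt; d

# Quotes can be added through the optional mapping argument.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)   # say &quot;hi&quot;

# When the text is parsed again, the references turn back into characters.
note = ET.fromstring("<note>" + safe + "</note>")
print(note.text == raw)   # True

# All five predefined entities are understood without any declaration.
five = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>")
print(five.text)   # <>&'"
```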
  16. 16. Document Type Definitions <ul><li>In addition to well-formed documents, there are ‘valid’ XML documents. A valid document is well-formed and also follows a more formal structure. The main difference between well-formed XML and valid XML is the Document Type Definition (DTD). The DTD is a set of rules that define the elements that may be used, and where they may be applied in relation to each other. </li></ul><ul><li>To indicate that an element's contents contain other elements, simply list those child elements in the order they should appear. There are two symbols that can be used to separate the listed child elements: </li></ul><ul><ul><li>, (comma) Each subsequent element follows the preceding element </li></ul></ul><ul><ul><li>| (pipe symbol) One or the other element may be used </li></ul></ul>
  17. 17. Document Type Definitions <ul><li>Every element in the tag set has to be declared </li></ul><ul><li>How each element may be used must be declared as well </li></ul><ul><li>This is the formula for defining an element: </li></ul><ul><ul><li><!ELEMENT   elementName   elementContents > </li></ul></ul><ul><li>Each element to be used in a valid XML document must be declared in the DTD. If it's not in there, it can't be used </li></ul><ul><li>In this element declaration, elementName is the name of the element, and elementContents indicates what contents the element may contain: </li></ul><ul><ul><li>(other elements) Elements that can be nested are listed within parentheses. </li></ul></ul><ul><ul><li>ANY Indicates this element may contain any combination of declared elements or data. </li></ul></ul><ul><ul><li>EMPTY Indicates this element contains no data or elements. </li></ul></ul><ul><ul><li>(#PCDATA) Indicates this element contains parsed character data. </li></ul></ul>
  18. 18. Document Type Definitions <ul><li>There are also ways to indicate the number of times an element may appear in a document. </li></ul><ul><li>Place the frequency indicator after the element name listed in the elementContents area:  </li></ul><ul><ul><li>(no indicator)    Element must appear once and only once. </li></ul></ul><ul><ul><li>? (question mark) Element may appear once or not at all </li></ul></ul><ul><ul><li>+ (plus sign) Element may appear one or more times </li></ul></ul><ul><ul><li>* (asterisk) Element may appear any number of times or not at all </li></ul></ul>
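The separators and frequency indicators behave much like regular-expression operators, which the following Python sketch exploits. It is not a DTD validator: it only checks the direct children of one element against a content model written with names, parentheses, `,`, `|`, `?`, `+` and `*`, and it ignores #PCDATA, ANY and EMPTY. The function names are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model: str):
    """Translate a content model such as "(name, address+)" into a regex
    over a string of child names, each name terminated by ';'."""
    # Wrap every element name in a group so ?, + and * apply to the whole name.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # A comma means "followed by", which in a regex is plain concatenation,
    # so commas and white space are simply dropped. '|' keeps its meaning.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)


def children_match(element, model: str) -> bool:
    """Check the direct children of `element` against the content model."""
    sequence = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None


two_addresses = ET.fromstring("<friend><name/><address/><address/></friend>")
no_address = ET.fromstring("<friend><name/></friend>")

print(children_match(two_addresses, "(name, address+)"))  # True
print(children_match(no_address, "(name, address+)"))     # False, + needs one
print(children_match(no_address, "(name, address*)"))     # True, * allows none
```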
  19. 19. Document Type Definitions <ul><li>The DTD can be an external DTD, an internal DTD, or both. </li></ul><ul><li>The external DTD exists outside the content of a document and conventionally carries the extension .dtd. This type of DTD could be created for use by a particular community, providing a standardized document format for all members. The DTD reference, added at the beginning of the XML file, tells the XML processor where to find the external DTD and can also carry information about its creator, the purpose of the DTD, and the language used </li></ul>
  20. 20. Document Type Definitions <ul><li>The internal DTD is written directly in the XML document: </li></ul><ul><li><?xml version="1.0" encoding="UTF-8"?> </li></ul><ul><li><!DOCTYPE friend [ </li></ul><ul><li><!ELEMENT friend (name, address+)> </li></ul><ul><li><!ELEMENT name (#PCDATA)> </li></ul><ul><li><!ELEMENT address (street, city, country, zip)> </li></ul><ul><li><!ELEMENT street (#PCDATA)> </li></ul><ul><li><!ELEMENT city (#PCDATA)> </li></ul><ul><li><!ELEMENT country (#PCDATA)> </li></ul><ul><li><!ELEMENT zip (#PCDATA)> </li></ul><ul><li>]> </li></ul><ul><li><friend> </li></ul><ul><li><name>El Soussy</name> </li></ul><ul><li><address> </li></ul><ul><li><street>Palestinian Gardens</street> </li></ul><ul><li><city>Alexandria</city> </li></ul><ul><li><country>EG</country> </li></ul><ul><li><zip>90210</zip> </li></ul><ul><li></address> </li></ul><ul><li></friend> </li></ul>
  21. 21. Document Type Definitions <ul><li><!ELEMENT friend (name, address+)> </li></ul><ul><li><!ELEMENT name (#PCDATA)> </li></ul><ul><li><!ELEMENT address (street, city, country, zip)> </li></ul><ul><li><!ELEMENT street (#PCDATA)> </li></ul><ul><li><!ELEMENT city (#PCDATA)> </li></ul><ul><li><!ELEMENT country (#PCDATA)> </li></ul><ul><li><!ELEMENT zip (#PCDATA)> </li></ul><ul><li><?xml version="1.0" encoding="UTF-8"?> </li></ul><ul><li><!DOCTYPE friend SYSTEM "…"> </li></ul><ul><li><friend> </li></ul><ul><li><name>El Soussy</name> </li></ul><ul><li><address> </li></ul><ul><li><street>Palestinian Gardens</street> </li></ul><ul><li><city>Alexandria</city> </li></ul><ul><li><country>EG</country> </li></ul><ul><li><zip>90210</zip> </li></ul><ul><li></address> </li></ul><ul><li></friend> </li></ul><ul><li><!DOCTYPE friend PUBLIC "-//friends//DTD Standard /EN" "…"> </li></ul><ul><li>The system identifier "…" </li></ul><ul><li>gives the address (a URI reference) of a DTD for the document. </li></ul>
  22. 22. XML Style Sheets <ul><li>To format and view an XML document, you combine the document with a style sheet. The document can then be viewed in an appropriate browser. </li></ul><ul><li>Style sheets contain the rules that declare how the data of an XML document should appear or be interpreted by the user agent (browser, printer, text-to-speech converter, etc.). This is done by assigning a style to a tag. The style is then applied to the data contained within the tag. </li></ul><ul><li>Style sheets can be written in several languages. Two of these are: </li></ul><ul><ul><li>Cascading Style Sheets (CSS), the style sheet language also used with HTML </li></ul></ul><ul><ul><li>Extensible Stylesheet Language (XSL), an XML-specific styling language </li></ul></ul>
  23. 23. XSL <ul><li><?xml version="1.0" encoding="UTF-8"?> </li></ul><ul><li><xsl:stylesheet version="1.0" xmlns:xsl="…"> </li></ul><ul><li><xsl:template match="/"> </li></ul><ul><li><html> </li></ul><ul><li><head> </li></ul><ul><li><title>Friend</title> </li></ul><ul><li></head> </li></ul><ul><li><body bgcolor="#ffffff"> </li></ul><ul><li><h1 align="center">Alex 2009</h1> </li></ul><ul><li><xsl:for-each select="friend"> </li></ul><ul><li><h2> </li></ul><ul><li><xsl:value-of select="name"/> </li></ul><ul><li></h2> </li></ul><ul><li><p> </li></ul><ul><li><xsl:value-of select="address/street"/> </li></ul><ul><li><br/> </li></ul><ul><li><xsl:value-of select="address/city"/>,<xsl:value-of select="address/country"/> </li></ul><ul><li><xsl:value-of select="address/zip"/> </li></ul><ul><li><hr/> </li></ul><ul><li></p> </li></ul><ul><li></xsl:for-each> </li></ul><ul><li></body> </li></ul><ul><li></html> </li></ul><ul><li></xsl:template> </li></ul><ul><li></xsl:stylesheet> </li></ul><?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE friend SYSTEM "…"> <?xml-stylesheet href="friend.xsl" type="text/xsl"?> <friend> <name>El Soussy</name> <address> <street>Palestinian Gardens</street> <city>Alexandria</city> <country>EG</country> <zip>90210</zip> </address> </friend>
  24. 24. XSL
  25. 25. Namespaces <ul><li>A namespace is a collection of names that can be used in XML documents as element or attribute names. They identify the name as being from a particular domain (standards group, company, industry, etc.) </li></ul><ul><li>Namespaces are identified in XML by a Uniform Resource Identifier (URI). A URI can be either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs have become very common in the Internet world. A URN is a name that identifies something in a universally unique way without saying where to find it. While not as common as URLs, URNs will be used more as XML is adopted and used. </li></ul>
  26. 26. Namespaces <ul><li>Namespaces help standardize and uniquely brand elements and attributes. The URI serves only as a unique name: the user agent (browser, XML parser, XML application, etc.) is not required to retrieve anything from it, and it does not by itself point to a DTD against which the XML document is checked for validity. </li></ul><ul><li>Namespaces are declared with the reserved attribute ‘xmlns’. With a prefix, the complete syntax looks like this: </li></ul><ul><ul><li>xmlns:[prefix]="[URI of namespace]" </li></ul></ul><ul><li>The prefix can use any characters allowed in an XML tag name other than the colon, except that it may not start with xml. Here is an example using the xmlns syntax: </li></ul><ul><ul><li><xsl:stylesheet version="1.0" xmlns:xsl="…"> </li></ul></ul><ul><li>In the XML document the following statements occur: </li></ul><ul><ul><li><xsl:value-of select="address/street"/> </li></ul></ul><ul><li>When the document is processed it tells the parser: </li></ul><ul><ul><li>for the element <xsl:value-of select> use the element tag from the </li></ul></ul><ul><ul><li>for the element <value-of select> use the (possibly) undeclared <value-of select> element tag </li></ul></ul>
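How a parser treats prefixed and unprefixed names can be seen with Python's standard `xml.etree.ElementTree`, which reports each element as `{namespace URI}local-name`. The namespace name `urn:example:friends` and the prefixes `f` and `x` are made up for this sketch; nothing is fetched from the URI.

```python
import xml.etree.ElementTree as ET

# "urn:example:friends" is a hypothetical namespace name used only here.
DOC = """<f:friend xmlns:f="urn:example:friends">
  <f:name>El Soussy</f:name>
  <name>unqualified</name>
</f:friend>"""

root = ET.fromstring(DOC)

# The prefix is replaced by the URI it was bound to.
print(root.tag)                         # {urn:example:friends}friend
# The unprefixed <name> belongs to no namespace and keeps its bare name.
print([child.tag for child in root])    # ['{urn:example:friends}name', 'name']

# When searching, any prefix may be mapped to the same URI.
ns = {"x": "urn:example:friends"}
print(root.find("x:name", ns).text)     # El Soussy
print(root.find("name").text)           # unqualified
```

The two `name` elements are distinct to the parser even though their local names are identical, which is the purpose of qualifying names with a namespace.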