Pal gov.tutorial2.session2.xml dtd's


Published in: Technology, Education


  1. The Palestinian eGovernment Academy (www.egovacademy.ps)
      Tutorial II: Data Integration and Open Information Systems
      Session 2: XML DTDs
      Dr. Ismail M. Romi, Palestine Polytechnic University
      PalGov © 2011
  2. About
      This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
      Project Consortium:
      • Birzeit University, Palestine (Coordinator)
      • University of Trento, Italy
      • Palestine Polytechnic University, Palestine
      • Vrije Universiteit Brussel, Belgium
      • Palestine Technical University, Palestine
      • Université de Savoie, France
      • Ministry of Telecom and IT, Palestine
      • University of Namur, Belgium
      • Ministry of Interior, Palestine
      • TrueTrust, UK
      • Ministry of Local Government, Palestine
      Coordinator: Dr. Mustafa Jarrar, Birzeit University, P.O. Box 14, Birzeit, Palestine. Telfax: +972 2 2982935, mjarrar@birzeit.edu
  3. © Copyright Notes
      Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website) and the author of that part. No part of this tutorial may be reproduced or modified in any form or by any means without prior written permission from the project, which holds the full copyright on the material.
      Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA): this license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
  4. Tutorial Map
      Topic (hours):
      • Session 1: XML Basics and Namespaces (3)
      • Session 2: XML DTDs (3)
      • Session 3: XML Schemas (3)
      • Session 4: Lab-XML Schemas (3)
      • Session 5: RDF and RDFs (3)
      • Session 6: Lab-RDF and RDFs (3)
      • Session 7: OWL (Ontology Web Language) (3)
      • Session 8: Lab-OWL (3)
      • Session 9: Lab-RDF Stores: Challenges and Solutions (3)
      • Session 10: Lab-SPARQL (3)
      • Session 11: Lab-Oracle Semantic Technology (3)
      • Session 12_1: The Problem of Data Integration (1.5)
      • Session 12_2: Architectural Solutions for the Integration Issues (1.5)
      • Session 13_1: Data Schema Integration (1)
      • Session 13_2: GAV and LAV Integration (1)
      • Session 13_3: Data Integration and Fusion using RDF (1)
      • Session 14: Lab-Data Integration and Fusion using RDF (3)
      • Session 15_1: Data Web and Linked Data (1.5)
      • Session 15_2: RDFa (1.5)
      • Session 16: Lab-RDFa (3)
      Intended Learning Objectives:
      A: Knowledge and Understanding
      – 2a1: Describe tree and graph data models.
      – 2a2: Understand the notation of XML, RDF, RDFS, and OWL.
      – 2a3: Demonstrate knowledge about querying techniques for data models such as SPARQL and XPath.
      – 2a4: Explain the concepts of identity management and Linked Data.
      – 2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.
      B: Intellectual Skills
      – 2b1: Represent data using tree and graph data models (XML and RDF).
      – 2b2: Describe data semantics using RDFS and OWL.
      – 2b3: Manage and query data represented in RDF, XML, OWL.
      – 2b4: Integrate and fuse heterogeneous data.
      C: Professional and Practical Skills
      – 2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
      D: General and Transferable Skills
      – 2d1: Working in a team.
      – 2d2: Presenting and defending ideas.
      – 2d3: Use of creativity and innovation in problem solving.
      – 2d4: Develop communication skills and logical reasoning abilities.
  5. Session ILOs
      After completing this session, students will be able to:
      • Manage data represented in XML.
      • Represent data using tree and graph data models.
  6. Session 2: Document Type Definition (DTD)
      Session Overview:
      • Create DTDs.
      • Validate an XML document against a DTD.
      • Use DTDs to create XML documents from multiple files.
  7. XML Schemas
      • A quality control tool.
      • Describes the structure of an XML document.
      • Ensures that a document fulfills a minimum set of requirements.
      • Serves as a way to formalize an application as a publishable object.
      • An XML schema is like a program that tells a processor how to read the document.
  8. A History of Schema Languages
      1. Document Type Definition (DTD): the oldest and most widely supported schema language.
      2. W3C XML Schema: XML Schemas are themselves XML documents.
      3. RELAX NG
      4. Schematron
  9. Validation Steps
      A "valid" XML document is a "well-formed" XML document that also conforms to the rules of a Document Type Definition.
      1. The processor reads the rules and declarations in the schema.
      2. It builds a specific type of parser (a validating parser).
      3. The validating parser takes an XML instance as input.
      4. It produces a validation report.
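The difference between "well formed" and "valid" can be sketched with a small document. The `memo` vocabulary below is made up for illustration. A validating parser should report that the required `body` element is missing, while a non-validating parser would accept the document as well formed.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- A memo must contain a subject followed by a body. -->
  <!ELEMENT memo (subject, body)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well formed (tags nest and close correctly) but invalid: body is missing. -->
<memo>
  <subject>Budget review</subject>
</memo>
```

Adding a `<body>` element after `<subject>` would make the document valid.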
  10. Document Type Definition (DTD)
      • Defines the legal building blocks of an XML document.
      • Defines the document structure with a list of legal elements and attributes.
      • DTDs are extensible: they can be extended to meet the needs of the current task.
      • A DTD can be specified within an XML document (internal) or in a separate file (external).
      • Many DTDs are freely available for download on the Internet.
      • DTDs declare a set of allowed elements.
  11. Document Type Definition (DTD), continued
      • DTDs define a content model for each element. This describes what elements or data can go inside an element, in what order, in what number, and whether they are required or optional.
      • DTDs declare a set of allowed attributes for each element, with data types and default values.
      • DTDs provide mechanisms to manage the model, providing links to other components.
      The Document Type Declaration:
      – Internal DTD declaration: the DTD is declared inside the XML file.
      – External DTD declaration: the DTD is declared in an external file.
  12. Internal DTD Declaration
      Syntax:
        <!DOCTYPE root-element [element-declarations]>
      Example:
        <?xml version="1.0" encoding="ISO-8859-1"?>
        <!DOCTYPE note [
          <!ELEMENT note (to,from,heading,body)>
          <!ELEMENT to (#PCDATA)>
          <!ELEMENT from (#PCDATA)>
          <!ELEMENT heading (#PCDATA)>
          <!ELEMENT body (#PCDATA)>
        ]>
        <note>
          <to>Tove</to>
          <from>Jani</from>
          <heading>Reminder</heading>
          <body>Don't forget me this weekend!</body>
        </note>
  13. External DTD Declarations
      You can refer to an external DTD in one of the following two ways:
      – System identifiers
      – Public identifiers
  14. External DTD Declarations Using System Identifiers
      Syntax:
        <!DOCTYPE root-element SYSTEM "system identifier" [...]>
      A system identifier is a file reference consisting of:
      – The keyword SYSTEM
      – A URI reference pointing to the DTD's location.
      A URI can be a file on your local hard drive, a file on your intranet or network, or even a file available on the Internet.
      Examples:
        <!DOCTYPE name SYSTEM "/user/local/dtds/name.dtd" [ ]>
        <!DOCTYPE name SYSTEM "" [ ]>
        <!DOCTYPE name SYSTEM "name.dtd">
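As a minimal sketch of the relative-path form, the document below points at a DTD file named `note.dtd`. The file name and its contents (shown in the comment) are assumptions for illustration, reusing the `note` vocabulary; the parser resolves the relative URI against the location of the XML document.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Assumed contents of the hypothetical file note.dtd, stored next to this document:
       <!ELEMENT note (to, from, heading, body)>
       <!ELEMENT to (#PCDATA)>
       <!ELEMENT from (#PCDATA)>
       <!ELEMENT heading (#PCDATA)>
       <!ELEMENT body (#PCDATA)>
-->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>See you this weekend.</body>
</note>
```

Note that an external DTD file contains only declarations; it does not repeat the `<!DOCTYPE ...>` wrapper.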
  15. External DTD Declarations Using Public Identifiers
      Syntax:
        <!DOCTYPE root-element PUBLIC "public identifier" "system identifier" [...]>
      • Public identifiers are used to identify an entry in a catalog.
      • In XML, the public identifier is followed by a system identifier that the parser can fall back on.
      • A commonly used format is called Formal Public Identifiers (FPIs). The syntax for an FPI is defined in the document ISO 9070.
      FPI syntax:
        "-//Owner//Class Description//Language//Version"
      Example:
        <!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" "name.dtd">
      Recommended list of DOCTYPE at:
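A widely used real-world public identifier is the one for the XHTML 1.0 Strict DTD. The sketch below shows a minimal XHTML page that declares it; the page text itself is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Public identifier first, then the system identifier (a URI) as a fallback. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Public identifier example</title>
  </head>
  <body>
    <p>This page declares the XHTML 1.0 Strict DTD.</p>
  </body>
</html>
```

Read against the FPI syntax, the owner is `W3C`, the class and description are `DTD XHTML 1.0 Strict`, and the language is `EN`.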
  16. Sharing Vocabularies
      • It is often better to share vocabularies and use DTDs that are widely accepted.
      • Sharing DTDs enables you to integrate more easily with other companies and XML developers who use the shared vocabularies.
      • Many individuals and industries have developed DTDs. Examples:
      – Chemical Markup Language (CML) DTD
      – XHTML, which maintains three DTDs (Transitional, Strict, and Frameset).
      • You can check many places when trying to find a DTD for a specific industry.
  17. Anatomy of a DTD
      DTDs consist of three basic parts:
      1. Element declarations
      2. Attribute declarations
      3. Entity declarations
      These declarations go inside the DOCTYPE declaration, which follows the XML declaration:
        <?xml version="1.0" standalone="yes"?>
        <!DOCTYPE root-element [
          declarations
          declarations
        ]>
  18. Element Declarations
      • The ELEMENT declaration indicates to the parser that you are about to define an element.
      • The declaration can appear only within the context of the DTD.
      Syntax:
        <!ELEMENT element-name (content model)>
      Element declarations consist of three basic parts:
      – The ELEMENT keyword (<!ELEMENT)
      – The element name
      – The element content model
  19. Element Declarations (continued)
      • An element's content model defines the allowable content within the element.
      • An element may contain element children, text, a combination of children and text, or the element may be empty.
      • Four kinds of content models exist:
      – Element content
      – Mixed content
      – Empty content
      – Any content
  20. Element Content
      • List the allowable child elements within parentheses.
      Example:
        <!ELEMENT contact (name, location, phone)>
      • Each element that you specify within this element's content model must also have its own declaration within the DTD.
  21. Element Content (continued)
      • The processor needs this information so that it knows how to handle each element when it is encountered.
      • A name in the content model must appear exactly as it will in the document (names are case-sensitive).
      • Ways of specifying the element children:
      – Sequences
      – Choices
  22. Element Content: Sequences
      • In a sequence, the child elements must appear in the document in the order given in the content model.
      • If your XML document is missing one of the elements within the sequence, or contains additional elements, the parser raises an error.
      • If all of the specified elements are included but appear in another order, the parser raises an error.
      • Whitespace between the names in the declaration does not matter.
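A small sketch of a sequence in practice, using the `contact` declaration from the earlier slide. The child declarations and the sample values are assumptions added so that the document is complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!-- name, location, and phone must each appear once, in this order. -->
  <!ELEMENT contact (name, location, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<contact>
  <name>Sample Person</name>
  <location>Hebron</location>
  <phone>555-0100</phone>
</contact>
```

Swapping `<location>` and `<phone>`, dropping `<name>`, or adding a second `<phone>` would each make the document invalid.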
  23. Element Content: Choices
      • Sometimes you need to allow one element or another, but not both. This requires a choice mechanism.
      Example:
        <!ELEMENT location (address | GPS)>
      • This declaration allows the <location> element to contain one <address> or one <GPS> element.
      • If the <location> element is empty, or if it contains more than one of these elements, the parser raises an error.
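The choice can be tried out in a complete document, sketched below. The text-only declarations for `address` and `GPS` and the coordinate value are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE location [
  <!-- Exactly one of address or GPS must appear. -->
  <!ELEMENT location (address | GPS)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT GPS (#PCDATA)>
]>
<location>
  <GPS>31.53 N, 35.10 E</GPS>
</location>
```

A `<location>` holding both an `<address>` and a `<GPS>` child, or holding neither, would fail validation.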
  24. Mixed Content
      • The XML Recommendation specifies that any element with text in its content is a mixed content model element.
      • Within mixed content models, text can appear by itself or it can be interspersed between elements.
      • The simplest mixed content model is text only:
        <!ELEMENT element-name (#PCDATA)>
      • The #PCDATA keyword (Parsed Character DATA):
      – Indicates that the character data within the content model should be parsed by the parser.
      – Is used for text or character data.
  25. Mixed Content (continued)
      Every time you declare elements within a mixed content model, they must follow four rules:
      – They must use the choice mechanism (the vertical bar | character) to separate elements.
      – The #PCDATA keyword must appear first in the list of elements.
      – There must be no inner content models.
      – If there are child elements, the * cardinality indicator must appear at the end of the model.
  26. Mixed Content: Example
      DTD:
        <!ELEMENT description (#PCDATA | em | strong | br)*>
      XML document:
        <description>Jeff is a developer and author for Beginning XML <em>4th edition</em>.<br/>Jeff <strong>loves</strong> XML!</description>
      • Text may appear anywhere, and the em, strong, and br elements may appear any number of times in any order.
      • Note: em marks italic text, strong marks bold text, and br is a line break.
  27. Empty Content
      • An empty element has no content.
        <!ELEMENT element-name EMPTY>
      • The most commonly used empty element is <br/> (line break).
  28. Element with ANY Content
        <!ELEMENT element-name ANY>
      • Can contain any combination of parsable data (text or elements).
      • ANY is a keyword indicating that any elements declared within the DTD can be used within the content of the element, in any order and any number of times.
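The EMPTY and ANY content models can be seen side by side in the following sketch. The element names `notes` and `remark` are made up for illustration; `br` follows the line-break example mentioned earlier.

```xml
<?xml version="1.0"?>
<!DOCTYPE notes [
  <!-- notes accepts text and any declared element, in any order. -->
  <!ELEMENT notes ANY>
  <!ELEMENT remark (#PCDATA)>
  <!-- br may not contain text or child elements. -->
  <!ELEMENT br EMPTY>
]>
<notes>First line<br/>second line <remark>any declared element may appear here</remark></notes>
```

An element that is not declared anywhere in the DTD would still be rejected inside `<notes>`, because ANY covers declared elements only.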
  29. Cardinality
      • An element's cardinality defines how many times it will appear within a content model.
      • Each element within a content model can have an indicator following the element name that tells the parser how many times it will appear.
  30. Cardinality (continued)
      Indicator   Description
      (none)      The element must appear once and only once.
      ?           The element may appear either once or not at all.
      +           The element may appear one or more times.
      *           The element may appear zero or more times.
      Example:
        <!ELEMENT name (first+, middle?, last, Tel*)>
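The indicators can be exercised in one document, sketched below. It assumes the intended content model is `(first+, middle?, last, Tel*)`, with all four children inside the parentheses, and adds text-only declarations and made-up values for the children.

```xml
<?xml version="1.0"?>
<!DOCTYPE name [
  <!ELEMENT name (first+, middle?, last, Tel*)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT middle (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT Tel (#PCDATA)>
]>
<name>
  <!-- first+ : one or more (two here) -->
  <first>Mary</first>
  <first>Ann</first>
  <!-- middle? : optional, omitted here -->
  <!-- last : exactly once -->
  <last>Smith</last>
  <!-- Tel* : zero or more (two here) -->
  <Tel>555-0100</Tel>
  <Tel>555-0101</Tel>
</name>
```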
  31. Attribute Declarations
      Syntax:
        <!ATTLIST element-name attribute-name attribute-type "attribute-value">
      DTD example:
        <!ATTLIST payment type CDATA "check">
      XML example:
        <payment type="check">
  32. Attribute Types
      Type             Description
      CDATA            The value is character data (any text).
      ID               The value uniquely identifies the containing element.
      IDREF            The value is the ID of another element.
      IDREFS           The value is a whitespace-separated list of other IDs.
      ENTITY           The value is the name of an unparsed entity.
      ENTITIES         The value is a list of unparsed entity names.
      NMTOKEN          The value is a valid XML name token.
      NMTOKENS         The value is a list of valid XML name tokens.
      Enumerated list  The value must be one of the listed values (val1 | val2 | ...).
  33. CDATA
      • Specifies that the attribute value is character data (any text).
      • The value is not parsed as markup.
      DTD example:
        <!ELEMENT square EMPTY>
        <!ATTLIST square width CDATA "0">
      XML example:
        <square width="100"></square>
  34. ID, IDREF, and IDREFS
      • Attributes of type ID can be used to uniquely identify an element within an XML document.
      • Once you have uniquely identified the element, you can later use an IDREF to refer to that element.
      • Remember several rules when using ID attributes:
      – The value of an ID attribute must be unique within the entire XML document.
      – Only one attribute of type ID may be declared per element.
      – The attribute value declaration for an ID attribute must be #IMPLIED or #REQUIRED.
      • The value of an IDREF attribute must match the value of some ID within the XML document.
      • To refer to a list of elements, use an IDREFS attribute, which stores a whitespace-separated list of IDREF values, each referring to an ID attribute defined in the document.
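These rules can be illustrated with a short sketch. The `staff` vocabulary and the attribute names `id`, `manager`, and `team` are assumptions chosen for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (person+)>
  <!ELEMENT person (#PCDATA)>
  <!-- One ID attribute per element; IDREF and IDREFS point at existing IDs. -->
  <!ATTLIST person
    id      ID     #REQUIRED
    manager IDREF  #IMPLIED
    team    IDREFS #IMPLIED>
]>
<staff>
  <person id="p1" team="p2 p3">Director</person>
  <person id="p2" manager="p1">Analyst</person>
  <person id="p3" manager="p1">Developer</person>
</staff>
```

A validating parser should reject a duplicate `id="p1"` or a `manager="p9"` that matches no ID in the document.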
  35. ENTITY and ENTITIES
      • Attributes can also include references to unparsed entities.
      • An unparsed entity is an entity reference to an external file that the processor cannot parse (external images, for example).
      • Instead of actually including the image inside the document, you use special attributes to refer to the external resource.
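A minimal sketch of an ENTITY attribute at work follows. The `gallery` vocabulary, the file name `logo.gif`, the notation name `GIF`, and its identifier `image/gif` are all assumptions for illustration; what matters is that the unparsed entity names a declared notation and is referenced by name from the attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!-- The notation names the format of the external, non-XML resource. -->
  <!NOTATION GIF SYSTEM "image/gif">
  <!-- NDATA marks the entity as unparsed and ties it to the notation. -->
  <!ENTITY logo SYSTEM "logo.gif" NDATA GIF>
  <!ELEMENT gallery (picture+)>
  <!ELEMENT picture EMPTY>
  <!ATTLIST picture src ENTITY #REQUIRED>
]>
<gallery>
  <!-- The attribute value is the entity name, written without & and ; -->
  <picture src="logo"/>
</gallery>
```

The parser does not read the image; it passes the entity's location and notation to the application, which decides how to handle it.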
  36. Enumerated Attribute Types
      • Used to restrict attribute values.
      • An enumerated list allows you to specify a list of allowable values.
      • Each value must be a valid XML name token.
      Example:
      DTD:
        <!ATTLIST phone kind (Home | Work | Cell | Fax) #IMPLIED>
      XML:
        <phone kind="Cell">   Valid
        <phone kind="cell">   Invalid (values are case-sensitive)
  37. Attribute Value Declarations
      Within each attribute declaration you must specify how the value will appear in the document. The XML Recommendation allows the following forms:
      Value            Description
      "default value"  The attribute has a default value, written as a quoted literal (there is no #DEFAULT keyword).
      #REQUIRED        The attribute value must be included in the element.
      #IMPLIED         The attribute does not have to be included.
      #FIXED "value"   The attribute value is fixed.
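The four forms can be compared in a single attribute list, sketched below. The `order` vocabulary and the attribute names `number`, `note`, and `currency` are assumptions for illustration; `type` with the default `"check"` follows the earlier payment example.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (payment)>
  <!ELEMENT payment EMPTY>
  <!ATTLIST payment
    type     CDATA "check"
    number   CDATA #REQUIRED
    note     CDATA #IMPLIED
    currency CDATA #FIXED "USD">
]>
<order>
  <!-- type defaults to "check", note is omitted, currency is supplied as "USD". -->
  <payment number="A-17"/>
</order>
```

Omitting `number` would be a validity error, and so would writing `currency="EUR"`, because a fixed attribute may only carry its declared value.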
  38. Specifying Multiple Attributes
      Declaring each attribute separately:
        <!ATTLIST contacts version CDATA #FIXED "1.0">
        <!ATTLIST contacts source CDATA #IMPLIED>
      Using one declaration:
        <!ATTLIST contacts version CDATA #FIXED "1.0"
                           source CDATA #IMPLIED>
  39. Entities
      • Placeholders in XML.
      • Types:
      – Built-in entities
      – Character entities
      – General entities
      – Parameter entities
  40. Built-in Entities
      • &amp;   The & character
      • &lt;    The < character
      • &gt;    The > character
      • &apos;  The ' character
      • &quot;  The " character
  41. References to Built-in Entities
      • To use an entity, you must include an entity reference within the document.
      • An entity reference refers to an entity that represents a character, some text, or even an external file.
      • A reference to a built-in entity takes the following form:
        &entity-name;
      Example:
        <CheckAvg> Avg &lt; "85" </CheckAvg>
  42. Character Entities
      • Used for characters that are difficult to type or are not found on the keyboard.
      • Syntax: &#unicode-value;
      • Example (decimal): &#169; produces the character ©.
      • Hexadecimal values: you must include a lowercase x before the value, so that the XML parser knows how it should handle the reference.
      • Example (hexadecimal): &#xA9; produces the character ©.
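Both notations can be checked in a small document, sketched below. The `notice` element is made up for illustration; the copyright sign is Unicode code point 169 in decimal, which is A9 in hexadecimal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE notice [
  <!ELEMENT notice (#PCDATA)>
]>
<!-- The decimal and the hexadecimal reference below name the same character. -->
<notice>&#169; 2011 PalGov, also written as &#xA9; 2011 PalGov</notice>
```

Character references need no declaration in the DTD; the parser resolves them directly from the code point.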
  43. General Entities (Internal Entities)
      • Variables used to define shortcuts to standard text or special characters.
      • General entities must be declared within the DTD before they can be used within the XML document.
      Declaration:
        <!ENTITY entity-name "value">
      Example:
      DTD:
        <!ENTITY address "Palestine, Hebron, POBox 198">
      XML:
        <ppu-address> &address; </ppu-address>
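The benefit of a general entity shows when the same text is needed more than once. The sketch below reuses the `address` entity from the slide; the `letter`, `sender`, and `footer` elements are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE letter [
  <!-- Declared once in the DTD, referenced as many times as needed. -->
  <!ENTITY address "Palestine, Hebron, POBox 198">
  <!ELEMENT letter (sender, footer)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
]>
<letter>
  <sender>&address;</sender>
  <footer>Reply to: &address;</footer>
</letter>
```

Changing the replacement text in the declaration updates every place where `&address;` is used.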
  44. External Entities
      • An entity whose replacement text exists in another file.
      • Useful for:
      – Importing content that is shared by many documents.
      – Importing content that is changed frequently.
      – Breaking the document into multiple physical parts.
      • External entities must be declared so that the parser can find the replacement text.
  45. External Entities (continued)
      Declaration:
        <!ENTITY entity-name SYSTEM "physical location">
      Example (the SYSTEM keyword must be uppercase):
        <!ENTITY countries SYSTEM "d://countries.xml">
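As a sketch of building one document from several files, the example below pulls a list of countries in from an external parsed entity. The `report` vocabulary, the relative file name `countries.xml`, and its contents (described in the comment) are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (title, country+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!-- countries.xml is a hypothetical file holding only country elements, e.g.
       <country>Palestine</country><country>Italy</country> -->
  <!ENTITY countries SYSTEM "countries.xml">
]>
<report>
  <title>Project partners</title>
  &countries;
</report>
```

After the parser replaces `&countries;` with the file's content, the combined result is what gets checked against the content model of `report`.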
  46. Unparsed Entities
      • Hold content that should not be parsed because it contains something other than text or XML.
      • Useful for:
      – Importing graphics and sound files.
      – Non-character data.
      Declaration:
        <!ENTITY entity-name SYSTEM "physical location" NDATA file-format>
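  47. Unparsed Entities (continued)
      Example:
      DTD:
        <!NOTATION GIF SYSTEM "image/gif">
        <!ENTITY pic1 SYSTEM "c://pic.gif" NDATA GIF>
        <!ELEMENT picture EMPTY>
        <!ATTLIST picture src ENTITY #REQUIRED>
      XML:
        <picture src="pic1"/>
      • An unparsed entity cannot be referenced as &pic1; in element content. It is referenced by name through an attribute of type ENTITY (the attribute name src is illustrative).
      • The format named after NDATA (GIF here) must be declared as a notation.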
  48. DTD Limitations
      • Differences between DTD syntax and XML syntax.
      • Poor support for XML namespaces.
      • Poor data typing.
      • Limited content model descriptions.
  49. Summary
      • By using DTDs, you can easily validate your XML documents against a defined vocabulary of elements and attributes. This reduces the amount of code needed within your application.
      • An XML parser can be used to check whether the contents of an XML document are valid according to the declarations within a DTD.
  51. <e-Gov> Thank you </e-Gov>