Pal gov.tutorial2.session1.xml basics and namespaces



  1. 1. أكاديمية الحكومة الإلكترونية الفلسطينية (The Palestinian eGovernment Academy), www.egovacademy.ps
     Tutorial II: Data Integration and Open Information Systems
     Session 1: XML Basics and Namespaces
     Dr. Ismail M. Romi, Palestine Polytechnic University
     PalGov © 2011
  2. 2. About
     This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
     Project Consortium:
     • Birzeit University, Palestine (Coordinator)
     • Palestine Polytechnic University, Palestine
     • Palestine Technical University, Palestine
     • Ministry of Telecom and IT, Palestine
     • Ministry of Interior, Palestine
     • Ministry of Local Government, Palestine
     • University of Trento, Italy
     • Vrije Universiteit Brussel, Belgium
     • Université de Savoie, France
     • University of Namur, Belgium
     • TrueTrust, UK
     Coordinator: Dr. Mustafa Jarrar, Birzeit University, P.O.Box 14, Birzeit, Palestine. Telfax: +972 2 2982935, mjarrar@birzeit.edu
  3. 3. © Copyright Notes
     Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website) and the author of that part. No part of this tutorial may be reproduced or modified in any form or by any means without prior written permission from the project, which holds the full copyright on the material.
     Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA): this license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
  4. 4. Tutorial Map
     Topic (hours):
     • Session 1: XML Basics and Namespaces (3)
     • Session 2: XML DTDs (3)
     • Session 3: XML Schemas (3)
     • Session 4: Lab-XML Schemas (3)
     • Session 5: RDF and RDFs (3)
     • Session 6: Lab-RDF and RDFs (3)
     • Session 7: OWL (Web Ontology Language) (3)
     • Session 8: Lab-OWL (3)
     • Session 9: Lab-RDF Stores: Challenges and Solutions (3)
     • Session 10: Lab-SPARQL (3)
     • Session 11: Lab-Oracle Semantic Technology (3)
     • Session 12_1: The Problem of Data Integration (1.5)
     • Session 12_2: Architectural Solutions for the Integration Issues (1.5)
     • Session 13_1: Data Schema Integration (1)
     • Session 13_2: GAV and LAV Integration (1)
     • Session 13_3: Data Integration and Fusion using RDF (1)
     • Session 14: Lab-Data Integration and Fusion using RDF (3)
     • Session 15_1: Data Web and Linked Data (1.5)
     • Session 15_2: RDFa (1.5)
     • Session 16: Lab-RDFa (3)
     Intended Learning Objectives:
     A: Knowledge and Understanding
     • 2a1: Describe tree and graph data models.
     • 2a2: Understand the notation of XML, RDF, RDFS, and OWL.
     • 2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.
     • 2a4: Explain the concepts of identity management and Linked Data.
     • 2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.
     B: Intellectual Skills
     • 2b1: Represent data using tree and graph data models (XML and RDF).
     • 2b2: Describe data semantics using RDFS and OWL.
     • 2b3: Manage and query data represented in RDF, XML, OWL.
     • 2b4: Integrate and fuse heterogeneous data.
     C: Professional and Practical Skills
     • 2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
     D: General and Transferable Skills
     • 2d1: Working in a team.
     • 2d2: Presenting and defending ideas.
     • 2d3: Use of creativity and innovation in problem solving.
     • 2d4: Develop communication skills and logical reasoning abilities.
  5. 5. Session ILOs
     After completing this session students will be able to:
     • Describe tree and graph data models.
     • Understand the notation of XML.
  6. 6. Session 1: XML Basics and Namespaces
     Session Overview:
     < Markup language />
     < What is XML? />
     < Components of XML Document />
     < Why we need namespaces />
     < The syntax for using namespaces />
     < What is a URI, a URL, and a URN />
  7. 7. Markup
     • Information added to a document that enhances its meaning.
     • It identifies the parts of the document and how they relate to each other.
  8. 8. Markup language
     • A modern system for annotating a text in a way that is syntactically distinguishable from that text.
     • A set of words and symbols for describing the identity of pieces of a document (for example 'this is a paragraph', 'this is a heading', 'this is a list', 'this is the caption of this figure', etc.).
     • Programs can use this with a style sheet to create output for screen, print, audio, video, Braille, etc.
     • Some markup languages (e.g. those used in word processors) only describe appearance ('this is italics', 'this is bold'). Such markup can only be used for display and is not normally reusable for anything else.
  9. 9. History of Markup
     • Efforts started in the 1960s.
     • TROFF, TeX: presentation and formatting of printed documents.
     • GenCode (General Coding): uses descriptive generic tags to assemble documents from multiple pieces.
     • GML (IBM Generalized Markup Language): encodes documents for use with multiple information subsystems. A document can be edited, formatted, and searched by different programs.
  10. 10. History of Markup…Cont
     SGML: Standard Generalized Markup Language
     • A framework for developing specialized markup languages.
     • Encodes general-purpose documents (books, journals, ...).
     • Flexible, all-encompassing coding scheme.
     • Used for very large documentation projects.
     • Its usefulness is limited to large organizations (high requirements).
     • Companies developed their own SGML applications, which were not compatible with browsers (MS Internet Explorer, Netscape, ...).
  11. 11. History of Markup…Cont
     HTML: Hypertext Markup Language
     • Developed in the early 1990s.
     • Simple.
     • Generic coding principles.
     • Specific tags (commands).
     • Tags are presentational and limited.
     • Open standard (free, not tied to any technology).
     • Limited in its scope and can't be extended.
  12. 12. History of Markup…Cont
     XML: Extensible Markup Language
     • Combines the flexibility of SGML and the simplicity of HTML.
     • The W3C released the official XML version 1.0 specification in 1998.
     • XML quickly gained popularity in the web community.
     • XML itself is not a language, but rather a set of rules that can be used to create markup languages.
  13. 13. What is XML?
     • A protocol for containing and managing information.
     • XML is really all about creating your own markup.
     • Technically, XML is a meta-language, which means it's a language that lets you create your own markup languages.
     • Unlike HTML, XML is meant for storing data, not displaying it.
     • XML provides you with a way of containing, shaping, structuring, and protecting data in documents.
     • XML is a general-purpose information storage system.
     • XML documents are portable because they can be interpreted by many different applications.
  14. 14. Why "Extensible"?
     Because anyone is free to mark up data in any way using the language, even if others are doing it in different ways.
     • We have full control over the creation of our XML document.
     • Data can be shaped in any preferred way:
       • If you want to create data in a way that only one particular computer program will ever use, you can do so.
       • If you want to share your data with other programs, or even other companies across the Internet, XML gives you the flexibility to do that as well.
     • You are free to structure the same data in different ways that suit the requirements of an application or category of applications.
  15. 15. Functions of XML
     1. Store and retrieve data.
     2. Format documents: put data in a presentable form.
     3. Ensure data integrity: guarantee a minimal level of trust in the data (that it hasn't been corrupted, truncated, mistyped, left incomplete, broken, ...).
     4. Support multiple languages: XML supports the Unicode character set, which covers hundreds of scripts (Latin, Arabic, ...).
  16. 16. How Do I Get Started? Initial Requirements
     1. Text editor:
        • An XML editor helps in composing and reading the document, and in preventing mistakes.
        • You can use Notepad or any other editor that supports the character set used by the document.
     2. XML parser:
        • A software program (XML processor) is required to process an XML document (e.g. Stylus).
     3. Document Type Definition (DTD) or Schema.
     4. Viewing the document:
        • View the document with technologies such as browsers or an XML environment (e.g. Stylus).
  17. 17. Where XML Can Be Used
     • Reducing server load: keep all information on the client for as long as possible, and then send it to the servers in one big XML document.
     • Website content: transform the same XML document into many formats, or combine many formats into one XML file.
     • Distributed computing: XML can be used as a means of sending data for distributed computing, where objects on one computer call objects on another computer to do work.
     • e-Commerce: XML is well suited to exchanging data between computer processes and applications (computer-to-computer data transfer).
  18. 18. Components of XML Document
     • XML Declaration
     • Elements
     • Attributes
     • Entities
     • Comments
  19. 19. Tag
     • A construct that begins with < and ends with >
     • Start tag: <name>
     • End tag: </name>
     • Tags constitute the markup of the document.
  20. 20. Element
     • A logical component of a document, used to describe data. It consists of:
       – A start tag
       – Content
       – An end tag
     • Example: <first>John</first>
     • The text between the start-tag and end-tag of an element is called the element content.
  21. 21. Rules for Elements / Well-formed Document
     • Every start-tag must have a matching end-tag, or be a self-closing tag.
     • Tags can't overlap; elements must be properly nested.
     • XML documents can have only one root element.
     • Element names must obey XML naming conventions.
     • XML is case sensitive.
     • XML will keep whitespace in your PCDATA.
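The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample documents are illustrative assumptions, not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule on the slide.
samples = {
    "matching start and end tag": "<first>John</first>",
    "self-closing tag": "<first/>",
    "missing end tag": "<first>John",
    "overlapping tags": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Name>John</name>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first two samples should be accepted; each of the others breaks one of the listed rules.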
  22. 22. Naming Rules
     √ Names can start with letters or the underscore (_) character, but not with numbers or other punctuation characters.
     √ After the first character, numbers, hyphens, and periods are allowed.
     √ Names can't contain spaces.
     √ Names should not contain the colon (:) character, which is reserved for namespace prefixes.
     √ Names should not start with the letters xml, in uppercase, lowercase, or mixed case; such names are reserved.
     √ There can't be a space after the opening < character; the name of the element must come immediately after it.
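One way to explore the naming rules is to ask a parser whether it accepts a given element name. This is a rough sketch, assuming Python's standard `xml.etree.ElementTree`; the function `parser_accepts_name` and the candidate names are hypothetical, and the check only covers what the parser rejects outright, not reserved-name conventions.

```python
import xml.etree.ElementTree as ET


def parser_accepts_name(name: str) -> bool:
    """Try to parse an empty element with the given name."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


# Hypothetical candidate names; the last one has a space right after "<".
candidates = ["_total", "item-1.a", "1item", "-item", ".item", " item"]
accepted = {name: parser_accepts_name(name) for name in candidates}

for name, ok in accepted.items():
    print(repr(name), "accepted" if ok else "rejected")
```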
  23. 23. Whitespace in PCDATA
     • Whitespace includes things such as:
       • The space character
       • New lines (what you get when you press the Enter key)
       • Tabs
     • Whitespace is used to separate words, as well as to make text more readable.
     • In XML, no whitespace stripping takes place for PCDATA.
     • Example: <Tag>This is a paragraph. It has a whole bunch of space.</Tag>
     • The PCDATA is: This is a paragraph. It has a whole bunch of space.
  24. 24. Whitespace in Markup
     • There could be whitespace within an XML document that's not actually part of the data.
       <Tag>
         <AnotherTag>This is some XML</AnotherTag>
       </Tag>
     • Any whitespace contained within <AnotherTag>'s PCDATA is part of the data.
     • The newline after <Tag> and the spaces before <AnotherTag> could be there just to make the document easier to read, while not actually being part of its data.
     • This "readability" whitespace is called extraneous whitespace.
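The difference between whitespace inside PCDATA and extraneous whitespace can be observed directly. The sketch below assumes Python's standard `xml.etree.ElementTree`, which hands both kinds of whitespace to the application unchanged; the extra spaces and the tab in the sample text are added here for illustration.

```python
import xml.etree.ElementTree as ET

# Two spaces and a tab are placed inside the PCDATA on purpose.
doc = "<Tag>\n  <AnotherTag>This  is\tsome XML</AnotherTag>\n</Tag>"
root = ET.fromstring(doc)
inner = root.find("AnotherTag")

pcdata = inner.text            # whitespace inside the data is kept as-is
before_inner = root.text       # "readability" whitespace after <Tag>
after_inner = inner.tail       # "readability" whitespace before </Tag>

print(repr(pcdata))
print(repr(before_inner), repr(after_inner))
```

The parser does not decide which whitespace is extraneous; it reports all of it, and the application chooses what to ignore.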
  25. 25. Attributes
     • Simple name/value pairs associated with an element.
     • Attributes are attached to the start-tag, not to the end-tag.
     • Example: <name univ="PPU">
     • Attributes must have values, even if the value is just an empty string ("").
     • Attribute values must be in quotes, either single (') or double (").
     • Quotes must be matched.
     • You can include a quote character in an attribute value by delimiting the value with the other kind of quote.
     • Attribute names must be unique within the same element.
     • Attribute names are subject to the naming rules.
  26. 26. Attributes…Cont
     • The order in which attributes are included on an element is not considered relevant.
     • If an XML parser encounters an element like:
       <name first="John" middle="Fitzgerald Johansen" last="Doe"></name>
     • It doesn't necessarily have to give us the attributes in that order, but can do so in any order it wishes.
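A short sketch of the attribute rules, assuming Python's standard `xml.etree.ElementTree`: attributes arrive as a name/value mapping, so their order in the source does not affect comparison, a repeated attribute name is rejected, and a double quote can appear inside a single-quoted value. The `note` element is a made-up example.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<name first="John" middle="Fitzgerald Johansen" last="Doe"></name>')
b = ET.fromstring("<name last='Doe' first='John' middle='Fitzgerald Johansen'/>")

# Same attributes in a different order and with different quote styles.
same_attributes = a.attrib == b.attrib

# An attribute name may appear only once per element.
try:
    ET.fromstring('<name first="John" first="Jon"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

# A double quote inside a value delimited by single quotes (hypothetical element).
quoted = ET.fromstring("""<note text='He said "hi"'/>""").get("text")

print(same_attributes, duplicate_accepted, quoted)
```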
  27. 27. When to Use Attributes
     • Use attributes to separate different types of information.
     • Attributes use much less space.
     • Elements can be more complex than attributes.
     • Attributes are unordered.
     Problems in Using Attributes
     • Attributes can't contain multiple values; elements can.
     • Attributes can't contain tree structures; elements can.
     • Attributes are not expandable; elements are.
     • Attributes can't force order; elements can.
  28. 28. Empty Elements
     • An empty element has no content; it can carry only attributes.
     • Examples:
       <product prodid="1345" />
       <product></product>
       <product/>
     • Used when an element has no PCDATA, or when its PCDATA is optional.
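The two ways of writing an empty element are equivalent to a parser. This is a small sketch assuming Python's standard `xml.etree.ElementTree`; it reuses the `product` examples from the slide.

```python
import xml.etree.ElementTree as ET

forms = [
    '<product prodid="1345" />',          # self-closing form
    '<product prodid="1345"></product>',  # start-tag immediately followed by end-tag
]
parsed = [ET.fromstring(form) for form in forms]

# Collect (tag, attributes, text, number of children) for each form.
summaries = [(el.tag, dict(el.attrib), el.text, len(el)) for el in parsed]
print(summaries)
```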
  29. 29. Trees
     • XML is hierarchical in nature.
     • Information is structured like a tree, with parent-child relationships.
     • This means that the information has to be arranged in a tree structure.
     • An XML document forms a tree structure that starts at the root, branches out, and ends at the leaves.
  30. 30. Trees: Symbols Used
     [Legend; the symbol images are not preserved] Element appears multiple times. Element appears one time only. Element can be further broken down.
  31. 31. Tree: Example
       <bookstore>
         <book category="COOKING">
           <title lang="en">Everyday Italian</title>
           <author>Giada De Laurentiis</author>
           <year>2005</year>
           <price>30.00</price>
         </book>
         <book category="CHILDREN">
           <title lang="en">Harry Potter</title>
           <author>J K. Rowling</author>
           <year>2005</year>
           <price>29.99</price>
         </book>
         <book category="WEB">
           <title lang="en">Learning XML</title>
           <author>Erik T. Ray</author>
           <year>2003</year>
           <price>39.95</price>
         </book>
       </bookstore>
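To see the tree structure of the bookstore example, the document can be parsed and walked from the root down to the leaves. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper `depth` is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)


def depth(element) -> int:
    """Number of levels from this element down to its deepest leaf."""
    return 1 + max((depth(child) for child in element), default=0)


# Root -> branches (book) -> leaves (title, author, year, price).
titles_by_category = {
    book.get("category"): book.findtext("title") for book in root.findall("book")
}
leaf_count = sum(1 for el in root.iter() if len(el) == 0)

print(root.tag, depth(root), leaf_count)
print(titles_by_category)
```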
  32. 32. Comments
     • XML comments are ignored by the application that processes the XML document.
     • Useful for:
       – Documentation
       – Others viewing the document
     • Syntax: <!-- Comment -->
     • Example: <!-- this is an xml class -->
  33. 33. XML Declarations
     • A small collection of details that prepares XML processors for working with a document.
     • Syntax: <?xml version='1.0' encoding='UTF-16' standalone='yes'?>
     • The XML declaration starts with the characters <?xml and ends with the characters ?>.
     • If you include a declaration, you must include the version; the encoding and standalone attributes are optional.
     • The version, encoding, and standalone attributes must appear in that order.
     • The version should be 1.0 or 1.1.
     • The XML declaration must be right at the beginning of the file.
  34. 34. Version
     • The version attribute specifies which version of the XML specification the document adheres to.
     • There are two versions of the XML specification, 1.0 and 1.1.
     • Example: <?xml version="1.0"?> or <?xml version="1.1"?>
     • Version 1.1 is newer; most processors support 1.0.
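The rules for the XML declaration can be checked against a parser. This is a minimal sketch assuming Python's standard `xml.etree.ElementTree`; the helper `accepts` and the tiny `<a/>` documents are illustrative assumptions. Byte strings are used so that the declared encoding is the one the parser actually reads.

```python
import xml.etree.ElementTree as ET


def accepts(document: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


checks = {
    "full declaration": accepts(b"<?xml version='1.0' encoding='UTF-8' standalone='yes'?><a/>"),
    "no declaration": accepts(b"<a/>"),
    "not at the beginning": accepts(b"\n<?xml version='1.0'?><a/>"),
    "attributes out of order": accepts(b"<?xml encoding='UTF-8' version='1.0'?><a/>"),
    "version missing": accepts(b"<?xml encoding='UTF-8'?><a/>"),
}

for label, ok in checks.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```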
  35. 35. Encoding
     • Text is stored in computers as numbers (sequences of 1s and 0s).
     • A character code is a one-to-one mapping between a set of characters and the numbers that represent them.
     • A character encoding is the method used to represent the numbers of a character code digitally (how many bytes are used for each number).
     • ASCII: a 7-bit character code covering 128 characters (English letters, digits, punctuation, and control characters).
     • ISO-8859-1: created to add characters not covered by ASCII.
     • UTF-16: uses two bytes for most characters (2 bytes = 16 bits = 65,536 possible values); characters outside that range take four bytes.
  36. 36. Encoding…Cont
     • UTF-8: uses one byte for the characters covered by ASCII; any other character is represented by two or more bytes.
     • UTF-8 and UTF-16 compared:
       √ For text that is mostly ASCII (such as English), UTF-8 results in smaller files, because each such character requires only one byte.
       √ For text in some other languages, UTF-16 can be smaller, because UTF-8 can require three or more bytes for some characters where UTF-16 requires only two.
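The size trade-off between UTF-8 and UTF-16 can be measured by encoding a few strings. The sketch below uses only Python's built-in `str.encode`; the sample strings are assumptions chosen for illustration, and `utf-16-le` is used so that the two-byte byte-order mark is not counted.

```python
# Sample strings written with escapes so the code points are unambiguous.
samples = {
    "ASCII": "XML",                                # 3 characters
    "Arabic": "\u0645\u0631\u062d\u0628\u0627",    # 5 characters
    "Japanese": "\u65e5\u672c\u8a9e",              # 3 characters
    "Emoji": "\U0001F600",                         # 1 character outside the 16-bit range
}

# (bytes in UTF-8, bytes in UTF-16 without byte-order mark)
sizes = {
    label: (len(text.encode("utf-8")), len(text.encode("utf-16-le")))
    for label, text in samples.items()
}

for label, (utf8, utf16) in sizes.items():
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

In this sample, UTF-8 is smaller for the ASCII text, the two encodings tie for the Arabic text, UTF-16 is smaller for the Japanese text, and the emoji takes four bytes in both.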
  37. 37. Specifying a Character Encoding for XML…Cont
     Examples:
     • <?xml version='1.0' encoding='UTF-16' ?>
     • <?xml version='1.0' encoding='UTF-8' ?>
     • <?xml version='1.0' encoding='ASCII' ?>
     • <?xml version='1.0' encoding="ISO-8859-1" ?>
  38. 38. Standalone
     • standalone = {yes or no}
     • yes: specifies that the document exists entirely on its own, without depending on any other files.
     • no: indicates that the document may depend on an external DTD.
  39. 39. Why We Need Namespaces
     • Used to differentiate elements and attributes of different XML document types from each other when combining them in one document, or even when processing multiple documents simultaneously.
       <?xml version="1.0"?>
       <person>
         <name>
           <title>Sir</title>
           <first>John</first>
           <middle>Fitzgerald Johansen</middle>
           <last>Doe</last>
         </name>
         <position>Vice President of Marketing</position>
         <résumé>
           <html>
             <head><title>Resume of John Doe</title></head>
             <body>
               <h1>John Doe</h1>
               <p>John's a great guy, you know?</p>
             </body>
           </html>
         </résumé>
       </person>
     • To an XML parser, there isn't any difference between the two <title> elements in this document.
  40. 40. Using Prefixes
     • The best way is for every element in a document to have a completely distinct name.
     • This can be achieved as follows:
       – Group the elements.
       – Give each group a unique prefix.
       – Use the prefix in element names.
       – Prefix:ElementName.
       <?xml version="1.0"?>
       <pers:person>
         <pers:name>
           <pers:title>Sir</pers:title>
           <pers:first>John</pers:first>
           <pers:middle>Fitzgerald Johansen</pers:middle>
           <pers:last>Doe</pers:last>
         </pers:name>
         <pers:position>Vice President of Marketing</pers:position>
         <pers:résumé>
           <xhtml:html>
             <xhtml:head><xhtml:title>Resume of John Doe</xhtml:title></xhtml:head>
             <xhtml:body>
               <xhtml:h1>John Doe</xhtml:h1>
               <xhtml:p>John's a great guy, you know?</xhtml:p>
             </xhtml:body>
           </xhtml:html>
         </pers:résumé>
       </pers:person>
  41. 41. Why Doesn't XML Just Use These Prefixes?
     • Prefixes have to be unique.
     • A problem will occur if two companies use the same prefixes.
     • To solve this problem, you could take advantage of the already unambiguous Internet domain names in existence and specify that URIs must be used for the prefix names.
     • A URI (Uniform Resource Identifier) is a string of characters that identifies a resource.
     • It can be in one of two flavors:
       – URL (Uniform Resource Locator)
       – URN (Uniform Resource Name)
  42. 42. How XML Namespaces Work
     • The XML Namespaces Recommendation introduces a standard syntax for declaring namespaces and identifying the namespace for a given element or attribute in an XML document.
     • The XML namespaces specification is published by the W3C.
     • To use XML namespaces in your documents, elements are given qualified names.
     • In W3C specifications, qualified name is abbreviated to QName.
     • These qualified names consist of two parts:
       – The local part, which is the same as the names we have been giving elements all along.
       – The namespace prefix, which specifies to which namespace this name belongs.
  43. 43. How XML Namespaces Work…Cont
     Example:
     • To declare a namespace and associate a <person> element with that namespace, you would do something like the following:
       <pers:person xmlns:pers=""/>
     • The key is the xmlns:pers attribute (xmlns stands for XML Namespace).
     • Here you are declaring the pers namespace prefix and the URI of the namespace that it represents.
  44. 44. How XML Namespaces Work…Cont
     • The prefix can be used for any descendants of the <pers:person> element, to denote that they also belong to the namespace, as shown in the following example:
       <pers:person xmlns:pers="">
         <pers:name>
           <pers:title>Sir</pers:title>
         </pers:name>
       </pers:person>
     • Internally, when this document is parsed, the parser simply replaces any namespace prefixes with the namespace itself.
     • A parser might consider <pers:person> to be similar to <{namespace-URI}person>.
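The way a parser replaces a prefix with the namespace itself can be observed in practice. This is a minimal sketch assuming Python's standard `xml.etree.ElementTree`, which reports element names in `{URI}local` form. The URI `http://example.com/pers` and the second prefix `p2` are hypothetical, since the transcript does not preserve the URI used on the slide.

```python
import xml.etree.ElementTree as ET

PERS = "http://example.com/pers"  # hypothetical namespace URI, not from the slides

doc_a = (
    f'<pers:person xmlns:pers="{PERS}">'
    "<pers:name><pers:title>Sir</pers:title></pers:name>"
    "</pers:person>"
)
# Same document with a different prefix bound to the same URI.
doc_b = (
    f'<p2:person xmlns:p2="{PERS}">'
    "<p2:name><p2:title>Sir</p2:title></p2:name>"
    "</p2:person>"
)

root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

tags_a = [el.tag for el in root_a.iter()]
tags_b = [el.tag for el in root_b.iter()]

# Queries name the namespace through a prefix mapping chosen by the caller.
title = root_a.find("n:name/n:title", {"n": PERS})

print(tags_a)
print(title.text)
```

Because only the URI ends up in the expanded name, the choice of prefix makes no difference to the parsed result.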
  45. 45. Default Namespaces
     • A default namespace is just like a regular namespace, except that you don't have to specify a prefix for all of the elements that use it.
     • Example:
       <person xmlns="">
         <name>
           <title>Sir</title>
         </name>
       </person>
     • All descendant elements belong to the specified namespace.
  46. 46. Default Namespaces…Cont
     • You can declare more than one namespace for an element, but only one can be the default.
     • This allows you to write XML like this:
       <person xmlns="" xmlns:xhtml="">
         <name/>
         <xhtml:p>This is XHTML</xhtml:p>
       </person>
  47. 47. Default Namespaces…Cont
     • You declare the namespaces and their prefixes, if applicable, in the root element so that all elements in the document can use these prefixes.
     • You can't write XML like this:
       <person xmlns="" xmlns="">
     • This tries to declare two default namespaces.
     • In this case, the XML parser wouldn't be able to figure out to what namespace the element belongs.
  48. 48. Declaring Namespaces on Descendants
     • Namespace prefixes can be declared in any element in the document.
     • Example:
       <person xmlns="">
         <name/>
         <xhtml:p xmlns:xhtml="">This is XHTML</xhtml:p>
       </person>
     • This can make the document more readable, because namespaces are declared closer to where they'll actually be used.
     • The prefix is available only in the element where it is declared and in its descendants.
  49. 49. Declaring Default Namespaces on Descendants
     • You can declare a namespace to be the default namespace for an element and its descendants.
     • Example:
       <person xmlns="">
         <name/>
         <p xmlns="">This is XHTML</p>
       </person>
     • The namespace declared on <person> is the default namespace for the document as a whole.
     • The namespace declared on <p> is the default namespace for the <p> element and any of its descendants.
     • The namespace declared on <p> overrides the one declared on <person>, so that the latter doesn't apply to the <p> element.
  50. 50. Canceling Default Namespaces
     • A default namespace is canceled by setting the value of the xmlns attribute to an empty string.
     • Example:
       <employee>
         <name>Jane Doe</name>
         <notes>
           <p xmlns="">I've worked with <name xmlns="">Jane Doe</name> for over a <em>year</em> now.</p>
         </notes>
       </employee>
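Overriding and canceling a default namespace can be traced by listing the expanded name of every element. The sketch below assumes Python's standard `xml.etree.ElementTree`. It also assumes that the `<p>` element declares the XHTML namespace `http://www.w3.org/1999/xhtml` as its default; the transcript does not preserve the URI, so that value is an assumption.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"  # assumed; the slide's URI is not preserved

doc = f"""<employee>
  <name>Jane Doe</name>
  <notes>
    <p xmlns="{XHTML}">I've worked with <name xmlns="">Jane Doe</name> for over a <em>year</em> now.</p>
  </notes>
</employee>"""

root = ET.fromstring(doc)

# Expanded names in document order.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

Under these assumptions, `<p>` and `<em>` fall in the declared default namespace, while the inner `<name>` is back in no namespace, just like the outer `<name>`.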
  51. 51. Do Different Notations Make Any Difference?
       <pers:person xmlns:pers="" xmlns:xhtml="">
         <pers:name/>
         <xhtml:p>This is XHTML</xhtml:p>
       </pers:person>

       <person xmlns="" xmlns:xhtml="">
         <name/>
         <xhtml:p>This is XHTML</xhtml:p>
       </person>

       <person xmlns="">
         <name/>
         <p xmlns="">This is XHTML</p>
       </person>
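The question on this slide can be explored by parsing the three notations and comparing the expanded element names. This is a sketch assuming Python's standard `xml.etree.ElementTree`; both URIs are assumptions, because the transcript shows the namespace declarations with their values missing (`http://example.com/pers` is hypothetical, and the XHTML URI is assumed).

```python
import xml.etree.ElementTree as ET

PERS = "http://example.com/pers"        # hypothetical namespace URI
XHTML = "http://www.w3.org/1999/xhtml"  # assumed; not preserved in the transcript

# Notation 1: both namespaces use prefixes.
prefixed = f"""<pers:person xmlns:pers="{PERS}" xmlns:xhtml="{XHTML}">
  <pers:name/>
  <xhtml:p>This is XHTML</xhtml:p>
</pers:person>"""

# Notation 2: one default namespace plus one prefix.
mixed = f"""<person xmlns="{PERS}" xmlns:xhtml="{XHTML}">
  <name/>
  <xhtml:p>This is XHTML</xhtml:p>
</person>"""

# Notation 3: default namespaces only, redeclared on <p>.
defaults = f"""<person xmlns="{PERS}">
  <name/>
  <p xmlns="{XHTML}">This is XHTML</p>
</person>"""

expanded = [
    [el.tag for el in ET.fromstring(doc).iter()]
    for doc in (prefixed, mixed, defaults)
]
all_equal = expanded[0] == expanded[1] == expanded[2]
print(all_equal)
print(expanded[0])
```

Under these assumptions, a namespace-aware parser sees the same element names in all three documents.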
  52. 52. Namespaces and Attributes
     • Do namespaces work the same for attributes as they do for elements?
     • The answer is no, they don't.
     • In fact, attributes usually don't have namespaces the way elements do: an attribute without a prefix is not placed in any namespace, even when a default namespace is declared.
     • They are just "associated" with the elements to which they belong.
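The different treatment of attributes can be seen by inspecting what a parser reports. The sketch below assumes Python's standard `xml.etree.ElementTree`; the URI `http://example.com/pers` and the attributes `id` and `pers:role` are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

PERS = "http://example.com/pers"  # hypothetical namespace URI

# A default namespace applies to the element, not to its unprefixed attribute.
with_default = ET.fromstring(f'<person xmlns="{PERS}" id="42"/>')

# Only an explicitly prefixed attribute is placed in a namespace.
with_prefix = ET.fromstring(
    f'<pers:person xmlns:pers="{PERS}" id="42" pers:role="vp"/>'
)

print(with_default.tag, with_default.attrib)
print(with_prefix.tag, with_prefix.attrib)
```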
  53. 53. Understanding URIs
     • A URI (Uniform Resource Identifier) is a string of characters that identifies a resource.
     • It can occur in one of two flavors:
       – URL (Uniform Resource Locator)
       – URN (Uniform Resource Name)
     • A resource is anything that has identity:
       – An item that is retrievable over the Internet, such as an HTML document.
       – An item that is not retrievable over the Internet, such as the person who wrote that HTML document.
  54. 54. Summary
     • What XML is and why it's so useful:
       – A protocol for containing and managing information.
       – Store and retrieve data, format documents, put data in a presentable form, ensure data integrity, support multiple languages.
     • Namespaces are used to differentiate elements and attributes of different XML document types from each other when combining them in one document, or even when processing multiple documents simultaneously.
  55. 55. References
     • Hunter, D., Rafter, J., Fawcett, J., Vlist, E., Ayers, D., Duckett, J., Watt, A., McKinnon, L., (2007), "Beginning XML", 4th Ed., Wiley Publishing Inc.: Indiana, USA.
     • Ray, E., (2003), "Learning XML", 2nd Ed., O'Reilly Media Inc.: USA.
     • Amiano, M., D'Cruz, C., Ethier, K., Thomas, M., (2006), "XML: Problem - Design - Solution", Wiley Publishing Inc.: Indiana, USA.
  56. 56. <e-Gov> Thank you </e-Gov>