- 1. The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session1
XML Basics and Namespaces
Dr. Ismail M. Romi
Palestine Polytechnic University
PalGov © 2011 1
- 2. About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement 511159-TEMPUS-1-
2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine
University of Trento, Italy (Coordinator)
Palestine Polytechnic University, Palestine
Vrije Universiteit Brussel, Belgium
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Telecom and IT, Palestine
University of Namur, Belgium
Ministry of Interior, Palestine
TrueTrust, UK
Ministry of Local Government, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935, mjarrar@birzeit.edu
- 3. © Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website) and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means without prior written permission from the project, which holds
the full copyright on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work non-commercially,
as long as they credit you and license their new creations under the identical terms.
- 4. Tutorial Map
Topic (hours)
Session 1: XML Basics and Namespaces (3)
Session 2: XML DTDs (3)
Session 3: XML Schemas (3)
Session 4: Lab-XML Schemas (3)
Session 5: RDF and RDFS (3)
Session 6: Lab-RDF and RDFS (3)
Session 7: OWL (Web Ontology Language) (3)
Session 8: Lab-OWL (3)
Session 9: Lab-RDF Stores - Challenges and Solutions (3)
Session 10: Lab-SPARQL (3)
Session 11: Lab-Oracle Semantic Technology (3)
Session 12_1: The Problem of Data Integration (1.5)
Session 12_2: Architectural Solutions for the Integration Issues (1.5)
Session 13_1: Data Schema Integration (1)
Session 13_2: GAV and LAV Integration (1)
Session 13_3: Data Integration and Fusion using RDF (1)
Session 14: Lab-Data Integration and Fusion using RDF (3)
Session 15_1: Data Web and Linked Data (1.5)
Session 15_2: RDFa (1.5)
Session 16: Lab-RDFa (3)
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked Data.
2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML and RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, and OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Use Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
D: General and Transferable Skills
2d1: Work in a team.
2d2: Present and defend ideas.
2d3: Use creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
- 5. Session ILOs:
After completing this session students will be able to:
•Describe tree and graph data models.
•Understand the notation of XML.
- 6. Session1: XML Basics and Namespaces
Session Overview:
< Markup language />
< What is XML? />
< Components of an XML Document />
< Why we need namespaces />
< The syntax for using namespaces />
< What is a URI, a URL, and a URN />
- 7. Markup
• Information added to the document that
enhances its meaning.
• It identifies the parts and how they relate to
each other.
- 8. Markup language
A modern system for annotating a text in a way that is syntactically
distinguishable from that text.
A set of words and symbols for describing the identity of pieces of a
document (for example 'this is a paragraph', 'this is a heading', 'this is
a list', 'this is the caption of this figure', etc.).
Programs can use this with a style sheet to create output for screen,
print, audio, video, Braille, etc.
Some markup languages (e.g. those used in word processors) only describe
appearance ('this is italics', 'this is bold'), but such markup can only
be used for display and is not normally reusable for anything else.
- 9. History of Markup
Efforts started in the 1960s.
troff, TeX:
Presentation and formatting of printed documents.
GenCode (Generic Coding):
Uses descriptive generic tags to assemble documents from multiple pieces.
GML (IBM Generalized Markup Language):
Encoding documents for use with multiple information subsystems.
A document can be edited, formatted, and searched by different programs.
- 10. History of Markup…Cont
SGML: Standard Generalized Markup Language.
A framework for developing specialized markup languages.
Encodes general-purpose documents (books, journals, ...).
Flexible, all-encompassing coding scheme.
Used for very large documentation projects.
Its usefulness is limited to large organizations (high requirements).
Companies developed their own SGML applications, which were not
compatible with browsers (Microsoft Internet Explorer, Netscape, ...).
- 11. History of Markup…Cont -
HTML: Hypertext markup language
Developed in the early 1990s.
Simple.
Generic coding principles.
Specific tags (commands).
Tags are presentational and limited.
Open standard (free, not tied to any technology).
Limited in its scope and can't be extended.
- 12. History of Markup…Cont
XML: Extensible markup language
Combines the flexibility of SGML and the
simplicity of HTML
The W3C released the official XML 1.0 specification
in 1998.
XML quickly gained popularity in the web
community.
XML itself is not a language, but rather a set of
rules that can be used to create markup
languages.
- 13. What is XML?
• A protocol for containing and managing information.
XML is really all about creating your own markup.
Technically, XML is a meta-language, which means it's a
language that lets you create your own markup languages.
Unlike HTML, XML is meant for storing data, not displaying
it.
XML provides you with a way of containing, shaping,
structuring, and protecting data in documents.
XML is a general purpose information storage system.
XML documents are portable because they can be
interpreted by many different applications.
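A minimal sketch of the point that XML stores data rather than display instructions, and that any program with a parser can read it. It uses Python's standard-library `xml.etree.ElementTree` module; the `<student>` document and its element names are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: the tags describe what the data is, not how to show it.
doc = """<student id="1001">
  <first>John</first>
  <last>Doe</last>
  <major>Computer Engineering</major>
</student>"""

root = ET.fromstring(doc)  # parse the text into an element tree

# Any application can pull out the parts it needs by name.
record = {child.tag: child.text for child in root}
record["id"] = root.get("id")
print(record)
```

The same text could be read by a report generator, a database loader, or a web page script; none of them needs to know how the others display it.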
- 14. Why “Extensible”?
Because anyone is free to mark up data in any way using the language,
even if others are doing it in different ways.
We have full control over the creation of our XML document.
Data can be shaped in any preferred way:
• If you want to create data that only one particular computer program
will ever use, you can do so.
• If you want to share your data with other programs, or even other
companies across the Internet, XML gives you the flexibility to do that
as well.
You are free to structure the same data in different
ways that suit the requirements of an application
or category of applications.
- 15. Functions of XML
1. Store and retrieve data
2. Formatting documents:
• Putting data in a presentable form.
3. Ensure data integrity:
• Guarantee a minimal level of trust in data (that it hasn't been
corrupted, truncated, mistyped, left incomplete, or broken).
4. Support multiple languages:
• XML uses the Unicode character set, which supports many scripts
(Latin, Arabic, ...).
- 16. How Do I Get Started? Initial Requirements
1. Text Editor:
An XML editor helps you compose and read the document and prevents mistakes.
You can use Notepad or any other editor that supports the character set
used by the document.
2. XML Parser:
A software program (XML processor) is required to process an XML
document (e.g. Stylus).
3. Document Type Definition (DTD) or Schema (optional; needed only to
validate the document).
4. Viewing the Document:
View the document in a browser or in an XML environment (e.g. Stylus).
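As a rough stand-in for the XML parser in item 2, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` (which wraps the expat parser). The helper `parse_or_report` and the `<note>` documents are hypothetical examples, not part of any tool named on the slide.

```python
import xml.etree.ElementTree as ET

def parse_or_report(text):
    """Return the root element, or None after printing the parser's error."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) pair pointing at the problem
        print("Not well-formed:", err, "at", err.position)
        return None

good = parse_or_report("<note><to>Ali</to></note>")
bad = parse_or_report("<note><to>Ali</note>")  # <to> is never closed
print(good.tag, bad)
```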
- 17. Where XML Can Be Used
• Reducing Server Load:
• keeping all information on the client for as long as possible, and
then sending the information to those servers in one big XML
document.
• Website Content:
• Transforming the same XML document to many formats.
• Combining many formats to one XML file…
• Distributed Computing:
• XML can be used as a means for sending data for distributed
computing, where objects on one computer call objects on another
computer to do work.
• e-Commerce:
• XML is well suited for exchanging data between computer processes
and applications.
• Computer to computer data transfer.
- 18. Components of XML Document
• XML Declaration
• Elements
• Attributes
• Entities
• Comments
- 19. Tag
• Construct that begins with < and ends with >
• Start tag <name>
• End tag </name>
• Tags constitute the markup of the document.
- 20. Element
• Logical component of a document, used to
describe data, consists of:
– A start tag
– Content
– An end tag
• Example:
<first>John</first>
• The text between the start-tag and end-tag of
an element is called the element content.
- 21. Rules for Elements/ Well-formed Document
Every start-tag must have a matching end-tag, or be a
self-closing tag.
Tags can't overlap; elements must be properly nested.
XML documents can have only one root element.
Element names must obey XML naming conventions.
XML is case sensitive.
XML keeps the whitespace in your PCDATA (parsed character data).
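One way to see these rules in action is to hand small documents to a non-validating parser and check which ones it accepts. This sketch assumes Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<first>John</first>",
    "self-closing tag": "<first/>",
    "missing end-tag": "<first>John",
    "overlapping tags": "<b><i>text</b></i>",
    "two root elements": "<first>John</first><last>Doe</last>",
    "case mismatch": "<First>John</first>",
}
for label, text in samples.items():
    print(f"{label:20} {is_well_formed(text)}")
```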
- 22. Naming Rules
√ Names can start with letters or the underscore (_) character, but not
with numbers, hyphens, or other punctuation characters.
√ After the first character, numbers, hyphens, and periods are also
allowed.
√ Names can't contain spaces.
√ Names can't contain the colon (:) character, except as the separator of
a namespace prefix (as in pers:name).
√ Names starting with the letters xml, in uppercase, lowercase, or mixed
case, are reserved by the XML specification and shouldn't be used.
√ There can't be a space after the opening < character; the name of the
element must come immediately after it.
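The naming rules can be probed the same way, by asking a parser to read `<name/>` for each candidate name. This is a hypothetical helper built on Python's `xml.etree.ElementTree`, not a full validator: it rejects `pers:name` only because the `pers` prefix is undeclared, and it accepts `xmlData`, since parsers typically do not enforce the reservation of names starting with xml.

```python
import xml.etree.ElementTree as ET

def name_is_accepted(name):
    """Hypothetical helper: does the parser accept <name/> as an element?"""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

accepted = ["first", "_id", "price.usd", "item-2", "xmlData"]  # xmlData: reserved, not enforced
rejected = ["1st", "-item", "first name", "pers:name"]         # pers:name: prefix never declared
for name in accepted + rejected:
    print(f"{name!r:14} {name_is_accepted(name)}")
```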
- 23. Whitespace in PCDATA
• Whitespace includes things such as:
• The space character
• New lines (what you get when you press the Enter key)
• Tabs
• Whitespace is used to separate words, as well as to make text more
readable.
• In XML, no whitespace stripping takes place for PCDATA.
• Example:
<Tag>This is a paragraph.    It has a whole bunch
    of space.</Tag>
• The PCDATA is:
This is a paragraph.    It has a whole bunch
    of space.
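A short Python check, with the standard `xml.etree.ElementTree` assumed as the parser, that the PCDATA comes back with its whitespace untouched:

```python
import xml.etree.ElementTree as ET

doc = "<Tag>This is a paragraph.    It has a whole bunch\n    of space.</Tag>"
root = ET.fromstring(doc)

# The run of spaces and the newline are handed back exactly as written.
print(repr(root.text))
```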
- 24. Whitespace in Markup
• There can be whitespace within an XML document that's not actually
part of the data.
<Tag>
  <AnotherTag>This is some XML</AnotherTag>
</Tag>
• Any whitespace contained within <AnotherTag>'s PCDATA is part of the data.
• The newline after <Tag>, and the spaces before <AnotherTag>, could be
there just to make the document easier to read, without actually being
part of its data.
• This “readability” whitespace is called extraneous whitespace.
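The same parser can show where extraneous whitespace ends up. In this sketch (Python's `xml.etree.ElementTree` assumed), whitespace before a child is stored as the parent's `text` and whitespace after it as the child's `tail`; deciding to ignore it is left to the application.

```python
import xml.etree.ElementTree as ET

doc = "<Tag>\n  <AnotherTag>This is some XML</AnotherTag>\n</Tag>"
root = ET.fromstring(doc)
child = root[0]

print(repr(root.text))   # whitespace between <Tag> and <AnotherTag>
print(repr(child.text))  # the real data
print(repr(child.tail))  # whitespace between </AnotherTag> and </Tag>

# The parser keeps everything; an application may choose to drop whitespace-only text.
cleaned = [t for t in (root.text, child.text, child.tail) if t and t.strip()]
print(cleaned)
```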
- 25. Attributes
• Simple name/value pairs associated with an element.
• Attributes are attached to the start-tag, not to the end-tag.
• Example:
<name univ="PPU">
• Attributes must have values, even if that value is just an empty
string (such as "").
• Attribute values must be in quotes: single (') or double (").
• Quotes must be matched.
• You can include one quote character in an attribute value delimited by
the other (e.g. univ='the "PPU"').
• Attribute names must be unique within the same element.
• Attribute names are subject to the same naming rules as element names.
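A minimal sketch of the attribute rules, again assuming Python's `xml.etree.ElementTree`; the `try_parse` helper and the `nick`/`quote` attribute names are made up for illustration. Well-formed attributes come back as a dictionary; broken ones raise `ParseError`, reported here as a string.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Return the element's attributes, or an error string if not well-formed."""
    try:
        return ET.fromstring(text).attrib
    except ET.ParseError as err:
        return f"error: {err}"

print(try_parse('<name univ="PPU"/>'))             # double quotes
print(try_parse("<name univ='PPU'/>"))             # single quotes
print(try_parse('<name nick=""/>'))                # an empty value is allowed
print(try_parse("<name quote='He said \"hi\"'/>")) # other quote character inside
print(try_parse('<name univ=PPU/>'))               # unquoted value: error
print(try_parse('<name univ="PPU" univ="BZU"/>'))  # duplicate attribute: error
```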
- 26. Attributes ….Cont
• The order in which attributes are included on an element is not
considered relevant.
• If an XML parser encounters an element like:
<name first="John" middle="Fitzgerald Johansen" last="Doe"></name>
• It doesn't necessarily have to give us the attributes in that order,
but can do so in any order it wishes.
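Because a parser may report attributes in any order, code should look them up by name. A sketch in Python (`xml.etree.ElementTree` assumed), using the element from the slide written in two different attribute orders:

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<name first="John" middle="Fitzgerald Johansen" last="Doe"></name>')
b = ET.fromstring('<name last="Doe" first="John" middle="Fitzgerald Johansen"></name>')

# Attributes come back as a mapping: compare and read them by name, never by position.
print(a.attrib == b.attrib)
print(a.get("last"))
```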
- 27. When to Use Attributes
• Use attributes to separate different types of information.
• Attributes use less space.
• Elements can be more complex than attributes.
• Attributes are unordered.
Problems in Using Attributes
• Attributes can't contain multiple values; elements can.
• Attributes can't contain tree structures; elements can.
• Attributes are not easily expandable (for future changes); elements are.
• Attributes can't force order; elements can.
- 28. Empty Elements
• An empty element has no content (no text and no child elements); it
can only have attributes.
• Examples:
<product prodid="1345" />
<product></product>
<product/>
<product
    prodid="1345"
/>
• Used when an element has no PCDATA, or when its PCDATA is optional.
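To check that the empty-element forms carry the same information, a small Python sketch (standard `xml.etree.ElementTree` assumed) parses each one and inspects its text, children, and attributes:

```python
import xml.etree.ElementTree as ET

forms = [
    '<product></product>',
    '<product/>',
    '<product\n    prodid="1345"\n/>',
    '<product prodid="1345" />',
]
for text in forms:
    el = ET.fromstring(text)
    # No text and no children: the only information is in the attributes.
    print(el.attrib, el.text, len(el))

# The start/end-tag pair and the self-closing tag build identical trees.
same = ET.tostring(ET.fromstring(forms[0])) == ET.tostring(ET.fromstring(forms[1]))
print(same)
```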
- 29. Trees
• XML is hierarchical in nature.
• Information is structured like a tree, with
parent-child relationships.
• This means that information has to be arranged in a tree structure.
• An XML document forms a tree structure, starting at the root and
branching out to the leaves.
- 30. Trees- Used Symbols
[Figure: legend of tree-diagram symbols: element appears multiple times; element appears one time only; element can be further broken down.]
- 31. Tree- Example
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
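The parent-child structure of the bookstore document can be walked recursively from the root. The sketch below assumes Python's `xml.etree.ElementTree` and uses a trimmed copy of the document (the `<author>` elements are left out to keep it short); `walk` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title><year>2005</year><price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title><year>2005</year><price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title><year>2003</year><price>39.95</price>
  </book>
</bookstore>"""

def walk(element, depth=0):
    """Print the tree: the root first, then each branch down to the leaves."""
    category = element.get("category")
    label = element.tag + (f" [{category}]" if category else "")
    print("  " * depth + label)
    for child in element:
        walk(child, depth + 1)

root = ET.fromstring(BOOKSTORE)
walk(root)

# Following parent -> child relationships to answer a question about the data.
titles_2005 = [b.findtext("title") for b in root.findall("book") if b.findtext("year") == "2005"]
print(titles_2005)
```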
- 32. Comments
• XML comments are usually ignored by the application that processes the
XML document.
• Useful for:
– Documentation
– Others viewing the document.
Syntax
<!-- Comment -->
Example:
<!-- this is an xml class -->
• The string -- (two hyphens) must not appear inside a comment.
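Whether comments reach the application depends on the parser. With Python's `xml.etree.ElementTree` (assumed here, Python 3.8+ for `insert_comments`), comments are dropped by default and kept only when the tree builder is asked to keep them:

```python
import xml.etree.ElementTree as ET

doc = "<class><!-- this is an xml class --><name>XML Basics</name></class>"

# Default parser: the comment is discarded; only elements remain.
root = ET.fromstring(doc)
print([child.tag for child in root])

# Ask the tree builder to keep comments (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(doc, parser=parser)
comments = [c.text for c in root_with_comments if c.tag is ET.Comment]
print(comments)
```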
- 33. XML Declarations
• A small collection of details that prepare XML
processors for working with a document.
Syntax:
<?xml version='1.0' encoding='UTF-16' standalone='yes'?>
• The XML declaration starts with the characters <?xml and ends with the
characters ?>.
• If you include a declaration, you must include the version, but the
encoding and standalone attributes are optional.
• The version, encoding, and standalone attributes must appear in that
order.
• The version should be 1.0 or 1.1.
• The XML declaration must be at the very beginning of the file; not even
whitespace may precede it.
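A quick Python sketch (standard `xml.etree.ElementTree`, Python 3.8+ assumed for `xml_declaration=`) of the declaration rules: the parser rejects a declaration that is not at the very start or whose attributes are out of order, and the library can write a declaration when serializing. `accepts` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def accepts(text):
    """True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><note/>"
misplaced = "\n<?xml version='1.0'?><note/>"                   # newline before the declaration
wrong_order = "<?xml encoding='UTF-8' version='1.0'?><note/>"  # version must come first

print(accepts(ok), accepts(misplaced), accepts(wrong_order))

# Writing a declaration back out when serializing.
declared = ET.tostring(ET.Element("note"), encoding="UTF-8", xml_declaration=True)
print(declared)
```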
- 34. Version
• The version attribute specifies which version of the XML
specification the document adheres to.
• There are two versions of the XML specification, 1.0 and 1.1
Example:
<?xml version="1.0"?>
Or
<?xml version="1.1"?>
• 1.1 is newer and less widely supported; most processors support 1.0.
- 35. Encoding
• Text is stored in computers using numbers (1s,0s).
• A character code is a one-to-one mapping between a
set of characters and the corresponding numbers to
represent those characters.
• Character encoding is the method used to represent
the numbers in a character code digitally (how many
bytes should be used for each number).
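• ASCII: represents 128 characters (unaccented English letters, digits,
punctuation, and control characters) as 7-bit numbers.
• ISO-8859-1: uses one byte per character; created to add characters not
covered by ASCII, such as the accented letters of Western European languages.
• UTF-16: uses two bytes for most characters (2 bytes = 16 bits = 65,536
possible values); characters beyond that range use four bytes.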
- 36. Encoding ….Cont
UTF-8: uses one byte for the characters covered by ASCII.
• Any other characters are represented by two to four bytes.
• UTF-8 & UTF-16:
√ For text that is mostly ASCII (such as English), UTF-8 results in
smaller files (because each such character requires only one byte).
√ For text in some other languages, UTF-16 can be smaller (because UTF-8
requires three bytes for many characters, such as Chinese or Japanese
ones, whereas UTF-16 requires only two).
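The size trade-off can be checked directly by encoding a few strings. This Python sketch uses only built-in string encoding; the sample strings are arbitrary, and UTF-16 is measured without a byte-order mark.

```python
samples = {
    "English": "Hello, XML",
    "Arabic": "مرحبا",
    "Japanese": "日本語",
}

sizes = {}
for language, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # "-le" fixes the byte order, so no 2-byte BOM is added
    sizes[language] = (utf8, utf16)
    print(f"{language:9} {len(text):2} chars  UTF-8: {utf8:2} bytes  UTF-16: {utf16:2} bytes")

# A character outside the 16-bit range (here an emoji) needs a 4-byte surrogate pair in UTF-16.
surrogate_bytes = len("\U0001F600".encode("utf-16-le"))
print(surrogate_bytes)
```

English text is half the size in UTF-8, Arabic comes out equal, and Japanese is smaller in UTF-16.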
- 37. Specifying a Character Encoding for XML … Cont
Examples:
• <?xml version='1.0' encoding='UTF-16' ?>
• <?xml version='1.0' encoding='UTF-8' ?>
• <?xml version='1.0' encoding='ASCII' ?>
• <?xml version='1.0' encoding='ISO-8859-1' ?>
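When the parser is given raw bytes, the declaration tells it which encoding to use to decode them. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the `<city>` documents are invented examples.

```python
import xml.etree.ElementTree as ET

# Each document is encoded to bytes exactly as its declaration says.
latin1_doc = "<?xml version='1.0' encoding='ISO-8859-1'?><city>Zürich</city>".encode("iso-8859-1")
utf16_doc = "<?xml version='1.0' encoding='UTF-16'?><city>نابلس</city>".encode("utf-16")  # adds a BOM

# Given bytes, the parser uses the declaration (and BOM) to decode them back into characters.
latin1_text = ET.fromstring(latin1_doc).text
arabic_text = ET.fromstring(utf16_doc).text
print(latin1_text, arabic_text)
```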
- 38. Standalone
• Standalone = {yes or no}
• Yes: specifies that the document exists
entirely on its own, without depending on any
other files.
• No: indicates that the document may depend
on an external DTD.
- 39. Why We Need Namespaces
Used to differentiate elements and attributes of different XML document
types from each other when combining them in one document, or even when
processing multiple documents simultaneously.
<?xml version="1.0"?>
<person>
  <name>
    <title>Sir</title>
    <first>John</first>
    <middle>Fitzgerald Johansen</middle>
    <last>Doe</last>
  </name>
  <position>Vice President of Marketing</position>
  <résumé>
    <html>
      <head><title>Resume of John Doe</title></head>
      <body>
        <h1>John Doe</h1>
        <p>John's a great guy, you know?</p>
      </body>
    </html>
  </résumé>
</person>
To an XML parser, there isn't any difference between the two <title>
elements in this document.
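To confirm that a parser sees no difference between the two `<title>` elements, here is a Python sketch (standard `xml.etree.ElementTree` assumed) over a shortened copy of the document:

```python
import xml.etree.ElementTree as ET

doc = """<person>
  <name><title>Sir</title><first>John</first><last>Doe</last></name>
  <résumé>
    <html><head><title>Resume of John Doe</title></head></html>
  </résumé>
</person>"""

root = ET.fromstring(doc)

# Both elements come back with exactly the same name; only their position differs.
titles = [(el.tag, el.text) for el in root.iter("title")]
print(titles)
```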
- 40. Using Prefixes
• The best way is for every element in a document to have a completely
distinct name.
• This can be done as follows:
– Grouping elements.
– Giving each group a unique prefix.
– Using the prefix in element names: Prefix:ElementName.
<?xml version="1.0"?>
<pers:person>
  <pers:name>
    <pers:title>Sir</pers:title>
    <pers:first>John</pers:first>
    <pers:middle>Fitzgerald Johansen</pers:middle>
    <pers:last>Doe</pers:last>
  </pers:name>
  <pers:position>Vice President of Marketing</pers:position>
  <pers:résumé>
    <xhtml:html>
      <xhtml:head><xhtml:title>Resume of John Doe</xhtml:title></xhtml:head>
      <xhtml:body>
        <xhtml:h1>John Doe</xhtml:h1>
        <xhtml:p>John's a great guy, you know?</xhtml:p>
      </xhtml:body>
    </xhtml:html>
  </pers:résumé>
</pers:person>
- 41. Why Doesn’t XML Just Use These
Prefixes?
• Prefixes have to be unique.
• A problem will occur if two companies use the same prefix.
• To solve this problem, you could take advantage of the already
unambiguous Internet domain names in existence and specify that
URIs must be used for the prefix names.
• URI (Uniform Resource Identifier) is a string of characters that
identifies a resource.
• It can be in one of two flavors:
– URL (Uniform Resource Locator)
– URN (Uniform Resource Name).
- 42. How XML Namespaces Work
• The XML Namespaces Recommendation introduces a standard syntax
for declaring namespaces and identifying the namespace for a given
element or attribute in an XML document.
• The XML namespaces specification is located at
http://www.w3.org/TR/REC-xml-names/
• To use XML namespaces in your documents, elements are given
qualified names.
• In W3C specifications, qualified name is abbreviated to QName.
• These qualified names consist of two parts:
– The local part, which is the same as the names we have been giving
elements all along
– The namespace prefix, which specifies to which namespace this name
belongs.
- 43. How XML Namespaces Work…Cont
Example:
• To declare a namespace called
http://www.wiley.com/pers and associate a
<person> element with that namespace, you
would do something like the following:
<pers:person xmlns:pers="http://www.wiley.com/pers"/>
• The key is the xmlns:pers attribute (xmlns stands for XML Namespace).
• Here you are declaring the pers namespace prefix and the URI of the
namespace that it represents (http://www.wiley.com/pers).
- 44. How XML Namespaces Work…Cont
• The prefix can be used for any descendants of the <pers:person>
element, to denote that they also belong to the
http://www.wiley.com/pers namespace, as shown in the following
example:
<pers:person xmlns:pers="http://www.wiley.com/pers">
  <pers:name>
    <pers:title>Sir</pers:title>
  </pers:name>
</pers:person>
• Internally, when this document is parsed, the parser simply replaces
any namespace prefix with the namespace URI it is bound to.
• A parser might consider <pers:person> to be similar to
<{http://www.wiley.com/pers}person>.
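Python's `xml.etree.ElementTree` (assumed here) performs this substitution: each element name is reported in the `{namespace URI}localname` form, and the prefix itself disappears. A minimal sketch:

```python
import xml.etree.ElementTree as ET

doc = """<pers:person xmlns:pers="http://www.wiley.com/pers">
  <pers:name>
    <pers:title>Sir</pers:title>
  </pers:name>
</pers:person>"""

root = ET.fromstring(doc)

# Every prefix has been replaced by {namespace URI}; "pers" is gone.
tags = [el.tag for el in root.iter()]
print(tags)

# The xmlns:pers declaration is consumed by the parser, not kept as an attribute.
print(root.attrib)

# Queries match on the URI; the prefix used in the query ("p") is our own choice.
ns = {"p": "http://www.wiley.com/pers"}
print(root.find("p:name/p:title", ns).text)
```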
- 45. Default Namespaces
• A default namespace is just like a regular namespace except that you
don't have to specify a prefix for all of the elements that use it.
• Example:
<person xmlns="http://www.wiley.com/pers">
  <name>
    <title>Sir</title>
  </name>
</person>
• All descendant elements belong to the specified namespace (unless a
descendant declares a different one).
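The same Python parser (`xml.etree.ElementTree`, assumed) gives the default-namespace version exactly the same expanded names as the prefixed one, which is also why a lookup by the bare name `name` finds nothing:

```python
import xml.etree.ElementTree as ET

doc = """<person xmlns="http://www.wiley.com/pers">
  <name>
    <title>Sir</title>
  </name>
</person>"""

root = ET.fromstring(doc)
tags = [el.tag for el in root.iter()]
print(tags)

print(root.find("name"))                              # None: no element is called plain "name"
print(root.find("{http://www.wiley.com/pers}name").tag)
```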
- 46. Default Namespaces…Cont
• You can declare more than one namespace for an
element, but only one can be the default.
• This allows you to write XML like this:
<person xmlns="http://www.wiley.com/pers"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <name/>
  <xhtml:p>This is XHTML</xhtml:p>
</person>
- 47. Default Namespaces…Cont
• Here the namespaces, and their prefixes where applicable, are declared
on the root element, so that all elements in the document can use them.
• You can't write XML like this:
<person xmlns="http://www.wiley.com/pers"
        xmlns="http://www.w3.org/1999/xhtml">
• This tries to declare two default namespaces.
• In this case, the XML parser wouldn't be able to figure out to which
namespace the element belongs, so it rejects the document (the element
has two attributes with the same name, xmlns).
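A sketch of what a parser does with that start-tag, assuming Python's `xml.etree.ElementTree`. Both declarations are attributes named `xmlns`, so the parser rejects the tag as having a duplicate attribute.

```python
import xml.etree.ElementTree as ET

two_defaults = """<person xmlns="http://www.wiley.com/pers"
        xmlns="http://www.w3.org/1999/xhtml"/>"""

try:
    ET.fromstring(two_defaults)
    result = "accepted"
except ET.ParseError as err:
    result = f"rejected: {err}"
print(result)
```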
- 48. Declaring Namespaces on Descendants
• Namespace prefixes can be declared in any element in the document.
• Example:
<person xmlns="http://www.wiley.com/pers">
  <name/>
  <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">
    This is XHTML</xhtml:p>
</person>
• This makes the document more readable, because namespaces are declared
closer to where they'll actually be used.
• The prefix is available only in the element where it is declared and in
its descendants.
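The scope rule can be checked with Python's `xml.etree.ElementTree` (assumed). A descendant of the declaring element may use the prefix; a sibling may not, and the parser reports an unbound prefix. The `<xhtml:b>` child and the `parses` helper are illustrative additions.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

inside_scope = f"""<person xmlns="http://www.wiley.com/pers">
  <xhtml:p xmlns:xhtml="{XHTML}">This is <xhtml:b>XHTML</xhtml:b></xhtml:p>
</person>"""

outside_scope = f"""<person xmlns="http://www.wiley.com/pers">
  <xhtml:p xmlns:xhtml="{XHTML}">This is XHTML</xhtml:p>
  <xhtml:p>The prefix is no longer declared here</xhtml:p>
</person>"""

def parses(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(inside_scope))   # the descendant <xhtml:b> may use the prefix
print(parses(outside_scope))  # the sibling may not: "unbound prefix"
```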
- 49. Declaring Default Namespaces on Descendants
• You can declare a namespace to be the default namespace for an element
and its descendants.
• Example:
<person xmlns="http://www.wiley.com/pers">
  <name/>
  <p xmlns="http://www.w3.org/1999/xhtml">This is XHTML</p>
</person>
• http://www.wiley.com/pers is the default namespace for the document as
a whole.
• http://www.w3.org/1999/xhtml is the default namespace for the <p>
element and all of its descendants.
• The http://www.w3.org/1999/xhtml namespace overrides the
http://www.wiley.com/pers namespace, so that the latter doesn't apply to
the <p> element.
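Listing the expanded names in Python (`xml.etree.ElementTree` assumed) shows the override: `<p>` and its descendants, here an added `<em>`, move to the XHTML namespace while `<name>` stays in the pers namespace.

```python
import xml.etree.ElementTree as ET

doc = """<person xmlns="http://www.wiley.com/pers">
  <name/>
  <p xmlns="http://www.w3.org/1999/xhtml">This is <em>XHTML</em></p>
</person>"""

root = ET.fromstring(doc)
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```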
- 50. Canceling Default Namespaces
• Set the default namespace to an empty string (xmlns="") on an element;
that element and its descendants are then in no namespace (unless they
declare one again).
• Example:
<employee>
  <name>Jane Doe</name>
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">I've worked
    with <name xmlns="">Jane Doe</name> for over a
    <em>year</em>
    now.</p>
  </notes>
</employee>
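In expanded-name form (Python's `xml.etree.ElementTree` assumed), both `<name>` elements end up in no namespace, while `<p>` and `<em>` stay in the XHTML namespace:

```python
import xml.etree.ElementTree as ET

doc = """<employee>
  <name>Jane Doe</name>
  <notes>
    <p xmlns="http://www.w3.org/1999/xhtml">I've worked
    with <name xmlns="">Jane Doe</name> for over a
    <em>year</em>
    now.</p>
  </notes>
</employee>"""

root = ET.fromstring(doc)
tags = [el.tag for el in root.iter()]  # document order
print(tags)
```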
- 51. Do Different Notations Make Any Difference?
<pers:person xmlns:pers="http://www.wiley.com/pers"
             xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <pers:name/>
  <xhtml:p>This is XHTML</xhtml:p>
</pers:person>

<person xmlns="http://www.wiley.com/pers"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <name/>
  <xhtml:p>This is XHTML</xhtml:p>
</person>

<person xmlns="http://www.wiley.com/pers">
  <name/>
  <p xmlns="http://www.w3.org/1999/xhtml">This is XHTML</p>
</person>
• No: all three documents put <person> and <name> in
http://www.wiley.com/pers and <p> in http://www.w3.org/1999/xhtml, so a
namespace-aware parser treats them as equivalent.
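One way to answer the question is to parse all three documents and compare the (expanded name, text) pairs a namespace-aware parser reports. A sketch in Python, assuming `xml.etree.ElementTree`; `expanded` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PERS = "http://www.wiley.com/pers"
XHTML = "http://www.w3.org/1999/xhtml"

all_prefixes = f"""<pers:person xmlns:pers="{PERS}" xmlns:xhtml="{XHTML}">
  <pers:name/><xhtml:p>This is XHTML</xhtml:p></pers:person>"""
default_plus_prefix = f"""<person xmlns="{PERS}" xmlns:xhtml="{XHTML}">
  <name/><xhtml:p>This is XHTML</xhtml:p></person>"""
nested_default = f"""<person xmlns="{PERS}">
  <name/><p xmlns="{XHTML}">This is XHTML</p></person>"""

def expanded(text):
    """List each element as (namespace-qualified name, text), in document order."""
    return [(el.tag, el.text) for el in ET.fromstring(text).iter()]

results = [expanded(d) for d in (all_prefixes, default_plus_prefix, nested_default)]
print(results[0] == results[1] == results[2])
```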
- 52. Namespaces and Attributes
• Do namespaces work the same for attributes as
they do for elements?
• The answer is no, they don't.
• An attribute without a prefix does not belong to any namespace, not
even the default one.
• It is just “associated” with the element to which it belongs; only a
prefixed attribute (prefix:name) belongs to a namespace.
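A sketch of this behavior with Python's `xml.etree.ElementTree` (assumed): the unprefixed `id` attribute stays a plain name even though the element is in a default namespace, while a prefixed attribute gets the expanded `{URI}name` form. The `id` and `status` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<person xmlns="http://www.wiley.com/pers"
        xmlns:pers="http://www.wiley.com/pers"
        id="42" pers:status="active"/>"""

root = ET.fromstring(doc)
print(root.tag)     # the element is in the default namespace
print(root.attrib)  # id: no namespace; pers:status: namespace-qualified
```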
- 53. Understanding URIs
• URI (Uniform Resource Identifier) is a string of characters
that identifies a resource.
• It can occur in one of two flavors:
– URL (Uniform Resource Locator)
– URN (Uniform Resource Name).
• A resource is anything that has identity.
– An item that is retrievable over the Internet, such as an HTML
document.
– An item that is not retrievable over the Internet, such as the person
who wrote that HTML document.
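Python's standard `urllib.parse.urlparse` can illustrate the two flavors: a URL carries a location (host and path), while a URN only names something. The URN below is the DocBook example from RFC 3986; `urn:example:pers` is a made-up namespace name. The last lines show that either flavor can serve as a namespace name, since the parser only compares the strings and never fetches anything.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

url = urlparse("http://www.wiley.com/pers")
urn = urlparse("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")
print(url.scheme, url.netloc, url.path)  # a URL says where a resource lives
print(urn.scheme, urn.path)              # a URN only names it

# A URN works as a namespace name just as well as a URL.
root = ET.fromstring('<person xmlns="urn:example:pers"/>')
print(root.tag)
```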
- 54. Summary
• What XML is and why it's so useful:
– A protocol for containing and managing information.
– Store and retrieve data, format documents, put data in a presentable
form, ensure data integrity, support multiple languages.
• Namespaces are used to differentiate elements and attributes of
different XML document types from each other when combining them in one
document, or even when processing multiple documents simultaneously.