The document is a lecture on XML and its sub-languages. It begins with an introduction to XML, describing it as an extensible markup language that allows the creation of new tags to structure documents in various domains. It then discusses XML specifications like elements, attributes, and namespaces. Later sections cover document type definitions (DTDs) for validating XML documents and ensuring they follow specified syntax rules. The document provides examples and explanations of various XML concepts.
Semantic Web - XML and sublanguages
1. Semantic Web ::: Serge Linckels ::: www.linckels.lu ::: serge@linckels.lu ::: 1
Semantic Web
Unit 3: XML and its Sub-Languages
Faculty of Science, Technology and Communication (FSTC)
Bachelor en informatique (professionnel)
3. XML and its Sub-Languages
Semantic Web Roadmap:
Controlled bottom-up growth according to this architecture.
The architecture has been (slightly) modified in recent years.
3.1. Why is HTML not Sufficient?
3.2. XML - Introduction
3.3. XML – Language Specifications
3.4. Document Type Definitions (DTD)
3.5. XML Schemas
3.6. Namespaces
3.7. Programming Models
3.8. XLink, XPath and XPointer
3.9. XSL Transformations (XSLT)
3.10. References
3.1. Why is HTML not Sufficient?
<h1>Christoph Meinel</h1>
<h2>Viola Brehmer</h2>
<ul>
<li>Long Wang</li>
<li>Feng Cheng</li>
<li>Dirk Cordel</li>
<li>Serge Linckels</li>
</ul>
Harald Sack
Limitations of HTML
HTML was designed to give a document structure and to control its layout, NOT to
describe its semantics
What is this Web page about?
What position has "Viola Brehmer"?
…
Meta-Tags
<meta name="description"
content="Homepage of Serge Linckels">
<meta name="keywords"
content="teacher, athlete">
<meta name="Author"
content="The Master of the Universe">
<meta name="xyz"
content="nothing special">
Do you believe in Meta-Tags?
HTML metadata are created by the author of the Web page.
Their syntax and semantics are individual.
3.2. XML - Introduction
Extensible Markup Language (XML)
Markup Language: gives structure to text documents by using tags
Meta Language: XML does not have a fixed set of tags (new
tags can be created)
Extensible: XML can be adapted (extended) to meet many
different domains, e.g.,
• Mathematical Markup Language (MathML)
• Chemical Markup Language (CML)
• Synchronized Multimedia Integration Language (SMIL)
• WAP Markup Language (WML)
Creator Jon Bosak, 1996
XML is not…
a programming language
a network transport protocol
a database
XML is…
a simple data format
platform independent
does not require special applications
Picture created by Harald Sack
Standard Generalized Markup Language (SGML)
The Standard Generalized Markup Language
(SGML) is a metalanguage in which one can
define markup languages for documents
HTML
• Instance of SGML
• Layout of data
• Layout and data are mixed up
XML
• Subset of SGML
• Structure of data
• Layout and data are separated
XHTML (HTML reformulated in XML)
Welcome to LIASIT
LIASIT stands for Luxembourg Advanced Studies in Information
Technology and since August 01, 2006 is a Doctoral School in the Faculty
of Science, Technology and Communication.
The faculty is composed of the following professors: David BASIN, Pascal
BOUVRY, Eric DUBOIS, Thomas ENGEL, Franck LEPREVOST, Christoph
MEINEL, Nicolas GUELFI, and Björn OTTERSTEN.
The PhD Students are: Christoph BRANDT, Pandu DEVARAKOTA, Daniel
FISCHER, Benjamin GATEAU, Markus GROSS, Joel GROTZ, Annie
GUERRIERO, Serge LINCKELS, Nicolas MAYER, Michael NOLL, Benoît RIES,
Michael STIEGHAHN.
Magali MARTIN is the secretary of LIASIT... and also a nice entertainer.
How we can see this…
Welcome to LIASIT
LIASIT stands for Luxembourg Advanced Studies in
Information Technology and since August 01, 2006 is a
Doctoral School in the Faculty of Science, Technology
and Communication.
The faculty is composed of the following professors:
David BASIN, Pascal BOUVRY, Eric DUBOIS, Thomas ENGEL,
Franck LEPREVOST, Christoph MEINEL, Nicolas GUELFI,
and Björn OTTERSTEN.
The PhD Students are: Christoph BRANDT, Pandu
DEVARAKOTA, Daniel FISCHER, Benjamin GATEAU, Markus
GROSS, Joel GROTZ, Annie GUERRIERO, Serge LINCKELS,
Nicolas MAYER, Michael NOLL, Benoît RIES, Michael
STIEGHAHN.
Magali MARTIN is the secretary of LIASIT... and also
a nice entertainer.
What a computer sees…
<title>Welcome to LIASIT</title>
<description>LIASIT stands for Luxembourg Advanced
Studies in Information Technology and since August 01,
2006 is a Doctoral School in the Faculty of Science,
Technology and Communication.</description>
<profs>
<name>The faculty</name>
<name>is composed of</name>
<name>the following</name>
</profs>
<students>
<name>The PhD</name>
<name>Students are:</name>
<name>Christoph BRANDT,</name>
<name>Pandu DEVARAKOTA,</name>
</students>
<administration>
<name>Daniel FISCHER, </name>
</administration>
How we can help…
<profs>
<name>Thomas Engel</name>
<name>Christoph Meinel</name>
<name>David Basin</name>
<name>Björn Ottersten</name>
</profs>
<students>
<name>Benoît Ries</name>
<name>Daniel Fischer</name>
<name>Christoph Brandt</name>
<name>Pandu Devarakota</name>
<name>Serge Linckels</name>
</students>
Benefits of XML
Document is well-structured
Applications can process the file
XML file
Thomas Engel
Christoph Meinel
David Basin
Björn Ottersten
Benoît Ries
Daniel Fischer
Christoph Brandt
Pandu Devarakota
Serge Linckels
pure text file
Problems with text file
No structure
Difficult to process
Attention: although XML adds a certain amount of
semantics to the document, some information is still
missing, e.g., what is the
relation between "profs" and "students"?
3.3. XML – Language Specifications
<person type="Teacher">
<name>Serge Linckels</name>
<hp>http://www.linckels.lu</hp>
<size>173</size>
<phone>691-111111</phone>
</person>
Terminology
- element: person
- attribute: type="Teacher"
- child-element: name, hp, size, phone
- value: e.g., Serge Linckels
Tree representation
person
- name: Serge Linckels
- hp: http://www.linckels.lu
- size: 173
- phone: 691-111111
General
XML is composed of text and tags
Tags come in pairs, e.g., <hp></hp>
Tags must be properly nested, e.g.,
<person><hp></person></hp> (wrong: the tags overlap)
<person><hp></hp></person> (correct)
Tags are case sensitive: <hp> ≠ <HP>
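The rules above can be tried out with a small Java sketch. It assumes the JAXP parser that ships with the JDK; the class WellFormedCheck and its method isWellFormed are illustrative names, not part of the lecture material.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class (an assumption for illustration only).
class WellFormedCheck {

    // Returns true if the text can be parsed as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<person><hp></hp></person>")); // true
        System.out.println(isWellFormed("<person><hp></person></hp>")); // false: bad nesting
        System.out.println(isWellFormed("<hp></HP>"));                  // false: case differs
    }
}
```

The check only covers well-formedness; it says nothing about whether the tags make sense for a given application.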
XML structure
<staff>
<person type="Teacher">
<name>Serge Linckels</name>
<hp>http://www.linckels.lu</hp>
<size>173</size>
<phone>26 00 11 22</phone>
<phone>691-111111</phone>
</person>
<person type="Teacher">
<name>Denis Zampunieris</name>
<phone>4666445290</phone>
</person>
</staff>
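The same element can be used repeatedly (e.g., phone)
Nested tags can be part
of a list too.
The order of attributes is not significant; whether the order of elements matters depends on the application and on the DTD or schema, if any.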
Element or attribute?
<name>
<first>Serge</first>
<last>Linckels</last>
</name>
or
<name first="Serge" last="Linckels"></name>
Both variants are semantically identical
XML Names
Can include:
- letters (a..z, A..Z)
- digits (0..9)
- these punctuation chars:
- underscore (_)
- hyphen (-)
- period (.)
- special chars like ö, ç, Ω
A name must not start with a digit, a hyphen or a period.
Examples of valid XML Names:
<drivers_licence>
<_oki-doki>
<téléphone>
<this.works>
CDATA Sections
Everything between <![CDATA[ and ]]> is
treated as raw character data, i.e., it is not interpreted as markup.
<person type="Teacher">
<name>Serge Linckels</name>
<![CDATA[This is just some
code that is ignored,
10 print "Hello world"
20 goto 10
]]>
<phone>26 00 11 22</phone>
</person>
Comments
Comments are between <!-- and --> like in
HTML
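A short Java sketch of how a parser treats CDATA sections and comments, assuming the JDK's built-in DOM parser. The class CdataDemo and the element name code are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
class CdataDemo {

    // Parses the XML text and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "<" and "&" inside CDATA are plain characters, not markup.
        System.out.println(textOf("<code><![CDATA[if (a < b) print \"x & y\"]]></code>"));
        // Comments are not part of the text content.
        System.out.println(textOf("<code>a<!-- hidden -->b</code>"));
    }
}
```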
XML declaration
<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<person type="Teacher">
<name>Serge Linckels</name>
<hp>http://www.linckels.lu</hp>
<size>173</size>
<phone>691-111111</phone>
</person>
encoding: XML is pure text, but can use different encodings, e.g.,
ASCII, ISO-8859-1 (Latin-1), UTF-8, UTF-16. When omitted, a Unicode encoding
(UTF-8 or UTF-16) is assumed.
standalone:
- "yes", no external DTD/Schema is given
- "no", external DTD/Schema is specified
XML-defined character sets
Unicode: 95156 characters from most of Earth's
living languages (variants: UCS-2, UCS-4, UTF-8,
UTF-16)
ISO character sets: e.g., ISO-8859-15 (Latin 9)
is ASCII + accented letters + €
JSON – JavaScript Object Notation
No element names
Primary data format used for asynchronous
browser/server communication (AJAX)
Language-independent data format
Supported by many programming languages
3.4. Document Type Definitions (DTD)
<person type="Teacher">
<name>
<first>Serge</first>
<last>Linckels</last>
</name>
<phone>691-111111</phone>
</person>
<Personne>
<Type>Teacher</Type>
<Nom>Serge Linckels</Nom>
<HP>http://www.linckels.lu</HP>
<Sexe>M</Sexe>
</Personne>
?
Formal syntax is required
DTD – Document Type Definitions
The syntax of an XML document is described
A validating parser checks the XML document against the syntax in its DTD
An XML document is valid if it respects the
syntax defined in its DTD
<!ELEMENT person (name, phone*)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
#PCDATA: value of type string
A person-element can contain
1 name sub-element and 0..*
phone sub-elements
Attention: a document can be well-formed
but not valid!
Web browser only checks if well-formed.
person.xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE person SYSTEM "http://www.linckels.lu/person.dtd">
<person type="Teacher">
<name>
<first>Serge</first>
<last>Linckels</last>
</name>
<phone>691-111111</phone>
</person>
person.dtd
<!ELEMENT person (name, phone*)>
<!ATTLIST person type CDATA #IMPLIED>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
The SYSTEM identifier is the URI of the DTD file,
e.g., "/mydisk/person.dtd"
Validating a document
A Web browser does not validate documents
but only checks them for well-formedness
XML validator APIs are available, e.g., in Java
Online validators, e.g.,
http://www.stg.brown.edu/service/xmlvalid/
http://www.w3.org/2001/03/webdata/xsv
<?xml version="1.0"?>
<!DOCTYPE person [
<!ELEMENT person (name, phone*)>
<!ATTLIST person type CDATA #IMPLIED>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<person type="Teacher">
<name>
<first>Serge</first>
<last>Linckels</last>
</name>
<phone>691-111111</phone>
</person>
Valid XML document with internal DTD
Sequences
* Zero or more of the element is allowed
? Zero or one of the element is allowed
+ One or more of the element is required
Elements must appear in the specified order
Choices
<!ELEMENT color (red | green)>
Here, the element color can have a child-
element red or green, not both at a time.
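The difference between well-formed and valid can be sketched in Java with a validating parser. This assumes the JDK's JAXP implementation; the class DtdValidation is an illustrative name, and the ATTLIST line for the type attribute is an assumption added so that the sample document is valid.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class (an assumption for illustration only).
class DtdValidation {

    // Internal DTD from the slide, plus an assumed declaration of "type".
    static final String DTD =
          "<!DOCTYPE person [\n"
        + "<!ELEMENT person (name, phone*)>\n"
        + "<!ATTLIST person type CDATA #IMPLIED>\n"
        + "<!ELEMENT name (first, last)>\n"
        + "<!ELEMENT first (#PCDATA)>\n"
        + "<!ELEMENT last (#PCDATA)>\n"
        + "<!ELEMENT phone (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the document is valid with respect to its DTD.
    static boolean isValid(String xmlWithDoctype) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validity errors into exceptions (by default they are only reported).
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlWithDoctype)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = DTD + "<person type=\"Teacher\"><name><first>Serge</first>"
            + "<last>Linckels</last></name><phone>691-111111</phone></person>";
        // Well-formed, but first and last appear in the wrong order.
        String wrongOrder = DTD + "<person><name><last>Linckels</last>"
            + "<first>Serge</first></name></person>";
        System.out.println(isValid(ok));
        System.out.println(isValid(wrongOrder));
    }
}
```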
<?xml version="1.0"?>
<!DOCTYPE person [
<!ELEMENT person (name, phone*)>
<!ATTLIST person type CDATA #IMPLIED>
<!ELEMENT name EMPTY>
<!ATTLIST name first CDATA #IMPLIED
last CDATA #REQUIRED
>
<!ELEMENT phone (#PCDATA)>
]>
<person type="Teacher">
<name first="Serge" last="Linckels" />
<phone>691-111111</phone>
</person>
Attribute declarations
Attribute defaults:
#IMPLIED: value is optional
#REQUIRED: value is required
#FIXED: value is constant
Literal: value is given as quoted string
Attribute types:
CDATA: any string of text
Enumeration: list of values
ID: unique XML name
IDREF: reference to the ID of some
element in the document
IDREFS: set of IDREFs
Valid XML document with internal DTD
<family>
<person id="jane" mother="mary" father="john">
<name>Jane Doe</name>
</person>
<person id="john" children="jane jack">
<name>John Doe</name>
</person>
<person id="mary" children="jane jack">
<name>Mary Smith</name>
</person>
<person id="jack" mother="mary" father="john">
<name>Jack Smith</name>
</person>
</family>
DTD – ID, IDREF and IDREFS
<!DOCTYPE family [
<!ELEMENT family (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person
id ID #REQUIRED
mother IDREF #IMPLIED
father IDREF #IMPLIED
children IDREFS #IMPLIED>
]>
XML document
DTD
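A Java sketch of how an application might follow IDREFS references in the family example. It assumes the JDK's DOM parser, which makes attributes declared as ID in the DTD available through getElementById; the class FamilyLookup and its method are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class (an assumption for illustration only).
class FamilyLookup {

    static final String FAMILY =
          "<!DOCTYPE family [\n"
        + "<!ELEMENT family (person*)>\n"
        + "<!ELEMENT person (name)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ATTLIST person\n"
        + "  id ID #REQUIRED\n"
        + "  mother IDREF #IMPLIED\n"
        + "  father IDREF #IMPLIED\n"
        + "  children IDREFS #IMPLIED>\n"
        + "]>\n"
        + "<family>"
        + "<person id=\"jane\" mother=\"mary\" father=\"john\"><name>Jane Doe</name></person>"
        + "<person id=\"john\" children=\"jane jack\"><name>John Doe</name></person>"
        + "<person id=\"mary\" children=\"jane jack\"><name>Mary Smith</name></person>"
        + "<person id=\"jack\" mother=\"mary\" father=\"john\"><name>Jack Smith</name></person>"
        + "</family>";

    // Returns the names of the persons referenced by the "children" attribute.
    static List<String> childrenNamesOf(String personId) {
        List<String> names = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the DTD tells the parser which attribute is an ID
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // validity errors are ignored here
            Document doc = builder.parse(new InputSource(new StringReader(FAMILY)));

            Element person = doc.getElementById(personId);
            if (person == null) {
                return names;
            }
            String refs = person.getAttribute("children").trim();
            if (refs.isEmpty()) {
                return names;
            }
            // IDREFS is a whitespace-separated list of IDs.
            for (String ref : refs.split("\\s+")) {
                Element child = doc.getElementById(ref);
                names.add(child.getElementsByTagName("name").item(0).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childrenNamesOf("john"));
        System.out.println(childrenNamesOf("jane"));
    }
}
```

The DTD only guarantees that each reference points to some existing ID; it cannot express that a mother reference has to point to a person of a particular kind.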
Problems and limitations
DTD are context-free grammars; recursive definitions are possible
Order matters, e.g.,
<!ELEMENT person (last, first)>
Workaround:
<!ELEMENT person ((last, first) | (first, last))>
Can become unclear:
<!ELEMENT person ((name | phone | e-mail)*)>
Lack of expressiveness, e.g., restrictions on references are not possible
All elements are global in one namespace
XML Schema is more powerful than DTD and is a W3C Recommendation
No support for newer features of XML
DTDs are expressed in a non-XML syntax
…but there are numerous other XML schema languages, e.g., RELAX NG, ISO DSDL, Schematron…
3.5. XML Schemas
XML Schema - Overview
XML Schema is an XML document containing a formal description of what comprises a valid
XML document
An XML document described by a schema is called an instance document
More explicit restrictions on the number and sequence of child elements are possible
Example
<?xml version="1.0"?>
<fullName>Serge Linckels</fullName>
XML document
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="fullName" type="xs:string" />
</xs:schema>
XML Schema
xs: is the standard prefix for the XML Schema namespace
Atomic types (more than 40!)
string: Unicode string
integer: positive or negative whole number
boolean: true/false or 0/1
ID, IDREF, IDREFS: cf. DTD
<xs:element name="first" type="xs:string" />
<xs:element name="age" type="xs:integer" />
<xs:element name="link" type="xs:anyURI" />
<xs:element name="year" type="xs:gYear" />
Simple types
New simple types can be created by using atomic types
<xs:simpleType name="aName">
<xs:restriction base="xs:string" />
</xs:simpleType>
Restrictions can be defined
<xs:simpleType name="aName">
<xs:restriction base="xs:string">
<xs:maxLength value="50" />
</xs:restriction>
</xs:simpleType>
More about restrictions
Restrictions (facets) can be defined over simple types using xs:restriction
Enumerations:
<xs:simpleType name="location">
<xs:restriction base="xs:string">
<xs:enumeration value="work" />
<xs:enumeration value="school" />
<xs:enumeration value="mobile" />
</xs:restriction>
</xs:simpleType>
Numeric facets:
<xs:simpleType name="age">
<xs:restriction base="xs:unsignedShort">
<xs:minExclusive value="0" />
<xs:maxInclusive value="120" />
</xs:restriction>
</xs:simpleType>
possible values:
• minInclusive
• maxInclusive
• minExclusive
• maxExclusive
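The numeric facets can be checked with a Java sketch based on the JDK's javax.xml.validation API. The class AgeFacetValidation and the element declaration named age are assumptions added for illustration; the simple type itself follows the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class (an assumption for illustration only).
class AgeFacetValidation {

    // Simple type "age" as on the slide, plus an assumed element using it.
    static final String SCHEMA =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:simpleType name=\"age\">"
        + "<xs:restriction base=\"xs:unsignedShort\">"
        + "<xs:minExclusive value=\"0\"/>"
        + "<xs:maxInclusive value=\"120\"/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "<xs:element name=\"age\" type=\"age\"/>"
        + "</xs:schema>";

    // Returns true if the instance document is valid against SCHEMA.
    static boolean isValid(String instance) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // validate() throws a SAXException on the first validity error.
            validator.validate(new StreamSource(new StringReader(instance)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<age>42</age>"));  // inside the range
        System.out.println(isValid("<age>0</age>"));   // excluded by minExclusive
        System.out.println(isValid("<age>121</age>")); // above maxInclusive
    }
}
```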
More about restrictions
Enforcing format:
<xs:simpleType name="mobile-phone">
<xs:restriction base="xs:string">
<xs:pattern value="\d\d\d-\d\d \d\d \d\d" />
</xs:restriction>
</xs:simpleType>
Enforces the rule that a
mobile phone-number
consists of 3 digits, a
dash, 2 digits, a space, 2
digits, another space and
finally 2 digits.
Lists:
XML Schema – simple type definition
<xs:simpleType name="TypeAuthor">
<xs:list itemType="xs:string" />
</xs:simpleType>
XML Schema – element definition
<xs:element name="author" type="TypeAuthor" />
The author element of an
instance document can
contain an unlimited list
of strings, each separated
by a whitespace
Complex types
An element has a complex type if it contains
child-elements and/or attributes
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="first" type="xs:string" />
<xs:element name="last" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
<?xml version="1.0"?>
<person>
<first>Serge</first>
<last>Linckels</last>
</person>
XML document
XML Schema
Only elements can have complex types,
attributes always have simple types
sequence: order of elements matters (a,b)
all: order of elements does not matter (a,b or b,a)
choice: one or the other element (a xor b)
Occurrence Constraints
Set the number of times an element may occur:
minOccurs: minimum occurrences
maxOccurs: maximum occurrences
<xs:element name="middle" type="xs:string"
minOccurs="0" maxOccurs="unbounded" />
The default value for minOccurs and maxOccurs is 1
If maxOccurs were not set in this example, it would have its default value of 1 and the middle
element could appear 0 or 1 times.
The value unbounded indicates that the element may appear an unlimited number of times.
Derived complex types
Deriving by extension: add new definitions to an existing complex type. E.g.,
add the phone element to the existing person type.
<xs:complexType name="PersonWithPhone">
<xs:complexContent>
<xs:extension base="person">
<xs:sequence>
<xs:element name="phone" type="xs:string" />
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Deriving by restriction: by omitting parts of the parent definition, the
restriction element creates a new, constrained type.
<xs:complexType name="PersonWithMoreNames">
<xs:complexContent>
<xs:restriction base="person">
<xs:sequence>
<xs:element name="first" type="xs:string" minOccurs="2" />
<xs:element name="last" type="xs:string" />
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
Structure cannot
be changed!
Only available for
complex types
Attribute declarations
Attributes can be declared globally by top-level xs:attribute
Attributes can be declared locally as part of a complex type definition
<xs:attribute name="job" type="xs:string" use="optional" />
use: possible values are optional or required
XML Schema
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="first" type="xs:string" />
<xs:element name="last" type="xs:string" />
</xs:sequence>
<xs:attribute name="job" type="xs:string" />
</xs:complexType>
</xs:element>
XML document
<?xml version="1.0"?>
<person job="Teacher">
<first>Serge</first>
<last>Linckels</last>
</person>
Conclusion: DTD vs. XML Schemas
XML Schema is a more powerful language than DTD for specifying the syntax of XML
documents; therefore, it is more expressive in terms of semantics
XML Schema is a W3C Recommendation and widely used. As DTDs are simpler to use, they
are still used today
3.6. Namespaces
<compactDisk author="HS">
<titel>Remixes</titel>
<track number="1">
<titel>Night over Manaus</titel>
<author>Boozoo Bajou</author>
</track>
</compactDisk>
Problem of ambiguous names
XML names can be used for different
elements. But this creates ambiguities.
XML namespaces disambiguate elements with the same name from each
other by assigning elements and attributes to URIs.
Qualified names, prefixes and local parts
Elements are identified by qualified names:
cd:titel
qualified name = prefix (cd) + local name (titel)
Using XML namespaces
<cd:compactDisk
xmlns:cd = "http://www.linckels.lu/cd"
xmlns:tr = "http://www.xyz.com/tracks"
author="HS">
<cd:titel>Remixes</cd:titel>
<tr:track number="1">
<tr:titel>Night over Manaus</tr:titel>
<tr:author>Boozoo Bajou</tr:author>
</tr:track>
</cd:compactDisk>
Each element exists in a unique
namespace
Namespace URIs are purely formal
identifiers; they are not the
addresses of a page, and they are
not meant to be followed as links
<tr:title>Remixes</tr:title>
Conceptually, a prefix is only an abbreviation for the complete namespace URI, e.g., tr:title stands for
http://www.xyz.com/tracks#title
(this expanded form is a notation, not legal XML element syntax)
Namespace binding: each prefix in a qualified name must be associated with a URI
A default namespace applies only to elements,
not to attributes; an attribute is in a namespace only if it carries a prefix
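A Java sketch of namespace-aware parsing of the compact disk example, assuming the JDK's DOM parser. The class NamespaceDemo and its methods are illustrative names; the namespace URIs are the ones used on the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
class NamespaceDemo {

    static final String CD =
          "<cd:compactDisk xmlns:cd=\"http://www.linckels.lu/cd\""
        + " xmlns:tr=\"http://www.xyz.com/tracks\" author=\"HS\">"
        + "<cd:titel>Remixes</cd:titel>"
        + "<tr:track number=\"1\">"
        + "<tr:titel>Night over Manaus</tr:titel>"
        + "<tr:author>Boozoo Bajou</tr:author>"
        + "</tr:track>"
        + "</cd:compactDisk>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first element with the given namespace URI and local name, or null.
    static String firstText(Document doc, String namespaceUri, String localName) {
        NodeList hits = doc.getElementsByTagNameNS(namespaceUri, localName);
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(CD);
        // Same local name "titel", told apart by the namespace URI.
        System.out.println(firstText(doc, "http://www.linckels.lu/cd", "titel"));
        System.out.println(firstText(doc, "http://www.xyz.com/tracks", "titel"));
        // The unprefixed attribute "author" is in no namespace.
        System.out.println(doc.getDocumentElement().getAttributeNode("author").getNamespaceURI());
    }
}
```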
3.7. Programming Models
Common XML processing models
Treating XML as text
Treating XML as events: the document is read sequentially and events are reported as they occur (e.g., an "event" can
be the start of an element, the content of an element, and the end of an element)
Treating XML as tree models
XML transformations
Abstracting XML away: the application does not deal with the XML elements directly
Most commonly used
Document Object Model (DOM)
Simple API for XML (SAX)
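The event model can be sketched in Java with the SAX parser that ships with the JDK. The class SaxEvents and the format of the logged strings are assumptions for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class (an assumption for illustration only).
class SaxEvents {

    // Parses the XML text and records the events in the order they are reported.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver text in several chunks; short texts usually arrive in one.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<name><first>Serge</first></name>"));
    }
}
```

Unlike DOM, nothing is kept in memory unless the handler stores it, which is why the event model suits very large documents.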
DOM overview
The entire document must be read and parsed before it is available as DOM; unsuitable for
very large documents
User accesses data by traversing the tree (tree and its traversal conform to a W3C standard)
The API allows for constructing, accessing and manipulating the structure and content of XML
documents
Example:
<countries>
<country continent="Asia">
<name>Israel</name>
<population year="2001">6199008</population>
<city capital="yes"><name>Jerusalem</name></city>
<city><name>Ashdod</name></city>
</country>
<country continent="Europe">
<name>France</name>
<population year="2004">60424213</population>
</country>
</countries>
DOM tree (figure): the root node "document" has the child node "countries"; each "country" node has the attribute "continent" and the child nodes "name", "population" (attribute "year") and "city" (attribute "capital", child node "name"); value nodes hold the text, e.g., Israel, 6199008, Jerusalem, Ashdod, France, 60424213
DOM Java API - general
Figure: XML document → DOM parser → DOM tree (in memory) → API → Application
DOM tree is generated by a DocumentBuilder
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document myXMLdoc = builder.parse("world.xml");
The builder is created by a factory so that the code stays implementation independent.
The factory is chosen according to the system configuration
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
DOM Java API – node API
//create the root element
Element root = myXMLdoc.createElement("root");
//add it to the xml tree
myXMLdoc.appendChild(root);
//create child element
Element childElement = myXMLdoc.createElement("Child");
//add the attribute to the child
childElement.setAttribute("attribute1","The value of Attribute 1");
//add child element to the root element
root.appendChild(childElement);
The nodes of the DOM tree include
- a special root (denoted document)
- element nodes
- text nodes and CDATA sections
- attributes
- comments
- and more ...
Figure: tree built by the code above: myXMLdoc → root → Child, where Child carries attribute1 = "The value of Attribute 1"
Every node in the DOM tree implements the Node interface
Node is extended by: Document, DocumentFragment, DocumentType, Element, Attr (attribute), CharacterData, Entity, EntityReference, Notation, ProcessingInstruction
CharacterData is extended by: Text, Comment
Text is extended by: CDATASection
Every node has a specific location in tree
Node interface specifies methods for tree navigation
• Node getFirstChild();
• Node getLastChild();
• Node getNextSibling();
• Node getPreviousSibling();
• Node getParentNode();
• NodeList getChildNodes();
• NamedNodeMap getAttributes();
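The navigation methods can be combined into a depth-first walk. The following Java sketch assumes the JDK's DOM parser; the class DomWalk and its methods are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
class DomWalk {

    // Visits the node and all its descendants in document order,
    // collecting the names of the element nodes.
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Parses the XML text and returns the element names in document order.
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc, names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
            "<country><name>France</name>"
            + "<population year=\"2004\">60424213</population></country>"));
    }
}
```

Attributes are not children of their element, so the walk does not visit them; they are reached through getAttributes().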
Every node has :
• a type
• a name
• a value
• attributes
The roles of these properties differ according to the node types
if (myNode.getNodeType() == Node.ELEMENT_NODE) {
//process node
…
}
Node types:
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
Example
// print one given city (city is given as element)
public static void printCity(Element city) {
Node nameNode = city.getElementsByTagName("name").item(0);
String cName = nameNode.getFirstChild().getNodeValue();
System.out.println("Found City: " + cName);
}
// prints all cities found in the DOM tree
public static void printCities(Document myXMLdoc) {
NodeList cities = myXMLdoc.getElementsByTagName("city");
for(int i = 0; i < cities.getLength(); ++i) {
printCity((Element)cities.item(i));
}
}
Children of a node in a DOM tree can be manipulated; added, edited, deleted, copied etc.
To construct new nodes, use the methods of Document: createElement, createAttribute,
createTextNode, createCDATASection etc.
To manipulate a node, use the methods of Node: appendChild, insertBefore, removeChild,
replaceChild, setNodeValue, cloneNode(boolean deep) etc.
Figure: effect of insertBefore(), replaceChild(), cloneNode(false) (copies only the node itself) and cloneNode(true) (copies the node together with its subtree)
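A Java sketch of two of these manipulations, assuming the JDK's DOM parser. The class NodeEditing, its methods and the element name code are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
class NodeEditing {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Clones the root element and returns how many children the copy has.
    static int childCountOfClone(String xml, boolean deep) {
        Node copy = parse(xml).getDocumentElement().cloneNode(deep);
        return copy.getChildNodes().getLength();
    }

    // Inserts a new element before the first child of the root and
    // returns the name of the node that is now the first child.
    static String firstChildAfterInsert(String xml, String newTag) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        Element added = doc.createElement(newTag);
        // With a null reference node, insertBefore appends at the end.
        root.insertBefore(added, root.getFirstChild());
        return root.getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        String xml = "<country><name>France</name><city/></country>";
        System.out.println(childCountOfClone(xml, false)); // shallow copy: no children
        System.out.println(childCountOfClone(xml, true));  // deep copy: both children
        System.out.println(firstChildAfterInsert(xml, "code"));
    }
}
```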
3.8. XLink, XPath and XPointer
Overview
XML Linking Language (XLink) is an XML markup language used for creating hyperlinks in XML
documents, comparable to the HTML tag <a href="…">
XPointer is a system for addressing components of XML-based internet media, comparable to HTML anchors <a name="…">
XML Path Language (XPath) is a language for selecting nodes from an XML document
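Selecting nodes with XPath can be sketched in Java using the javax.xml.xpath API of the JDK. The class XPathDemo is an illustrative name, and the sample data is a shortened copy of the countries document from the DOM section.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class (an assumption for illustration only).
class XPathDemo {

    static final String COUNTRIES =
          "<countries>"
        + "<country continent=\"Asia\"><name>Israel</name>"
        + "<city capital=\"yes\"><name>Jerusalem</name></city>"
        + "<city><name>Ashdod</name></city></country>"
        + "<country continent=\"Europe\"><name>France</name></country>"
        + "</countries>";

    // Evaluates the expression and returns the string value of the first match.
    static String select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates the expression and returns the number of selected nodes.
    static int count(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Name of the capital city of the Asian country.
        System.out.println(select(COUNTRIES,
            "/countries/country[@continent='Asia']/city[@capital='yes']/name"));
        // Number of city elements anywhere in the document.
        System.out.println(count(COUNTRIES, "//city"));
    }
}
```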
XLink - general
Allows the creation of unidirectional links between exactly 2 resources
Origin of the link is always the starting document (where the link is nested)
Browsers are free to interpret this link as they like (depends on used CSS)
Figure: starting document → destination document, like an HTML hyperlink <a href="…">
This example corresponds to an HTML hyperlink, but XLink can express more…
XLink – parameters 1/3
title: (optional) textual information about the link (display in a browser as hint)
<publications
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:title="My publications"
xlink:href="http://www.linckels.lu/publications.txt"
xlink:role="http://www.dblp.de"
xlink:show="new"
xlink:actuate="onRequest"
xlink:type="simple"
/>
href: URI of the linked destination resource (need not be a URL)
xlink: specifies that this is an XLink definition
role: (optional) points to a resource that specifies the meaning of the connection between
the resources, e.g., a Web page that gives further information
XLink – parameters 2/3
show: (optional) context in which the linked resource is to be displayed
- new: new window
- replace: current window
- embed: embed in current document
- other: customized
- none: no behavior
<publications
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:title="My publications"
    xlink:href="http://www.linckels.lu/publications.txt"
    xlink:role="http://www.dblp.de/"
    xlink:show="new"
    xlink:actuate="onRequest"
    xlink:type="simple"
/>
actuate: (optional) specifies when an application that encounters an XLink should follow it
- onLoad: as soon as the application sees it
- onRequest: when the user asks to follow it
- other: customized
- none: no behavior (e.g., if the link is the ISBN of a "physical book")
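How an application might read these parameters can be sketched with a namespace-aware DOM parser in Java. This is an illustration only: the class name XLinkAttributeSketch and the output format of describe() are assumptions, and a real XLink processor does considerably more.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XLinkAttributeSketch {
    /** The XLink namespace URI; attributes are matched by this URI, not by prefix. */
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    static final String XML = String.join("\n",
        "<publications",
        "    xmlns:xlink=\"http://www.w3.org/1999/xlink\"",
        "    xlink:title=\"My publications\"",
        "    xlink:href=\"http://www.linckels.lu/publications.txt\"",
        "    xlink:role=\"http://www.dblp.de/\"",
        "    xlink:show=\"new\"",
        "    xlink:actuate=\"onRequest\"",
        "    xlink:type=\"simple\"",
        "/>");

    /** Parses the XML text with namespace support and returns its root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getAttributeNS
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Summarizes a simple link; getAttributeNS returns "" for a missing attribute. */
    static String describe(Element link) {
        if (!link.getAttributeNS(XLINK_NS, "type").equals("simple")) {
            return "not a simple XLink";
        }
        return link.getAttributeNS(XLINK_NS, "title")
                + " -> " + link.getAttributeNS(XLINK_NS, "href")
                + " [show=" + link.getAttributeNS(XLINK_NS, "show")
                + ", actuate=" + link.getAttributeNS(XLINK_NS, "actuate") + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(parse(XML)));
    }
}
```

Because the lookup goes through the namespace URI, the value of xmlns:xlink has to be spelled exactly; with a different URI the element would not be recognized as a link at all.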
XLink – parameters 3/3
type: specifies the type of the link
- simple: one standard link between two resources (e.g., an HTML hyperlink)
- extended: several links between a collection of resources (≈ directed graph)
Relations that an extended link can express:
- 1..n relation: e.g., a book is published in three particular editions
- sequences: e.g., a book has two preceding versions
- n..m relation: e.g., a pizza is composed of different ingredients, which can result in different
compositions
Figure: starting documents and destination documents connected through a link basis
XLink – Extended type example
<xlink:extended
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:role="http://www.pizza.de/pizzaworld"
    xlink:title="Pizza Tonno">
  <xlink:locator
      href="Pizzaboden.xml"
      role="http://www.pizza.de/base"
      title="Pizzaboden"/>
  <xlink:locator
      href="Basilikum.xml"
      role="http://www.pizza.de/base"
      title="Basilikum"/>
  <xlink:arc
      from="http://www.pizza.de/base"
      to="http://www.pizza.de/special"
      show="new"
      actuate="onRequest"/>
</xlink:extended>
xlink:locator: specifies the resources involved (here Pizzaboden, the pizza base, and Basilikum, the basil)
xlink:arc: specifies how the resources relate to each other (here from the role "base" to the role "special")
Figure: Pizzaboden and Basilikum (base) linked to Pizza Tonno (special)
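For comparison, the W3C XLink Recommendation marks the parts of an extended link with the attribute xlink:type (extended, locator, arc) on arbitrary elements, and an arc connects xlink:label values through xlink:from and xlink:to. The following Java sketch resolves the arcs of such a link into concrete traversals. The element names pizza, part and rel, the labels, the file PizzaTonno.xml and the class name ExtendedLinkSketch are hypothetical and chosen only to mirror the pizza example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ExtendedLinkSketch {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Hypothetical extended link: two "base" resources and one "special" resource.
    static final String XML = String.join("\n",
        "<pizza xmlns:xlink='http://www.w3.org/1999/xlink'",
        "       xlink:type='extended' xlink:title='Pizza Tonno'>",
        "  <part xlink:type='locator' xlink:href='Pizzaboden.xml' xlink:label='base'/>",
        "  <part xlink:type='locator' xlink:href='Basilikum.xml' xlink:label='base'/>",
        "  <part xlink:type='locator' xlink:href='PizzaTonno.xml' xlink:label='special'/>",
        "  <rel xlink:type='arc' xlink:from='base' xlink:to='special'",
        "       xlink:show='new' xlink:actuate='onRequest'/>",
        "</pizza>");

    static String xlink(Element e, String name) {
        return e.getAttributeNS(XLINK_NS, name);
    }

    /** Expands every arc into "source -> target" pairs of locator hrefs. */
    static List<String> traversals(String xml) {
        Element link;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            link = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // Sort the children of the extended link by their xlink:type.
        List<Element> locators = new ArrayList<>();
        List<Element> arcs = new ArrayList<>();
        for (Node n = link.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element child = (Element) n;
            if (xlink(child, "type").equals("locator")) locators.add(child);
            if (xlink(child, "type").equals("arc")) arcs.add(child);
        }
        // An arc links every locator labelled "from" to every locator labelled "to".
        List<String> result = new ArrayList<>();
        for (Element arc : arcs) {
            for (Element from : locators) {
                if (!xlink(from, "label").equals(xlink(arc, "from"))) continue;
                for (Element to : locators) {
                    if (xlink(to, "label").equals(xlink(arc, "to"))) {
                        result.add(xlink(from, "href") + " -> " + xlink(to, "href"));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        traversals(XML).forEach(System.out::println);
    }
}
```

A single arc yields two traversals here because two locators share the label base, which is how an extended link expresses an n..m relation.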
XPointer and XPath - general
Goal: fix an anchor inside a document, e.g., in HTML: <a name="…">
Problems with HTML anchors:
• not possible inside remote documents (no permission to modify the source code)
• the complete destination document must be transmitted, even if only one sub-part of the
document is addressed
Solution in principle:
• represent documents as trees
• address only the required sub-tree (navigation through the tree)
XML Pointer Language (XPointer)
General definition of an XML Pointer:
URI#xpointer(anchor description)
Example:
XML document (gedicht = poem, strophe = stanza, zeile = line):
<gedicht>
  <strophe ID="strophe1">
    <zeile ID="zeile1">Dreifach ist des Raumes Maß:</zeile>
    <zeile ID="zeile2">Rastlos fort ohne Unterlaß</zeile>
    <zeile ID="zeile3">Strebt die Länge fort ins Weite,</zeile>
    <zeile ID="zeile4">Endlos gießet sich die Breite,</zeile>
    <zeile ID="zeile5">Grundlos senkt die Tiefe sich</zeile>
  </strophe>
</gedicht>
XPointers (all three address the second line, zeile2):
http://www.bsp.de/gedicht.xml#xpointer(id('zeile2'))
http://www.bsp.de/gedicht.xml#xpointer(//zeile[@ID="zeile2"])
http://www.bsp.de/gedicht.xml#element(/1/1/2)
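The Java standard library has no general XPointer resolver, but the two addressing styles can be approximated by hand. In the sketch below, byXPath evaluates the expression found inside xpointer(…), and byChildSequence walks an element() child sequence such as /1/1/2. Both method names and the class name XPointerSketch are assumptions. The id('zeile2') form is left out because it only works when a DTD declares the attribute to be of type ID.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPointerSketch {
    // First three lines of the poem; non-ASCII letters are written as Unicode escapes.
    static final String POEM = String.join("\n",
        "<gedicht>",
        "  <strophe ID=\"strophe1\">",
        "    <zeile ID=\"zeile1\">Dreifach ist des Raumes Ma\u00df:</zeile>",
        "    <zeile ID=\"zeile2\">Rastlos fort ohne Unterla\u00df</zeile>",
        "    <zeile ID=\"zeile3\">Strebt die L\u00e4nge fort ins Weite,</zeile>",
        "  </strophe>",
        "</gedicht>");

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates the XPath expression of an xpointer(...) part; expects an element. */
    static Element byXPath(Document doc, String expression) {
        try {
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Follows a child sequence such as "/1/1/2", counting element children only. */
    static Element byChildSequence(Document doc, String sequence) {
        Node current = doc;
        for (String step : sequence.substring(1).split("/")) {
            int wanted = Integer.parseInt(step);
            int seen = 0;
            Node child = current.getFirstChild();
            while (child != null) {
                if (child.getNodeType() == Node.ELEMENT_NODE && ++seen == wanted) break;
                child = child.getNextSibling();
            }
            if (child == null) return null; // the sequence points outside the tree
            current = child;
        }
        return (Element) current;
    }

    public static void main(String[] args) {
        Document doc = parse(POEM);
        System.out.println(byXPath(doc, "//zeile[@ID='zeile2']").getTextContent());
        System.out.println(byChildSequence(doc, "/1/1/2").getTextContent());
    }
}
```

Both calls in main address the same node: /1/1/2 reads as first element of the document (gedicht), its first child element (strophe), and that element's second child element (zeile2).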
XML Path Language (XPath)
XPath is a non-XML language for identifying particular parts of XML documents, i.e., for
picking nodes and sets of nodes out of a tree
Example:
gedicht.xml#xpointer(/child::gedicht[position()=3])
A location step consists of an axis (child), a node test (gedicht) and a predicate ([position()=3])
Axes, leading from the context node to the addressed nodes:
child, descendant, following, following-sibling
parent, ancestor, preceding, preceding-sibling
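The axes can be explored with the XPath 1.0 engine in javax.xml.xpath. The sketch below evaluates a few location paths relative to a context node. The class name XPathAxesSketch and its helper methods are assumptions, and the text of the poem lines is omitted because only the tree structure matters here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathAxesSketch {
    // Structure of the poem only; the line texts are left out.
    static final String POEM =
        "<gedicht><strophe ID='strophe1'>"
        + "<zeile ID='zeile1'/><zeile ID='zeile2'/><zeile ID='zeile3'/>"
        + "<zeile ID='zeile4'/><zeile ID='zeile5'/>"
        + "</strophe></gedicht>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(POEM)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Selects the first node matched by the expression, relative to the context. */
    static Node select(Object context, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates the expression relative to the context and returns its string value. */
    static String eval(Object context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        Node zeile3 = select(doc, "//zeile[@ID='zeile3']"); // the context node
        String[] paths = {
            "parent::*/@ID",                    // the enclosing stanza
            "preceding-sibling::zeile[1]/@ID",  // nearest sibling before the context
            "following-sibling::zeile[1]/@ID",  // nearest sibling after the context
            "name(ancestor::*[last()])",        // outermost ancestor
            "count(/descendant::zeile)"         // all lines in the document
        };
        for (String path : paths) {
            System.out.println(path + " = " + eval(zeile3, path));
        }
    }
}
```

On the reverse axes (ancestor, preceding, preceding-sibling) positions are counted away from the context node, so [1] is the nearest node and [last()] the farthest.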
Picture created by Harald Sack
3.9. XSL Transformations (XSLT)
XSLT overview
Extensible Stylesheet Language Transformations (XSLT) is an XML-based language used to
specify rules by which one XML document is transformed into another document
The resulting document may be in XML syntax or in another format, such as HTML or plain text
Examples of applications of XSLT:
- convert data between different XML schemas
- convert XML data into HTML or XHTML documents for web pages (e.g., with CSS)
- create dynamic web pages
- convert into an intermediate XML format that can be converted to PDF documents
Such a transformation is based on the following languages:
- XSLT: specifies the transformation rules
- XSL-FO (XSL Formatting Objects): describes the layout of the output
- XPath: accesses specific parts of an XML document
XSLT transformation principle (example)
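Figure: an XSLT processor combines the XML documents (valid against a DTD/XML Schema) with an XSL document; XSL document 1 and XSL document 2 lead to different outputs, such as a PDF document, HTML or WML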
XSL works on the abstract tree representation of the XML document
A set of transformation rules (templates) is required in the form of an XSLT document (the XSL
stylesheet)
The structure tree is traversed, and for each node the appropriate template from the XSL
stylesheet is applied
Figure: tree representation of the XML document + XSL stylesheet → output document
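The template principle can be sketched in Java with the standard javax.xml.transform API. The stylesheet below holds one template for data and one for row; the processor walks the tree and applies whichever template matches the current node. The stylesheet, the text output format and the class name TemplateSketch are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateSketch {
    static final String XML =
        "<data><row num=\"1\">toto 1</row><row num=\"2\">toto 2</row>"
        + "<row num=\"3\">toto 3</row></data>";

    // Hypothetical stylesheet with one template per element type.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <!-- Applied when the traversal reaches the data element -->",
        "  <xsl:template match='data'>",
        "    <xsl:apply-templates select='row'/>",
        "  </xsl:template>",
        "  <!-- Applied once for every row element -->",
        "  <xsl:template match='row'>",
        "    <xsl:value-of select='concat(\"Row \", @num, \": \", ., \";\")'/>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Compiles the stylesheet and applies it to the XML text. */
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```

No template matches the document root here, so the built-in default rule passes control down to the data element, where the first explicit template takes over.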
XML document:
<data>
  <row num="1">toto 1</row>
  <row num="2">toto 2</row>
  <row num="3">toto 3</row>
</data>
XSL stylesheet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <body>
    <h1>Demo</h1>
    <xsl:for-each select="data/row">
      <br/>Row:
      <xsl:value-of select="@num"/>
      - Data:
      <xsl:value-of select="."/>
    </xsl:for-each>
  </body>
</html>
HTML document (output of the XSLT processor):
<html><body>
<h1>Demo</h1>
<br/>Row: 1 - Data: toto 1
<br/>Row: 2 - Data: toto 2
<br/>Row: 3 - Data: toto 3
</body></html>
Perform XSL transformation using a Java program
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TestTransformation {
    public static void main(String[] args) throws Exception {
        // Pass files rather than readers so that the parser honours the
        // encoding declared in each document
        Source sourceXSL = new StreamSource(new File("stylesheet.xsl"));
        Source sourceXML = new StreamSource(new File("data.xml"));

        // Compile the stylesheet into a Transformer
        TransformerFactory trFac = TransformerFactory.newInstance();
        Transformer tf = trFac.newTransformer(sourceXSL);

        // Apply the stylesheet to the XML document and print the result
        Result resultOnScreen = new StreamResult(System.out);
        tf.transform(sourceXML, resultOnScreen);
    }
}
Perform XSL transformation by referencing the stylesheet in the XML document (xml-stylesheet
processing instruction), e.g., for display in a browser
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl" ?>
<person type="Teacher">
  <name>Serge Linckels</name>
  <hp>http://www.linckels.lu</hp>
  <size>173</size>
  <phone>691-111111</phone>
</person>
3.10. References
Elliotte R. Harold, W. Scott Means: XML in a Nutshell
E-Librarian Service
User-Friendly Semantic Search in Digital Libraries
Serge Linckels, Christoph Meinel