XML and XML Applications - Lecture 04 - Web Information Systems (WE-DINF-11912)
1. 2 December 2005
Web Information Systems
XML and XML Applications
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://www.beatsigner.com
2. October 17, 2014 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2
What is XML?
Standardised text format for (semi-)structured
information
Meta markup language
tool for defining other markup languages
- e.g. XHTML, WML, VoiceXML, SVG, Office Open XML (OOXML)
Data surrounded by text markup that describes the data
ordered labeled tree
<note date="2013-10-17">
<to>Reinout Roels</to>
<from>Beat Signer</from>
<content>Let us discuss exercise 4 this afternoon ...</content>
</note>
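The "ordered labeled tree" view of the note above can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module; the helper name describe is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

NOTE = """<note date="2013-10-17">
  <to>Reinout Roels</to>
  <from>Beat Signer</from>
  <content>Let us discuss exercise 4 this afternoon ...</content>
</note>"""


def describe(element, depth=0):
    """Return one line per element, indented by depth, in document order."""
    text = (element.text or "").strip()
    label = "  " * depth + element.tag          # the label of the tree node
    if element.attrib:
        label += f" {dict(element.attrib)}"     # attributes attached to the node
    if text:
        label += f": {text}"                    # text content, if any
    lines = [label]
    for child in element:                       # children keep their document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(NOTE)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

The output lists note first, followed by its three children in the order in which they appear in the document.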
3.
... and What is it Not?
XML is not a programming language
however, it can be used to represent program
instructions, configuration files etc.
note that there is an XML application (XSLT) which is
Turing complete
XML is not a database
XML is often used to store long-term data but it lacks many
database features
many existing databases offer an XML import/export
more recently there exist specific XML databases
- e.g. Tamino by Software AG
4.
XML Example
<?xml version="1.0"?>
<publications>
<publication type="inproceedings">
<title>An Architecture for Open Cross-Media Annotation Services</title>
<author>
<surname>Signer</surname>
<forename>Beat</forename>
</author>
<author>
<surname>Norrie</surname>
<forename>Moira</forename>
</author>
<howpublished>Proceedings of WISE 2009</howpublished>
<month>10</month>
<year>2009</year>
</publication>
<publication type="article">
...
</publications>
5.
Evolution of XML
Descendant of Standard Generalized Markup
Language (SGML)
SGML is more powerful but (too) complex
HTML is an SGML application
XML was developed as an “SGML-Lite” version
XML 1.0 published in February 1998
Since the initial XML release numerous associated
standards have been published
6.
Why has XML been so Successful?
Simple
General
Accepted
Many associated standards
Many (freely) available tools
7.
XML Specification
Provides a grammar for XML documents in terms of
placement of tags
legal element names
how attributes are attached to elements
...
General tools
parsers that can parse all XML documents regardless of particular
application tags
editors and various programming APIs
Specification available at http://www.w3.org/TR/xml/
8.
XML Tree Document Structure
An XML document tree can contain 7 types of nodes
root node
- always exactly one root node
element nodes
- element node with optional attribute nodes
attribute nodes
- name/value pairs
text nodes
- text belonging to an element or attribute
comment nodes
processing instruction nodes
- pass information to a specific application via <? ... ?>
namespace nodes
9.
Well-Formedness and Validity
An XML document is well-formed if it follows
the rules of the XML specification
An XML document can be valid according to its
Document Type Definition (DTD) or XML Schema
completely self-describing about its structure and content through
- the document content
- auxiliary files referred to in the document
validity can be checked by a validating XML parser
- online validation service available at http://validator.w3.org
<!ELEMENT publication (title, author+, howpublished?, month, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (surname, forename)>
<!ATTLIST publication type CDATA #IMPLIED>
…
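A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser. This parser is non-validating, so it only checks the rules of the XML specification and not a DTD or XML Schema; the function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


good = "<publication type='article'><title>XML</title></publication>"
unclosed = "<publication><title>XML</publication>"   # <title> is never closed
mismatched_case = "<Bold>text</BOLD>"                # tag matching is case sensitive
two_roots = "<a/><b/>"                               # more than one top-level element

for sample in (good, unclosed, mismatched_case, two_roots):
    print(is_well_formed(sample), sample)
```

Checking validity against the DTD fragment above would require a validating parser, which is outside the scope of this sketch.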
10.
Differences Between XML and HTML
XML is a tool for specifying markup languages rather
than a markup language itself
specify “special markup languages for special applications”
XML is not a presentation language
defines content rather than presentation
HTML mixes content, structure and presentation
XML was designed to support a number of applications
and not just web browsing
XML documents should be well-formed and valid
XML documents are easier to process by a program
11.
Differences Between XML and HTML ...
Readability is more important than conciseness
e.g. <tablerow> rather than <tr>
Matching of tags is case sensitive
e.g. start tag <Bold> does not match end tag </BOLD>
Markup requires matching start and end tags
e.g. <p> and </p>
exceptions are special non-enclosing tags
e.g. <br/> or <image ... />
Whitespace in text is significant
12.
XHTML
XHTML is a reformulation of HTML to make
it an XML application
we accept that HTML is here to stay
improve HTML by using XML with minimal effort
W3C stopped its work on XHTML 2.0 at the end of 2009 in favour of HTML5
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Vrije Universiteit Brussel</title>
</head>
<body>
...
</body>
</html>
13.
Differences Between XHTML and HTML
Documents must be valid
XHTML namespace must be declared in <html> element
<head> and <body> elements cannot be omitted
<title> element must be the first element in the <head>
End tags are required for non-empty elements
Empty elements must consist of a start-tag and end-tag
pair or an empty-element tag (e.g. <br/>)
Element and attribute names must be in lowercase
Attribute values must always be quoted
Attribute names cannot be used without a value
14.
XML Technologies
XLink XPointer
XPath
XQuery
XSLT
15.
Overview of XML Technologies
XPath and XPointer
addressing of XML elements and parts of elements
XSL (Extensible Stylesheet Language)
transforming XML documents (XSLT) and formatting them (XSL-FO)
XLink (XML Linking Language)
linking in XML
XQuery (XML Query Language)
querying XML documents
Document Type Definition (DTD) and XML Schema
definition of schemas for XML documents
DTDs have a very limited expressive power
XML Schema introduces datatypes, inheritance etc.
16.
Overview of XML Technologies ...
SAX (Simple API for XML)
event-based programming API for reading XML documents
DOM (Document Object Model)
programming API to access and manipulate XML documents as
tree structures
RDF (Resource Description Framework)
specific XML encoding used by the semantic web
17.
Document Object Model (DOM)
Defines a language neutral API for accessing and
manipulating XML documents as a tree structure
we have already seen the HTML DOM
The entire document must be read and parsed before it
can be used by a DOM application
DOM parser not suited for large documents!
Two different types of DOM Core interfaces for
accessing supported content types
generic Node interface
node type-specific interfaces
Various available DOM parsers
e.g. JDOM parser specifically for Java
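The two kinds of DOM interfaces can be sketched with Python's standard xml.dom.minidom module, which is used here as a stand-in for the Java parsers mentioned above. The document content is a shortened version of the earlier publication example.

```python
from xml.dom import minidom, Node

DOC = """<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <year>2009</year>
  </publication>
</publications>"""

# The entire document is read and parsed into an in-memory tree first
document = minidom.parseString(DOC)

# Node type-specific interfaces: Document.getElementsByTagName, Element.getAttribute
publication = document.getElementsByTagName("publication")[0]
pub_type = publication.getAttribute("type")

# Generic Node interface: childNodes, nodeType and nodeName exist on every node
element_children = [
    node.nodeName
    for node in publication.childNodes
    if node.nodeType == Node.ELEMENT_NODE   # skip whitespace-only text nodes
]

# Manipulation: append a new <month>10</month> element to the publication
month = document.createElement("month")
month.appendChild(document.createTextNode("10"))
publication.appendChild(month)

print(pub_type, element_children)
```

Because the whole tree is held in memory, the same approach becomes expensive for very large documents.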
18.
Document Object Model (DOM) ...
Different DOM levels
DOM Level 1
- concentrates on HTML and XML document models
- contains functionality for document navigation and manipulation
DOM Level 2
- supports XML Namespaces
- stylesheet object model and operations to manipulate it
DOM Level 3
- specifies content models (DTD and Schemas)
19.
XPath
Expression language to address elements of an XML
document (used in XPointer, XSLT and XQuery)
A location path is a sequence of location steps separated
by a slash (/)
various navigation axes such as child, parent, following etc.
have a look at our XSLT/XPath reference document that is
available on PointCarré for all the details about XPath
XPath expressions look similar to file pathnames
/publications/publication
/publications/publication[year>2008]/title
//author[3]
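The expressions above can be tried out with a sketch based on Python's standard xml.etree.ElementTree, which only supports a limited subset of XPath. Paths are written relative to the root element, and the year comparison is done in Python because this subset has no ">" predicate. The second publication in the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <author><surname>Norrie</surname><forename>Moira</forename></author>
    <year>2009</year>
  </publication>
  <publication type="article">
    <title>Hypothetical Older Article</title>
    <author><surname>Doe</surname><forename>Jane</forename></author>
    <year>2005</year>
  </publication>
</publications>"""

root = ET.fromstring(DOC)  # root is the <publications> element

# /publications/publication, expressed relative to the root element
all_publications = root.findall("./publication")

# //author[2]: every <author> that is the second <author> child of its parent
second_authors = [a.findtext("surname") for a in root.findall(".//author[2]")]

# /publications/publication[year>2008]/title, with the comparison done in Python
recent_titles = [
    p.findtext("title")
    for p in all_publications
    if int(p.findtext("year")) > 2008
]

print(len(all_publications), second_authors, recent_titles)
```

A full XPath implementation would evaluate all three expressions directly, including the numeric comparison.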
20.
XML Pointer Language (XPointer)
Address points or ranges in an XML document
Uses XPath expressions
Introduces addressing relative to elements
supports links to points without anchors
21.
XML Linking Language (XLink)
Standard way for creating links in XML documents
Fixes limitations of HTML links where
anchors must be placed within documents
only entire documents or predefined marks (#) can be linked
only one-to-one unidirectional links are supported
XLinks can be defined in separate documents
third-party link (metadata) server
Two types of links
simple links
- associate exactly one local and one remote resource (similar to HTML links)
extended links
- associate an arbitrary number of resources
22.
XML Linking Language (XLink) ...
Other XLink features
linking parts of resources
links can be defined at
the attribute level
typed links
The Annotea project
uses XLink for managing
external annotations
for example used in the
Amaya Web Browser
Annotation in the Amaya Browser
23.
Simple API for XML (SAX)
Event-based API for XML document parsing
many free SAX parsers available (e.g. Apache Xerces)
Scans the document from start to end
invokes callback methods
Different kinds of events
start of document
end of document
start tag of an element
end tag of an element
character data
processing instruction
SAX parser needs less memory than DOM parser
DOM parser often uses SAX parser to build the DOM tree
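The callback style of SAX can be sketched with Python's standard xml.sax module. The handler class EventRecorder and the sample document are hypothetical; the callback names are those of the SAX ContentHandler interface.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records the SAX callbacks in the order in which the parser invokes them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start of document")

    def endDocument(self):
        self.events.append("end of document")

    def startElement(self, name, attrs):
        self.events.append(f"start tag: {name}")

    def endElement(self, name):
        self.events.append(f"end tag: {name}")

    def characters(self, content):
        # May be called several times for one text node in general
        if content.strip():
            self.events.append(f"character data: {content.strip()}")

    def processingInstruction(self, target, data):
        self.events.append(f"processing instruction: {target}")


recorder = EventRecorder()
xml.sax.parseString(b"<note><?app hint?><to>Reinout Roels</to></note>", recorder)
print("\n".join(recorder.events))
```

The handler never sees the document as a whole, only one event at a time, which is why a SAX parser gets by with little memory.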
24.
XML Transformations
Developers want to be able to transform data from one
format to another
processing of XML documents
- XML to XML transformation
post-processing of documents
- e.g. XML to XHTML, XML to WML, XML to PDF, ...
The Extensible Stylesheet Language Transformations
(XSLT) language can be used for that purpose
25.
XSLT Processor
The XSLT processor (e.g. Xalan) applies an XSLT stylesheet to an
XML document and produces the corresponding output document
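Figure: XSLT processing pipeline. Input document: XML document (with DTD) and XSLT stylesheet (with DTD). A DOM parser builds the source tree and the stylesheet tree, the XSLT processor produces the result tree, and the output document is XHTML, WML, ...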
26.
XSL Transformations (XSLT)
Most important part of XSL
uses XPath for the navigation
XSLT is an expression-based language based on
functional programming concepts
XSLT uses
pattern matching to select parts of documents
templates to perform transformations
Most web browsers support XSLT
transformation can be done on the client side based on an XML
document and an associated XSLT document
27.
Example
<?xml version="1.0"?>
<publications>
<publication type="inproceedings">
<title>An Architecture for Open Cross-Media Annotation Services</title>
<author>
<surname>Signer</surname>
<forename>Beat</forename>
</author>
<author>
<surname>Norrie</surname>
<forename>Moira</forename>
</author>
<howpublished>Proceedings of WISE 2009</howpublished>
<month>10</month>
<year>2009</year>
</publication>
<publication type="article">
...
</publications>
29.
Other XSLT Statements
<xsl:for-each select="...">
select every XML element of a specified node-set
<xsl:if test="...">
conditional test
<xsl:sort select="..."/>
sort the output
...
Have a look at the XSLT/XPath reference document that
is available on PointCarré
in exercise 4 you will have the chance to implement and execute
different XSLT transformations
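What for-each, if and sort do can be approximated outside XSLT. The following is a rough Python analogue and not an XSLT stylesheet: it iterates over the publications, sorts them by title, filters on the year and emits an XHTML list. The sample data, the function name to_xhtml_list and the min_year parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """<publications>
  <publication type="inproceedings"><title>B Title</title><year>2009</year></publication>
  <publication type="article"><title>A Title</title><year>2011</year></publication>
  <publication type="article"><title>C Title</title><year>2004</year></publication>
</publications>"""


def to_xhtml_list(source: str, min_year: int) -> str:
    """Turn the publications into a <ul> list, sorted by title and filtered by year."""
    root = ET.fromstring(source)
    ul = ET.Element("ul")
    # Analogue of xsl:for-each select="publication" with xsl:sort select="title"
    for pub in sorted(root.findall("publication"), key=lambda p: p.findtext("title")):
        # Analogue of xsl:if with a test on the year
        if int(pub.findtext("year")) >= min_year:
            li = ET.SubElement(ul, "li")
            li.text = f"{pub.findtext('title')} ({pub.findtext('year')})"
    return ET.tostring(ul, encoding="unicode")


result = to_xhtml_list(SOURCE, 2005)
print(result)
```

In XSLT the same logic would be declared inside a template and executed by an XSLT processor such as Xalan.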
30.
XML for Data Interchange
Standard representation to exchange information
between different systems
General way to query data from different systems
e.g. via the XML Query (XQuery) language
Connect applications running on different operating
systems and computers with different architectures
XML Remote Procedure Call (XML-RPC)
Simple Object Access Protocol (SOAP) which is a successor
of XML-RPC and used for accessing Big Web Services
- discussed later in the course
31.
XML Remote Procedure Call (XML-RPC)
XML-RPC specification released in April 1998
Advantages
XML-based lingua franca understood by different applications
HTTP as carrier protocol
not tied to a single object model (as for example in CORBA)
easy to implement (based on HTTP and XML standards)
lightweight protocol
built-in error handling
Disadvantages
slower than specialised protocols that are used in closed
networks
32.
XML-RPC Request and Response
XML-RPC Request
POST /RPC2 HTTP/1.0
User-Agent: Java1.2
Host: macrae.vub.ac.be
Content-Type: text/xml;charset=UTF-8
Content-length: 245

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodCall>
<methodName>Math.multiply</methodName>
<params>
<param>
<value><double>128.0</double></value>
</param>
<param>
<value><double>256.0</double></value>
</param>
</params>
</methodCall>
XML-RPC Response
HTTP/1.1 200 OK
Connection: close
Content-Length: 159
Content-Type: text/xml
Server: macbain.vub.ac.be

<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
<params>
<param>
<value><double>32768.0</double></value>
</param>
</params>
</methodResponse>
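The request and response bodies can be produced and parsed without any network access. This sketch assumes Python's standard xmlrpc.client module; the method name Math.multiply and the values are taken from the slide, and no server is contacted.

```python
import xmlrpc.client

# Marshal the call Math.multiply(128.0, 256.0) into a <methodCall> document
request_xml = xmlrpc.client.dumps((128.0, 256.0), methodname="Math.multiply")
print(request_xml)

# Body of the response as shown on the slide (without the HTTP headers)
response_xml = """<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value><double>32768.0</double></value>
    </param>
  </params>
</methodResponse>"""

# Unmarshal the response; a response carries no method name
params, method = xmlrpc.client.loads(response_xml)
print(params[0])

# The request can be unmarshalled the same way
call_params, call_method = xmlrpc.client.loads(request_xml)
```

In a real client the HTTP POST to /RPC2 would typically be handled by xmlrpc.client.ServerProxy rather than by hand.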
33.
XML-RPC Error Message
HTTP/1.1 200 OK
Connection: close
Content-Length: 159
Content-Type: text/xml
Server: macbain.vub.ac.be
<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value><int>873</int></value>
</member>
<member>
<name>faultString</name>
<value><string>Error message</string></value>
</member>
</struct>
</value>
</fault>
</methodResponse>
XML-RPC Response
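On the client side a fault response usually surfaces as an exception. A minimal sketch, assuming Python's standard xmlrpc.client module, which raises xmlrpc.client.Fault when it unmarshals a fault element:

```python
import xmlrpc.client

# Body of the error response shown above (without the HTTP headers)
fault_xml = """<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member><name>faultCode</name><value><int>873</int></value></member>
        <member><name>faultString</name><value><string>Error message</string></value></member>
      </struct>
    </value>
  </fault>
</methodResponse>"""

try:
    xmlrpc.client.loads(fault_xml)
    fault = None
except xmlrpc.client.Fault as error:
    # faultCode and faultString are taken from the <struct> members
    fault = error

print(fault.faultCode, fault.faultString)
```

Note that the HTTP status is still 200 OK; the error is reported inside the XML payload.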
34.
XML-RPC Scalar Values
XML-Tag              Type                            Corresponding Java Type
<i4> or <int>        four-byte signed integer        Integer
<boolean>            0 or 1                          Boolean
<string>             ASCII string                    String
<double>             double-precision signed float   Double
<dateTime.iso8601>   date/time                       Date
<base64>             base64-encoded binary           byte[]
35.
XML-RPC Composed Values
Complex data types can be represented by nested
<struct> and <array> structures
XML-Tag    Type                                                     Corresponding Java Type
<struct>   A structure contains <member> elements and each member   Hashtable
           contains a <name> and a <value> element
<array>    An array contains a single <data> element which can      Vector
           contain any number of <value> elements
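The nesting of struct and array values can be observed by marshalling a small record. The sketch assumes Python's standard xmlrpc.client module, where a dict maps to a struct and a list to an array (instead of the Java Hashtable and Vector listed above). The record content and the method name Library.store are hypothetical.

```python
import xmlrpc.client

# A dict becomes a <struct>, the nested list becomes an <array>
record = {"title": "XML and XML Applications", "slides": [1, 2, 3], "published": True}

encoded = xmlrpc.client.dumps((record,), methodname="Library.store")
print(encoded)

# Round trip: unmarshalling yields the same Python values again
decoded_params, decoded_method = xmlrpc.client.loads(encoded)
```

The scalar members are encoded with the tags from the scalar value table, for example boolean and int.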
36.
XML-RPC Example: GOMES
Object-Oriented GUI for the Object Model Multi-User Extended Filesystem
GOMES is implemented in Java and uses XML-RPC to communicate with the
Object Model Multi-User Extended File System (OMX-FS), which was
implemented in the Oberon programming language
Figure labels: OMX-FS, XML-RPC
37.
Framework for Universal Client Access
Generic database interface instead of developing a new
interface from scratch for each new device type
The presented eXtensible Information Management
Architecture (XIMA) is based on
OMS Java object database
- managing the application data
Java Servlet Technology
generic XML database interface
- separation of content and representation
XSLT
- appropriate XSLT stylesheet chosen based on User-Agent HTTP header field
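The stylesheet selection step might look roughly like the following sketch. The keyword checks and the stylesheet file names are assumptions made for illustration only; they are not taken from the XIMA implementation, which is based on Java servlets.

```python
def choose_stylesheet(user_agent: str) -> str:
    """Pick an XSLT stylesheet name based on the User-Agent HTTP header field."""
    agent = user_agent.lower()
    # Hypothetical keyword checks; real user agent detection is more involved
    if "voicexml" in agent or "vxml" in agent:
        return "vxml.xsl"    # voice browsers receive VoiceXML
    if "wap" in agent or "wml" in agent:
        return "wml.xsl"     # WAP phones receive WML
    return "xhtml.xsl"       # default: regular web browsers receive XHTML


print(choose_stylesheet("Mozilla/5.0 (X11; Linux x86_64)"))
print(choose_stylesheet("Nokia7110/1.0 (WAP)"))
```

The same XML produced by the generic database interface can then be rendered for each device type by applying the chosen stylesheet.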
38.
XIMA Architecture
Figure: XIMA architecture. HTML, WML and VXML browsers access the main entry servlet, which delegates to the HTML, WML or VXML servlet (XML + XSLT → response). The XML server builds XML based on JDOM on top of the OMS Java API and the OMS Java workspace. The OM model offers collections, associations, multiple inheritance and multiple instantiation.
39.
Generic XIMA Interfaces
XHTML Interface WML Interface
40.
Voice Interfaces
Trend for ubiquitous information services
small screens, keyboards etc. often clumsy to use
Sometimes it is necessary to have hands-free interfaces
e.g. while driving or operating a machine
Alternative input modality for visually impaired users
Voice interfaces can be accessed by a regular phone
no new device is required
no installation effort
Improvements in speech recognition and text-to-speech
synthesis make automatic voice interfaces more feasible
e.g. for call centers
41.
VoiceXML Architecture
Various solutions
development: IBM WebSphere Voice Server SDK
deployment: BeVocal Cafe Voice Portal
Speech recogniser: converts voice input into text (based on a speech model)
Language analyser: extracts meaning from text (based on a grammar)
Application server: gets data (text) from the application database
Speech synthesiser: generates speech output (based on pronunciation rules)
Data flow: voice input (speech) → text → meaning → text → voice output (speech)
42.
VoiceXML Architecture (for XIMA)
Figure: components shown are the XIMA framework, Apache web server, Tomcat, OMS Java database, WebSphere Voice Server SDK and BeVocal voice portal
43.
Basic VoiceXML Concepts
Dialogue
conversational state in a form or menu
form
- interaction that collects values for field item variables
menu
- presents user with a choice of options
- transition to next dialogue based on choice
Input
recognition of spoken input (or recording of spoken input)
recognition of DTMF (dual-tone multi-frequency) input
Output
speech synthesis (TTS)
recorded audio files
44.
VoiceXML Form Example
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/vxml
http://www.w3.org/TR/voicexml20/vxml.xsd" version="2.0">
<form id="drinkForm">
<field name="drink">
<prompt>Would you like to order beer, wine, whisky, or nothing?</prompt>
<grammar src="drinks.grxml" type="application/srgs+xml"/>
</field>
<block>
<submit next="http://www.wise.vub.ac.be/drinks.php"/>
</block>
</form>
</vxml>
45.
VoiceXML Menu Example
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/vxml
http://www.w3.org/TR/voicexml20/vxml.xsd" version="2.0">
<menu id="mainMenu">
<prompt>
This is the main menu. What would you like to order? <enumerate/>
</prompt>
<choice next="#foodForm">food</choice>
<choice next="#drinkForm">drink</choice>
</menu>
...
</vxml>
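Since VoiceXML documents are plain XML, generic tools can process them. The following sketch uses Python's standard xml.etree.ElementTree to list the menu choices; the elided part of the slide example is left out and the prefix v is an arbitrary choice for the VoiceXML namespace.

```python
import xml.etree.ElementTree as ET

VXML = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <menu id="mainMenu">
    <prompt>This is the main menu. What would you like to order? <enumerate/></prompt>
    <choice next="#foodForm">food</choice>
    <choice next="#drinkForm">drink</choice>
  </menu>
</vxml>"""

# All elements live in the VoiceXML default namespace, so lookups must name it
NS = {"v": "http://www.w3.org/2001/vxml"}

root = ET.fromstring(VXML)
choices = {
    choice.text: choice.get("next")
    for choice in root.findall("./v:menu/v:choice", NS)
}
print(choices)
```

Each choice maps the spoken option to the dialogue that is activated next.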
46.
Figure: dialogue flow of the voice interface with branches for collections, associations and objects
The database contains #Collections and #Associations. Would you like to go to the collections, to the associations, directly to an object or back to the main menu?
Collections
- The database contains the following # collections. Choose a collection
- Collection 'name' contains #M. Would you like to list the members or go back?
- Collection 'name' contains the following # members. Choose one of the members
Associations
- The database contains the following # associations. Choose an association
- Association 'name' contains #A. Would you like to list the members or go back?
- Association 'name' contains the following # associations. Choose a 'domaintype' or a 'rangetype' or say back
Objects
- The database contains #Objects. Choose an object or say back
- Object 'oID' is dressed with type 'type' and currently viewed as type 'type'. It contains #Attr, #Links, and #Methods. Would you like to hear the attributes, the links or the methods or go back?
- The object contains the following # attributes
- You can choose among the following links. Choose a link or say back
- You can choose among the following methods. Choose a method or say back
- The result of the method is Result
- You can view the object as the following types. Choose one of the types or say back
47.
Example: Avalanche Forecasting System
Project to provide WAP
and voice access
48.
Other XML Applications
Synchronized Multimedia Integration Language (SMIL)
animations (timing, transitions etc.)
Mathematical Markup Language (MathML)
mathematical notations (content and structure)
Scalable Vector Graphics (SVG)
two-dimensional vector graphics (static or dynamic)
Ink Markup Language (InkML)
digital ink representation (e.g. from digital pen)
Note that XML standards can also be combined
e.g. XHTML+Voice Profile 1.0
49.
Other XML Applications …
Office Open XML (OOXML)
file format (ZIP) for representing word processing documents,
presentations etc. (e.g. *.docx, *.pptx and *.xlsx)
- various XML files within these ZIP documents
- specific markup languages for different domains (wordprocessingML,
presentationML, spreadsheetML, …)
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
... <a:p>
<a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" />
<a:t>Other XML</a:t>
</a:r>
<a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" />
<a:t>Applications ...</a:t>
</a:r>
<a:endParaRPr lang="en-GB" dirty="0" />
</a:p> ...
</p:sld>
Single slide from a pptx file
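That an OOXML file is a ZIP archive containing XML parts can be checked with a few lines of code. The sketch below uses Python's standard zipfile and xml.etree.ElementTree modules and builds a tiny in-memory archive that merely stands in for a real pptx file; a real file contains many more parts, and the simplified slide content is an assumption.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Simplified slide part, modelled on the excerpt above
SLIDE_XML = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:sld xmlns:a="{A_NS}" xmlns:p="{P_NS}">
  <a:p>
    <a:r><a:t>Other XML</a:t></a:r>
    <a:r><a:t>Applications ...</a:t></a:r>
  </a:p>
</p:sld>"""

# Build an in-memory ZIP archive with one slide part
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", SLIDE_XML)

# Read the archive back and parse the slide part as XML
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    slide = ET.fromstring(archive.read("ppt/slides/slide1.xml"))

# Collect the text runs (a:t elements) of the slide
text_runs = [t.text for t in slide.iter(f"{{{A_NS}}}t")]
print(names, text_runs)
```

Opening an actual pptx file works the same way by passing its file name to zipfile.ZipFile.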
50.
Exercise 4
XML and XSLT transformations
51.
References
Elliotte Rusty Harold and W. Scott Means,
XML in a Nutshell, O'Reilly Media, September 2004
XML and XML Technology Tutorials
http://www.w3schools.com
Masoud Kalali, Using XML in Java
http://refcardz.dzone.com/refcardz/using-xml-java
VoiceXML Version 2.0
http://www.w3.org/TR/voicexml20/
52.
References ...
Amaya Web Browser
http://www.w3.org/Amaya/
XML-RPC Homepage
http://www.xmlrpc.com
B. Signer et al., Aural Interfaces to Databases based on
VoiceXML, Proc. of VDB6, Brisbane, Australia, 2002
http://www.academia.edu/175464/Aural_Interfaces_to_Databases_based_on_VoiceXML
eXtensible Information Management Architecture (XIMA)
http://www.beatsigner.com/xima.html