This document provides an overview of XML, XSL, and the Java technologies for working with them. It covers XML syntax and structure, well-formedness, validation with DTDs and XML Schema, and namespaces; XPath for querying XML; XSLT for transforming it; the Java APIs JAXP, JDOM, DOM4J, and JAXB for processing XML; and the basics of the Servlet API.
1. Java course - IAG0040
Java and the Web:
XML, XSL, Servlets
Anton Keks 2011
2. Introduction to XML
● XML = Extensible Markup Language
– general-purpose markup language recommended by W3C
– includes text and extra information (markup)
– “simplified SGML”
– meta-language, can be used to create new markup languages
● XML has hit the “sweet spot” between simplicity and flexibility
– very widely used for exchange of various data
– even HTML has been retrofitted as XHTML
– MathML, MusicXML, SVG, WSDL, RSS, OpenDocument, etc
Java course – IAG0040 Lecture 13
Anton Keks Slide 2
3. XML design goals
● Human-readable
– human-readable and self-descriptive markup
– text files, supports Unicode
● Easily machine-parseable
– strict structure, well-defined formal rules
– well-compressible for storage and transmission
– platform-independent
● Multi-purpose and extensible
– hierarchical structure: records, lists, trees
– schemas, namespaces
4. XML syntax
● Single element
– <name attribute="value">content</name>
● Example document
– <?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" prepTime="5 mins" cookTime="3 hours">
<title>Basic bread</title>
<ingredient amount="3" unit="cups">Flour</ingredient>
<ingredient amount="0.25" unit="ounce">Yeast</ingredient>
<ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
<ingredient amount="1" unit="teaspoon">Salt</ingredient>
<instructions>
<step>Mix all ingredients together, and knead thoroughly.</step>
<step>Cover with a cloth, and leave for one hour in warm room.</step>
<step>Knead again, place in a tin, and then bake in the oven.</step>
</instructions>
</recipe>
5. XML Structure
● XML Declaration (version, encoding, external dependencies)
– <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
● Document type definitions (DTD): <!DOCTYPE example [ ... ]>
● Single root element, nested elements, some with attributes and content
– <name attribute="value">content</name> or <foo/>
– starting and ending tag, content or nested elements between, no overlapping
– case-sensitive
● Special chars and entities
– predefined entities: &amp; &lt; &gt; &apos; &quot; and numeric character references: &#DDD; &#xHH;
– more can be declared: <!ENTITY copy "©">
– unescaped data: <![CDATA[ A & B ]]>
● Comments: <!-- Hello -->
6. XML correctness
● Well-formed
– conforms to all syntax rules
● Valid (only if well-formed)
– data and structure conform to a set of rules describing correct data values and locations
– must comply with a schema
– DTD – a part of the XML spec
– More functional: XML Schema (XSD), RELAX NG
7. DTD example
● DTD = Document Type Definition
● Declaration
– <!DOCTYPE customer [ element declarations here ]> - internal DTD
– <!DOCTYPE customer SYSTEM "customer.dtd"> - external DTD
– <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
● Content
– <!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, personal_id?)>
<!ATTLIST person index CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT personal_id (#PCDATA)>
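A minimal sketch of how a document could be checked against an internal DTD from Java, assuming the JAXP parser bundled with the JDK. The class name DtdValidationSketch, the shortened DTD, and the sample documents are made up for illustration; validity errors are collected through an ErrorHandler because a validating parser reports them there instead of throwing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {
    // Internal DTD modelled on the slide (shortened; hypothetical sample).
    static final String DTD =
          "<!DOCTYPE people_list [\n"
        + "  <!ELEMENT people_list (person*)>\n"
        + "  <!ELEMENT person (name, birthdate?)>\n"
        + "  <!ATTLIST person index CDATA #REQUIRED>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT birthdate (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the given document body is valid against the DTD above. */
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // enable DTD validation (off by default)
        DocumentBuilder builder = factory.newDocumentBuilder();
        final boolean[] valid = {true};
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                valid[0] = false; // validity errors arrive here
            }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        String ok = "<people_list><person index=\"1\"><name>Ann</name></person></people_list>";
        String missingIndex = "<people_list><person><name>Ann</name></person></people_list>";
        System.out.println(isValid(ok));
        System.out.println(isValid(missingIndex));
    }
}
```

The second document is well-formed but omits the required index attribute, so it is only the validating parser that rejects it.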
8. XML Namespaces
● Help to avoid naming conflicts
● Allow merging of XML documents with different semantics
● Use prefixes to distinguish namespaces
– <xhtml:table><xhtml:tr/></xhtml:table>
– prefix names are not fixed, they are defined in the declaration
● xmlns:prefix="namespaceURI"
● <h:table xmlns:h="http://www.w3.org/TR/html4/">
– default namespace can be declared with xmlns alone
● <table xmlns="http://www.w3.org/TR/html4/">
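To illustrate that the prefix is only a local alias, here is a small sketch that parses both forms with a namespace-aware JAXP parser and compares what the DOM reports. The class name NamespaceSketch and the helper parseRoot are hypothetical; the namespace URI is the one used on the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceSketch {
    static final String HTML_NS = "http://www.w3.org/TR/html4/";

    /** Parses the XML with namespace support and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element prefixed = parseRoot("<h:table xmlns:h='" + HTML_NS + "'/>");
        Element defaulted = parseRoot("<table xmlns='" + HTML_NS + "'/>");
        // Both elements have the same namespace URI and local name;
        // only the prefix differs.
        System.out.println(prefixed.getNamespaceURI() + " " + prefixed.getLocalName());
        System.out.println(defaulted.getNamespaceURI() + " " + defaulted.getLocalName());
    }
}
```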
9. XSD: W3C XML Schema
● XML-based
● Has more features than DTD
● Namespaces are directly supported
● Data model
– the vocabulary
● element and attribute names
– the content model
● relationships, structure, ordering
– the data types
● semantics and validation rules
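The three parts of the data model can be seen in a tiny schema. The sketch below assumes the javax.xml.validation API bundled with the JDK; the schema, the class name XsdValidationSketch, and the sample documents are invented for illustration. The element names are the vocabulary, the xs:sequence is the content model, and xs:date / xs:positiveInteger are the data types.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdValidationSketch {
    // Hypothetical schema for a single <person> element.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='birthdate' type='xs:date' minOccurs='0'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='index' type='xs:positiveInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the document validates against the schema above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // Without a custom ErrorHandler, validation errors are thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
            "<person index='1'><name>Ann</name><birthdate>1990-05-17</birthdate></person>"));
        System.out.println(isValid(
            "<person index='1'><name>Ann</name><birthdate>yesterday</birthdate></person>"));
    }
}
```

The second document has the right structure but a birthdate that is not an xs:date, which a DTD with #PCDATA content could not detect.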
11. Unit testing
● XMLUnit is a 3rd party addition to JUnit
– was designed for JUnit 3.x, but is perfectly usable with JUnit 4
– provides XMLAssert class that can be statically imported
● import static org.custommonkey.xmlunit.XMLAssert.*;
● assertXXX() methods take XML as String or Document
– simplifies testing of code that works with XML
● XML equality and similarity checking
● Validation
● XPath evaluation and checking
● Transformation
12. XPath
● XPath is a language for finding information in an XML document
– uses path expressions to select nodes (elements, attributes)
– has a library of built-in functions
– XML documents are treated as trees of nodes
● Sample XPath expressions
– /bookstore/book – all book elements under bookstore
– //book – all book elements in the document
– @lang – the value of lang attribute of current element
– bookstore/book[price > 35.00] – all books costing more than 35
– //book[@lang='en'] – all books in English
– book[1]/author[1]/name – first author of the first book
– book[last() - 1] – the book before the last one
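The sample expressions can be tried out from Java with the javax.xml.xpath API. This is only a sketch: the bookstore document, the class name XPathSamplesSketch, and the helper eval are assumptions made for the example, and each call re-parses the document for simplicity.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSamplesSketch {
    // Hypothetical sample document with three books.
    static final String XML =
          "<bookstore>"
        + "<book lang='en'><title>Java</title><price>40.00</price></book>"
        + "<book lang='et'><title>XML</title><price>30.00</price></book>"
        + "<book lang='en'><title>XSLT</title><price>36.50</price></book>"
        + "</bookstore>";

    /** Evaluates the expression against the sample document and returns the string result. */
    static String eval(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("count(//book)"));                          // number of books
        System.out.println(eval("count(/bookstore/book[price > 35.00])"));  // books over 35
        System.out.println(eval("/bookstore/book[last() - 1]/title"));      // book before the last
        System.out.println(eval("/bookstore/book[1]/@lang"));               // attribute value
    }
}
```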
13. Introduction to XSL
● The meaning of arbitrary XML tags is not well understood by, e.g., a web browser
● XSL describes how the XML document should be displayed
● XSL = Extensible Stylesheet Language
– XML-based, again
● XSLT = XSL Transformations
– can be used to transform one XML format into another XML or other text format (very often HTML or XHTML)
● XSL-FO – a language for formatting XML documents (to produce, e.g., PDF documents, images, graphics, etc)
14. XSLT basics
● Assigning stylesheets
– <?xml-stylesheet type="text/xsl" href="file.xsl"?>
● XSL stylesheet
– <xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
– then, one or more templates are defined
<xsl:template match="/">
<contents><xsl:copy-of select="."/></contents>
</xsl:template>
– Various xsl elements are used for querying of data using XPath
● all match, select, and test attributes take XPath expressions
● in other attributes, you can put XPath into {}
– Most used elements: copy-of, value-of, for-each, sort,
if, choose, apply-templates, call-template
15. XML Parsing
● There are 3 different ways to work with XML
– DOM = Document Object Model
● stores the full XML tree in memory as objects
● convenient to work with, but not suitable for very large XMLs
– SAX = Simple API for XML
● reads (streams) XML data and produces events
● no access to the full document, state must be maintained manually
● no limits on XML size, generally faster
– XPP, XML Pull Parser (SAX is a 'push' parser)
● non-standard, not bundled with Java
● does not produce events, but rather waits for the client program to
'pull' information about parsing, then continues processing
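To show what "pulling" looks like in code, the sketch below uses StAX (javax.xml.stream), the standard pull API shipped with Java SE 6 and later, swapped in here for XPP, which needs a separate jar. The class name PullParsingSketch and the counting task are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullParsingSketch {
    /** Counts elements with the given name by pulling events from the parser. */
    static int countElements(String xml, String name) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        // The client drives the loop: the parser advances only when next() is called.
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(name)) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<recipe><step>Mix</step><step>Bake</step><title>Bread</title></recipe>";
        System.out.println(countElements(xml, "step"));
    }
}
```

Unlike SAX, there is no handler object: the state lives in ordinary local variables of the loop.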
16. Introduction to JAXP
● JAXP = Java API for XML Processing
– javax.xml.parsers
● DocumentBuilder – DOM, SAXParser – SAX
– javax.xml.xpath
● XPath – compilation and evaluation of XPath expressions
– javax.xml.transform
● Transformer – XSLT
– JAXP defines only interfaces, implementations are pluggable
● access to the implementations is via Factories
– DocumentBuilderFactory, SAXParserFactory, etc
● Java 1.6 bundles Apache Xerces and Xalan
17. JAXP overview
● Other parts of the API are in packages named after the standards that define them
– DOM is in org.w3c.dom
● Document, Node, Element, Attr, etc interfaces for storing DOM trees
– SAX is in org.xml.sax
● XMLReader, ContentHandler interfaces for handling/producing SAX events
18. JAXP and DOM
● javax.xml.parsers.DocumentBuilderFactory - creates DocumentBuilder instances. Used to set various attributes for the parser, including its validating behavior.
● javax.xml.parsers.DocumentBuilder - performs parsing and creates DOM
Documents representing parsed XML
● org.w3c.dom.Document - represents the whole XML document and is the root of the DOM tree. It is a Node (not an Element) that contains the document's root element.
● org.w3c.dom.Node - a single node in the document tree. A node can be
an element, an attribute, an entity, a document, or a text node.
● org.w3c.dom.NodeList - an ordered enumeration of nodes
● org.w3c.dom.Element - a Node representing an XML element
● org.w3c.dom.Attr - an attribute attached to an Element
● org.w3c.dom.Text - a text Node (content of an element), CharacterData
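The classes listed above fit together roughly as in the following sketch, which parses a shortened version of the recipe document from the earlier slide. The class name DomParsingSketch and the helper ingredients are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParsingSketch {
    // Shortened recipe document (sample data for this sketch).
    static final String XML =
          "<recipe name='bread'>"
        + "<title>Basic bread</title>"
        + "<ingredient amount='3' unit='cups'>Flour</ingredient>"
        + "<ingredient amount='1' unit='teaspoon'>Salt</ingredient>"
        + "</recipe>";

    /** Returns one line per ingredient, e.g. "3 cups Flour". */
    static List<String> ingredients(String xml) throws Exception {
        // Factory -> builder -> Document, as described on the slide.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList nodes = root.getElementsByTagName("ingredient");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            result.add(e.getAttribute("amount") + " " + e.getAttribute("unit")
                    + " " + e.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : ingredients(XML)) {
            System.out.println(line);
        }
    }
}
```

Note the index-based loop and the cast: NodeList is not a java.util collection, which is part of the inconvenience that JDOM and DOM4J try to remove.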
19. JAXP and SAX
● javax.xml.parsers.SAXParserFactory - creates SAXParser instances. Allows various parameters to be set for the creation of the parser.
● javax.xml.parsers.SAXParser - used to initiate parsing of XML documents.
Encapsulates an XMLReader for generation of SAX events.
● org.xml.sax.XMLReader - used to register event handlers. Calls the callback methods as content is being scanned (generates SAX events)
● org.xml.sax.ContentHandler - the interface to implement in order to
receive SAX events. Instance must be registered with XMLReader.
● org.xml.sax.ErrorHandler – the interface to implement in order to handle
parsing errors.
● org.xml.sax.helpers.DefaultHandler - default implementation of
ContentHandler, ErrorHandler and a couple of other interfaces; can be
extended to simplify SAX event handling.
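A minimal sketch of the SAX flow described above: a SAXParser is obtained from the factory and a DefaultHandler subclass receives the events. The class name SaxCountingSketch and the counting task are assumptions for illustration; the counter shows the kind of state that has to be kept by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCountingSketch {
    /** Counts start tags with the given name; the handler keeps the state. */
    static int countElements(String xml, final String name) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The parser is not namespace-aware by default, so qName is the tag name.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<recipe><step>Mix</step><step>Bake</step><title>Bread</title></recipe>";
        System.out.println(countElements(xml, "step"));
    }
}
```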
20. JAXP and XSL
● javax.xml.transform.TransformerFactory - creates Transformer instances: either a simple one that just copies the source to the result, or one with an associated stylesheet that does the actual transformation
● javax.xml.transform.Transformer - represents the transformation rules
(stylesheet); used to transform the source XML and write the result
● javax.xml.transform.Source - interface for sources of transformation.
Used to provide both the stylesheet and the XML to the Transformer.
– Implementations: DOMSource, SAXSource, StreamSource, etc
● javax.xml.transform.Result - interface for writing of transformation
result.
– Implementations: DOMResult, SAXResult, StreamResult, etc
● javax.xml.transform.ErrorListener – interface for customized error
handling
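The following sketch shows how these pieces might be combined: the stylesheet and the input are both supplied as StreamSource, and the output is collected through a StreamResult. The stylesheet, the class name XsltSketch, and the sample input are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: prints the ingredients as a comma-separated list.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/recipe'>"
        + "<xsl:for-each select='ingredient'>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>, </xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the text result. */
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(
            "<recipe><title>Basic bread</title>"
          + "<ingredient>Flour</ingredient><ingredient>Salt</ingredient></recipe>"));
    }
}
```

Calling newTransformer() without arguments would instead give the simple copying transformer, which is handy for serializing a DOM tree.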
21. JAXP and XPath
● javax.xml.xpath.XPathFactory - creates XPath instances and can be used to define custom XPathFunctionResolver and XPathVariableResolver
● javax.xml.xpath.XPath - XPath evaluation environment. Used to compile
and evaluate XPath expressions. Evaluation takes the context node as a
parameter to evaluate the expression on.
● javax.xml.xpath.XPathExpression - compiled XPath expression, used directly for multiple evaluations of the same expression.
● javax.xml.xpath.XPathConstants - a mapping between XPath and Java
data types
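As a rough sketch of how these classes cooperate, the example below compiles one expression and evaluates it against several parsed documents, asking for a node-set via XPathConstants. The class name XPathCompiledSketch, the expression, and the sample documents are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCompiledSketch {
    /** Collects the titles of all English books from the given documents. */
    static List<String> englishTitles(String... documents) throws Exception {
        // Compile once, evaluate many times.
        XPathExpression expr = XPathFactory.newInstance().newXPath()
                .compile("//book[@lang='en']/title");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> titles = new ArrayList<>();
        for (String xml : documents) {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // The Document is the context node; NODESET maps to org.w3c.dom.NodeList.
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(englishTitles(
            "<bookstore><book lang='en'><title>A</title></book>"
          + "<book lang='et'><title>B</title></book></bookstore>",
            "<bookstore><book lang='en'><title>C</title></book></bookstore>"));
    }
}
```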
22. JAXB
● JAXB = Java Architecture for XML Binding
– XML serialization of Java objects
– javax.xml.bind
– Involves generation of Java classes according to the XML schema or vice versa
– JAXBContext is a factory for Marshaller and Unmarshaller
23. JDOM & DOM4J
● org.w3c.dom API was designed for any OO language and was mapped to Java more or less directly
– the resulting API is not very convenient for Java
● Two similar 3rd party DOM APIs address this
– JDOM is more lightweight and was proposed for inclusion in Java SE
– DOM4J has integrated support for XPath, provides better interoperability with W3C DOM and Transformer
– Most operations can be done using single method calls
– Java Strings and Collections are used
24. XML generation
● There are many options:
– String concatenation
● inflexible, can easily produce broken XML
– Programmatic creation of DOM tree
– Manual generation of SAX events
– JDOM/DOM4J
– XML marshalling using JAXB or similar API
– Template engines, e.g. StringTemplate, Velocity
● basically pre-created XML files with 'holes' that can be
filled with data
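As one example of the options above, here is a sketch of programmatic DOM creation followed by serialization with the copying Transformer from JAXP. The class name XmlGenerationSketch and the recipe data are made up; the point is that special characters are escaped by the serializer, which plain String concatenation would not do.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlGenerationSketch {
    /** Builds a small recipe document as a DOM tree and serializes it to a String. */
    static String recipeXml(String title, String... ingredients) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element recipe = doc.createElement("recipe");
        doc.appendChild(recipe);

        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title); // escaping happens on output
        recipe.appendChild(titleElement);

        for (String ingredient : ingredients) {
            Element e = doc.createElement("ingredient");
            e.setTextContent(ingredient);
            recipe.appendChild(e);
        }

        // A Transformer without a stylesheet simply copies source to result.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recipeXml("Bread & butter", "Flour", "Salt"));
    }
}
```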
25. Servlets
● Servlets are server-side Java applications
● Now javax.servlet API is officially a part of Java EE
● They process incoming client requests and generate responses
● Servlets are most often used in Web applications
● Servlets are deployed and run within containers (web
application servers)
– there are many commercial application servers
– Jetty and Tomcat are open-source ones
● JSP (JavaServer Pages) are PHP/ASP-like HTML files with embedded Java code; they must be compiled into servlets (usually on-the-fly)
26. Servlet API
● A Servlet must implement the javax.servlet.Servlet interface. However, most servlets extend either javax.servlet.GenericServlet or javax.servlet.http.HttpServlet.
● A container creates a single instance of the servlet class using the
default constructor, then it calls the init() method
● On every client request, the service() method is called
– for HTTP, there are various higher-level methods defined, e.g.
doGet(), doPost(), doPut(), doDelete(), etc
– these methods must be thread-safe because the container may execute them concurrently. Implementing the (deprecated) javax.servlet.SingleThreadModel interface tells the container not to call them concurrently.
– all these methods take HttpServletRequest and
HttpServletResponse as parameters
27. HttpServletRequest
● HttpServletRequest is for reading user's input
– getParameter() is for reading of HTTP request parameters
– getHeader() is for reading HTTP headers
– getCookies() is for examining the available cookies
– getSession() creates/obtains the HTTP session
– getReader() / getInputStream() are for reading of large request
payloads (e.g. uploaded files)
– getLocalXXX() / getServerXXX() return various info about the host,
where servlet is running and the server itself
– getRemoteXXX() returns various info on the remote client
– various other methods provide even more information
28. HttpServletResponse
● HttpServletResponse is for generating the response to the user
– addCookie() adds a cookie to the response
– addHeader() adds an arbitrary HTTP header to the response
– getWriter() / getOutputStream() provide a stream for writing response content; no further header modifications are possible once isCommitted() returns true
– sendError() / setStatus() are for setting response status codes
– setContentLength() sets the size in bytes of outputted content
– setContentType() sets the MIME type of outputted content
(text/html for HTML content)
– There are a lot of SC_XXX status code constants defined
– There are many other useful methods
29. Sessions
● Sessions are used to persist some information (state) about the client between separate HTTP requests, which are otherwise stateless
● Provided by HttpSession interface
– request.getSession() returns an instance
– session attributes are any Objects with String keys; they are kept until the session is either invalidate()'d or expires (typically after 30 min by default, depending on the container configuration)
● Servlet container uses either cookies or URL-rewriting to pass/retrieve
the session ID
– response.encodeURL() must be used with any output URLs for
URL-rewriting to work, in case cookies are not available
– these URLs typically look like this:
http://host/servlet;jsessionid=72183CAFE23?abc=hello
30. Servlet Filters
● Filters can be used to pre- or post-process requests
– Called in a chain, one after another, before the servlet, similar to the decorator pattern
– Can be used for access control, logging, context
initialization, compression, etc
● Need to implement javax.servlet.Filter
– Method doFilter(request, response, chain)
– To delegate processing further down the chain (optional), call
chain.doFilter(request, response)
– Or requests can be processed directly just like in a servlet
31. Deployment
● Web applications typically have a defined directory structure
– The root of the application is the document root, e.g. where images and other static content are located
– There is a WEB-INF directory. Files contained there are hidden from direct access
● web.xml – deployment descriptor, defines URL-patterns, deployed servlets, various parameters, etc
● classes – directory with compiled .class files
● lib – directory with .jar files (all are automatically loaded)
● Another possibility is to put the same things into a single .war (Web ARchive) file, which is in the same format as .jar
33. Apache Digester
● A 3rd party jar for reading XML
– stores data directly in a Java domain object tree (not DOM), e.g. Customer, Order
– similar to unmarshalling; stack-based approach
– rules can be created either programmatically or put into an XML file
● Example
– Digester digester = new Digester();
digester.push(this); // root object that receives the addCustomer() calls
digester.addObjectCreate("customers/customer", Customer.class);
digester.addSetProperties("customers/customer"); // XML attributes -> bean properties
digester.addSetNext("customers/customer", "addCustomer",
Customer.class.getName());
digester.addCallMethod("customers/customer/address", "setAddress", 0);
digester.parse("customers.xml");
34. More info
● Good source of information and tutorials about all W3C, XML and related technologies
– http://www.w3schools.com/