Java and XML
(DOM, SAX, JDOM)

Raji GHAWI

20/01/2009
Outline
1. DOM
2. SAX
3. JDOM

1. DOM

Document Object Model

<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<book year="1995">
<title>Zodiac</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553573862</isbn>
<price>7.50</price>
</book>
<!-- more books... -->
</inventory>

Import required packages

import javax.xml.parsers.*;
import org.w3c.dom.*;

Create the parser

try {
    // DOM parser factory
    DocumentBuilderFactory factory =
        DocumentBuilderFactory.newInstance();

    // DOM parser
    DocumentBuilder builder = factory.newDocumentBuilder();
    // ....
} catch (Exception e) {
    e.printStackTrace(System.out);
}

Exceptions to handle: IOException, ParserConfigurationException, SAXException (catching Exception covers all three)

Parse an XML file

Document document = builder.parse("../inventory.xml");

The resulting Document holds the entire XML file as a tree (the Document Object Model).

Root element

// the root element
Element root = document.getDocumentElement();
System.out.println(root.getTagName());

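The DOM steps so far (factory, builder, parse, root element) fit into one small program. This is a minimal sketch, assuming the XML arrives as a String instead of the `../inventory.xml` file so it runs anywhere; the class and method names (`DomRootDemo`, `parse`, `rootTagName`) are hypothetical. The three checked exceptions from the "Create the parser" slide are caught and wrapped so callers need no try/catch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class DomRootDemo {
    // Parses XML text into a DOM tree; checked exceptions become an unchecked one.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // parse(InputStream) instead of parse(String uri): no file is needed
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Returns the tag name of the document's root element.
    static String rootTagName(String xml) {
        Element root = parse(xml).getDocumentElement();
        return root.getTagName();
    }

    public static void main(String[] args) {
        // prints inventory
        System.out.println(rootTagName("<inventory><book year=\"2000\"/></inventory>"));
    }
}
```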
Nodes

Node is the common supertype of the main node types:
• Element: may have children
• Text: a leaf
• Attr: a leaf

Operations on Nodes

                  Element         Text             Attr
getNodeName()     tag name        "#text"          name of attribute
getNodeValue()    null            text contents    value of attribute
getNodeType()     ELEMENT_NODE    TEXT_NODE        ATTRIBUTE_NODE
getAttributes()   NamedNodeMap    null             null

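To see the table in action, the sketch below asks one Element, its Text child and its Attr for their name, value and type code. The class `NodeOpsDemo`, its helpers and the one-line `book` document are hypothetical; the printed type codes are the numeric values of `Node.ELEMENT_NODE` (1), `Node.ATTRIBUTE_NODE` (2) and `Node.TEXT_NODE` (3).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class NodeOpsDemo {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Summarizes a node with the three generic Node methods from the table.
    static String describe(Node node) {
        return node.getNodeName() + " | " + node.getNodeValue() + " | " + node.getNodeType();
    }

    public static void main(String[] args) {
        Element book = parse("<book year=\"2000\">Snow Crash</book>").getDocumentElement();
        Node text = book.getFirstChild();          // the Text child
        Attr year = book.getAttributeNode("year"); // the Attr node
        System.out.println(describe(book)); // book | null | 1
        System.out.println(describe(text)); // #text | Snow Crash | 3
        System.out.println(describe(year)); // year | 2000 | 2
    }
}
```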
Distinguishing Node types

switch (node.getNodeType()) {
    case Node.ELEMENT_NODE:
        Element element = (Element) node;
        // ....
        break;
    case Node.TEXT_NODE:
        Text text = (Text) node;
        // ....
        break;
    case Node.ATTRIBUTE_NODE:
        Attr attr = (Attr) node;
        // ....
        break;
    default:
        // other node types (comments, processing instructions, ...)
}

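The switch pattern above can drive a recursive walk. This sketch (the class `NodeTypeWalker` and its counters are hypothetical) counts elements, non-blank text nodes and attributes. Attr nodes are not children of their element, so they are reached through `getAttributes()` rather than `getFirstChild()`; recursion into children happens only for elements, so attribute values are not counted a second time as text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

class NodeTypeWalker {
    int elements;
    int texts;
    int attributes;

    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Dispatches on getNodeType() as on the slide, recursing through elements.
    void walk(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                Element element = (Element) node;
                elements++;
                // attributes are not children: visit them via getAttributes()
                NamedNodeMap attrs = element.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    walk(attrs.item(i));
                }
                for (Node child = element.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    walk(child);
                }
                break;
            case Node.TEXT_NODE:
                Text text = (Text) node;
                // skip the whitespace-only text between tags
                if (!text.getData().trim().isEmpty()) {
                    texts++;
                }
                break;
            case Node.ATTRIBUTE_NODE:
                attributes++;
                break;
            default:
                // comments, processing instructions, ... are ignored
                break;
        }
    }

    public static void main(String[] args) {
        NodeTypeWalker walker = new NodeTypeWalker();
        walker.walk(parse("<book year=\"2000\"><title>Snow Crash</title></book>").getDocumentElement());
        System.out.println(walker.elements + " " + walker.texts + " " + walker.attributes); // 2 1 1
    }
}
```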
Operations on Nodes

• Node getParentNode()
• Node getFirstChild()
• Node getNextSibling()
• Node getPreviousSibling()
• Node getLastChild()
• boolean hasAttributes()
• boolean hasChildNodes()

Traverse child nodes

if (element.hasChildNodes()) {
    Node child = element.getFirstChild();
    while (child != null) {
        // ....
        child = child.getNextSibling();
    }
}

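Here is the child loop above applied to one `book` element, assuming only the direct child elements are wanted. Whitespace Text nodes and comments are siblings too, so the loop filters on `getNodeType()`. `ChildTraversal` and `childTagNames` are hypothetical names, and the XML is read from a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class ChildTraversal {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Tag names of the direct child elements, in document order.
    static List<String> childTagNames(Element element) {
        List<String> names = new ArrayList<>();
        if (element.hasChildNodes()) {
            Node child = element.getFirstChild();
            while (child != null) {
                // keep elements only; skip whitespace text and comments
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
                child = child.getNextSibling();
            }
        }
        return names;
    }
}
```

For the second book of `inventory.xml`, this returns `[title, author, author, publisher, isbn, price]`.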
Operations for Elements

• String getTagName()
• boolean hasAttribute(String name)
• String getAttribute(String name)
• boolean hasAttributes()
• NamedNodeMap getAttributes()

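`getAttribute` returns an empty string, not `null`, when the attribute is missing, so `hasAttribute` is the way to tell "absent" from "present but empty". A small sketch; `ElementOpsDemo`, `yearOf` and the `"unknown"` fallback are hypothetical choices, and the XML is read from a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class ElementOpsDemo {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Hypothetical helper: the book's year, or "unknown" if the attribute is absent.
    static String yearOf(Element book) {
        // getAttribute gives "" (never null) for a missing attribute,
        // so hasAttribute is checked first
        return book.hasAttribute("year") ? book.getAttribute("year") : "unknown";
    }
}
```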
NamedNodeMap

• Node getNamedItem(String name)
• int getLength()
• Node item(int index)

NamedNodeMap map = element.getAttributes();
for (int i = 0; i < map.getLength(); i++) {
    Attr attr = (Attr) map.item(i);
    System.out.println(attr.getNodeName()
        + "='" + attr.getNodeValue() + "'");
}

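Besides indexed access, `getNamedItem` looks up a single attribute by name and returns `null` when there is none. A sketch with a hypothetical class `NamedNodeMapDemo`, reading the XML from a String; the `lang` attribute used in the test document is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class NamedNodeMapDemo {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Looks up one attribute by name; returns null when it is absent.
    static String attributeValue(Element element, String name) {
        NamedNodeMap map = element.getAttributes();
        Node item = map.getNamedItem(name); // null if there is no such attribute
        return item == null ? null : item.getNodeValue();
    }
}
```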
Operations on Texts

• String getData()
• int getLength()
• String substringData(int offset, int count)

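`substringData(offset, count)` takes a start offset and a length, unlike `String.substring`, which takes an end index. A sketch with a hypothetical helper `TextOpsDemo.firstWord`; the XML is read from a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

class TextOpsDemo {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Hypothetical helper: the first space-separated word of a Text node.
    static String firstWord(Text text) {
        String data = text.getData();
        int space = data.indexOf(' ');
        // count is a length, so the index of the space is exactly the word length
        return space < 0 ? data : text.substringData(0, space);
    }
}
```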
Operations on Attrs

• String getName()
• Element getOwnerElement()
• String getValue()

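An Attr has no parent in the DOM tree (`getParentNode()` returns `null`); `getOwnerElement()` is the way back to the element that carries it. A sketch with a hypothetical helper `AttrOpsDemo.describe`, reading the XML from a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class AttrOpsDemo {
    // Parses XML text; checked parser exceptions are wrapped in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Renders an attribute as owner@name=value using only the Attr methods.
    static String describe(Attr attr) {
        Element owner = attr.getOwnerElement(); // the element carrying the attribute
        return owner.getTagName() + "@" + attr.getName() + "=" + attr.getValue();
    }
}
```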
2. SAX

Simple API for XML

Import required packages

import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

Create the parser

// Create a parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// Tell the factory that the parser must understand namespaces
factory.setNamespaceAware(true);
try {
    // Make the parser
    SAXParser saxParser = factory.newSAXParser();
    XMLReader parser = saxParser.getXMLReader();
    // ....
} catch (Exception e) {
    e.printStackTrace();
}

Exceptions to handle: IOException, ParserConfigurationException, SAXException (catching Exception covers all three)

Parse an XML file
// Create a handler
Handler handler = new Handler();
// Tell the parser to use this handler
parser.setContentHandler(handler);
// Finally, read and parse the document
parser.parse("./inventory.xml");

SAX handlers

A callback handler that receives every kind of SAX event implements four interfaces:
• interface ContentHandler
• interface DTDHandler
• interface EntityResolver
• interface ErrorHandler

Implementing all four directly is tedious; it is easier to extend an adapter class.

Class DefaultHandler

• DefaultHandler is in package org.xml.sax.helpers
• DefaultHandler implements ContentHandler, DTDHandler, EntityResolver, and ErrorHandler
• DefaultHandler is an adapter class
  • It provides a default implementation of every method declared in the four interfaces; most of them are empty (fatalError, for instance, rethrows the exception)
  • To use this class, extend it and override only the methods that are important to your application

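Because `DefaultHandler` implements all four interfaces, one subclass can be registered for every role and override only what it needs. This sketch (hypothetical class `ErrorCountingHandler`) overrides just `startElement` and `fatalError`, and registers itself through the four `XMLReader` setters; the XML is read from a String rather than a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ErrorCountingHandler extends DefaultHandler {
    int elements;
    final List<String> problems = new ArrayList<>();

    // ContentHandler method: count every start tag
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    // ErrorHandler method: record where the document stopped being well-formed
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
        throw e; // same as DefaultHandler: stop parsing
    }

    static ErrorCountingHandler parse(String xml) {
        ErrorCountingHandler handler = new ErrorCountingHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // the same DefaultHandler object fills all four roles
            reader.setContentHandler(handler);
            reader.setDTDHandler(handler);
            reader.setEntityResolver(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // malformed input: fatalError has already recorded it
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```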
The Handler class

class Handler extends DefaultHandler {
    // SAX calls this method when it encounters a start tag
    @Override
    public void startElement(String namespaceURI,
                             String localName,
                             String qualifiedName,
                             Attributes attributes) throws SAXException {
        System.out.println("startElement: " + qualifiedName);
    }

    // SAX calls this method to pass in character data
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        System.out.println("characters: \"" +
            new String(ch, start, length) + "\"");
    }

    // SAX calls this method when it encounters an end tag
    @Override
    public void endElement(String namespaceURI,
                           String localName,
                           String qualifiedName) throws SAXException {
        System.out.println("endElement: /" + qualifiedName);
    }
}
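A runnable variant of the Handler class, assuming the events should be collected in a list instead of printed (the class `EventRecorder` and its event strings are hypothetical). It buffers character data, because a parser may deliver one run of text through several `characters()` calls, and it drops the whitespace-only runs that show up as `characters: "` lines in the output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EventRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start " + name(localName, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + name(localName, qName));
    }

    // Records the buffered text, skipping whitespace-only runs between tags.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            events.add("text " + s);
        }
        text.setLength(0);
    }

    // qName may be empty when the namespace-prefixes feature is off: fall back to localName
    private static String name(String localName, String qName) {
        return qName.isEmpty() ? localName : qName;
    }

    static List<String> record(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            EventRecorder handler = new EventRecorder();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```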
<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<book year="1995">
<title>Zodiac</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553573862</isbn>
<price>7.50</price>
</book>
<!-- more books... -->
</inventory>

Output of the Handler class for this document:

startElement: inventory
characters: "
"
startElement: book
characters: "
"
startElement: title
characters: "Snow Crash"
endElement: /title
characters: "
"
startElement: author
characters: "Neal Stephenson"
endElement: /author
characters: "
"
startElement: publisher
characters: "Spectra"
endElement: /publisher
characters: "
...
"
endElement: /book
...
endElement: /inventory
Attributes

By index (0 to getLength() - 1):
• int getLength()
• String getLocalName(int index)
• String getQName(int index)
• String getValue(int index)
• String getType(int index)

By name:
• int getIndex(String qualifiedName)
• int getIndex(String uri, String localName)
• String getValue(String qualifiedName)
• String getValue(String uri, String localName)
Attributes

public void startElement(String namespaceURI,
                         String localName,
                         String qualifiedName,
                         Attributes attributes) throws SAXException {
    // ....
    for (int i = 0; i < attributes.getLength(); i++) {
        String attName = attributes.getQName(i);
        String attValue = attributes.getValue(i);
        System.out.println(attName + "='" + attValue + "'");
    }
    // ....
}

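The by-name accessors avoid the loop when only one attribute matters. This sketch (hypothetical class `YearCollector`) reads the `year` attribute of every element that has one; `getValue(String)` returns `null` when the attribute is absent. The XML is read from a String rather than a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class YearCollector extends DefaultHandler {
    final List<String> years = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // lookup by qualified name; null means this element has no year attribute
        String year = attributes.getValue("year");
        if (year != null) {
            years.add(year);
        }
    }

    static List<String> collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            YearCollector handler = new YearCollector();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.years;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```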
3. JDOM

Java DOM

Import required packages
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
import org.jdom.adapters.*;
// java.io and java.util supply IOException, StringWriter, List and Iterator used below
import java.io.*;
import java.util.*;
Create the parser

try {
    SAXBuilder builder = new SAXBuilder();
    // ....
} catch (IOException ioe) {
    ioe.printStackTrace();
} catch (JDOMException je) {
    je.printStackTrace();
}

Parse an XML file

Document document = builder.build("../inventory.xml");

Root element

Element root = document.getRootElement();
System.out.println(root.getName());

Print out the document
Advantage 1: output facility

// write the document to the console
XMLOutputter outputter = new XMLOutputter();
outputter.output(document, System.out);

// or write it into a String, reusing the same outputter
StringWriter sw = new StringWriter();
outputter.output(document, sw);
String xml = sw.toString();
Get children
Advantage 2: supports Java Collections

• Get all direct child elements
List allChildren = element.getChildren();

• Get all direct child elements with a given name
List namedChildren = element.getChildren("book");

• Get the first child element with a given name
Element child = element.getChild("book");
Traverse child elements

List children = element.getChildren();
for (int i = 0; i < children.size(); i++) {
    Element elem = (Element) children.get(i);
    // ....
}

Get attributes
• Get all attributes
List attrs = element.getAttributes();
for (int i = 0; i < attrs.size(); i++) {
    Attribute attr = (Attribute) attrs.get(i);
    System.out.println(attr.getName() + " = " + attr.getValue());
}

• Get an attribute with a given name
Attribute attr = element.getAttribute("year");

• Get an attribute value with a given name
String value = element.getAttributeValue("year");

Reading Element Content
• The text content of an element is directly available
String content = element.getText();

• Remove leading and trailing whitespace
String content = element.getTextTrim();

Mixed Content
• Sometimes an element may contain comments, text content, and children
<table>
<!-- Some comment -->
Some text
<tr>Some child</tr>
</table>

String text = table.getTextTrim();   // "Some text": the comment and the child element are skipped
Element tr = table.getChild("tr");

Mixed Content

List mixedContent = table.getContent();
Iterator iter = mixedContent.iterator();
while (iter.hasNext()) {
    Object obj = iter.next();
    if (obj instanceof Comment) {
        System.out.println("Comment: " + ((Comment) obj).getText());
    } else if (obj instanceof Text) {
        // JDOM 1.0 returns text content as org.jdom.Text objects, not Strings
        System.out.println("Text: " + ((Text) obj).getTextTrim());
    } else if (obj instanceof Element) {
        System.out.println("Element: " + ((Element) obj).getName());
    }
}

