SAX, DOM & JDOM parsers for beginners

An easy reference for the SAX, DOM and JDOM parsers

Published in: Technology, Education

  1. Parsing XML with SAX, DOM & JDOM. Hicham Qaissi, hicham.qaissi@gmail.com
  2. Contents

     0. What is an XML parser?
     1. Describing the example to develop
     2. SAX
     3. DOM
     4. JDOM
     5. Conclusion
  3. 0. What is an XML parser?

     An XML parser lets us analyze and compose XML documents. By analyzing the data and structure of an XML document, we can build objects in a programming language (Java in our case). We can also do the inverse, that is, build an XML document from data objects (see Fig. 1). In this manual I work through examples with three APIs: SAX, DOM and JDOM.

     1. Describing the example to develop

     The example is meant to be entertaining, and it is the same for all three APIs (SAX, DOM and JDOM). It analyzes an XML document that contains information about some books: ISBN code (isbn is an attribute), name, author name, price and editorial. The program expects a book code (ISBN) and searches for that book in the XML. If the book exists, all its information is printed to standard output; otherwise, we print a message saying that the book doesn't exist in the XML. Are you finding it as amusing as I do? Let's go!
  4. The XML example (books.xml) is the following:

         <books>
             <book isbn="0000000001">
                 <name>Book 1</name>
                 <author>Author name 1</author>
                 <price>12.54</price>
                 <editorial>Editorial 1</editorial>
             </book>
             <book isbn="0000000002">
                 <name>Book 2</name>
                 <author>Author name 2</author>
                 <price>58.25</price>
                 <editorial>Editorial 2</editorial>
             </book>
             <book isbn="0000000003">
                 <name>Book 3</name>
                 <author>Author name 3</author>
                 <price>29.45</price>
                 <editorial>Editorial 3</editorial>
             </book>
             <book isbn="0000000004">
                 <name>Book 4</name>
                 <author>Author name 4</author>
                 <price>78.95</price>
                 <editorial>Editorial 4</editorial>
             </book>
             <book isbn="0000000005">
                 <name>PBook 5</name>
                 <author>Author name 5</author>
                 <price>61.25</price>
                 <editorial>Editorial 5</editorial>
             </book>
         </books>
  5. For all parsers (SAX, DOM & JDOM), I use this DTO (Data Transfer Object):

         public class MyBook {

             private String isbn;
             private String name;
             private String author;
             private String price;
             private String editorial;

             public String getIsbn() { return isbn; }
             public void setIsbn(String isbn) { this.isbn = isbn; }

             public String getName() { return name; }
             public void setName(String name) { this.name = name; }

             public String getAuthor() { return author; }
             public void setAuthor(String author) { this.author = author; }

             public String getPrice() { return price; }
             public void setPrice(String price) { this.price = price; }

             public String getEditorial() { return editorial; }
             public void setEditorial(String editorial) { this.editorial = editorial; }
         }
  6. 2. SAX

     SAX (Simple API for XML) works with events and associated methods. As the parser reads the XML document and finds its components (elements, attributes, values, etc.) or detects errors, it invokes the methods that the programmer has associated with those events. You can find more information about SAX at www.saxproject.org.

     First, be sure that SAX is on the classpath (the jar file can be downloaded from http://sourceforge.net/projects/sax/files/; recent JDKs already bundle the org.xml.sax and javax.xml.parsers packages). We must instantiate the reader. The reader implements the XMLReader interface, and we obtain it from the abstract class SAXParser, which in turn comes from the SAXParserFactory. The parse method of XMLReader analyzes the XML document:

         import java.io.IOException;
         import javax.xml.parsers.ParserConfigurationException;
         import javax.xml.parsers.SAXParser;
         import javax.xml.parsers.SAXParserFactory;
         import org.xml.sax.SAXException;
         import org.xml.sax.XMLReader;

         public class MySAXSearcher {

             public static void main(String[] args) {
                 try {
                     SAXParserFactory factory = SAXParserFactory.newInstance();
                     factory.setNamespaceAware( true );
                     factory.setValidating( true );
                     SAXParser saxParser = factory.newSAXParser();
                     XMLReader xr = saxParser.getXMLReader();
                     // args[0] is the path (or URI) of the XML document to parse
                     xr.parse( args[0] );
                 } catch ( IOException ioe ) {
                     System.out.println( "Error: " + ioe.getMessage() );
                 } catch ( SAXException saxe ) {
                     System.out.println( "Error: " + saxe.getMessage() );
                 } catch ( ParserConfigurationException pce ) {
                     System.out.println( "Error: " + pce.getMessage() );
                 }
             }
         }

     If the program compiles, Java and the jar file are set up correctly. Nevertheless, the program doesn't do anything useful yet, because we haven't registered interest in any event so far. It's important to catch the exceptions java.io.IOException, org.xml.sax.SAXException and javax.xml.parsers.ParserConfigurationException.
  7. To handle the events, our main class must extend org.xml.sax.helpers.DefaultHandler. DefaultHandler implements the following interfaces:

     • org.xml.sax.ContentHandler: events about data (the most commonly used)
     • org.xml.sax.ErrorHandler: events about errors
     • org.xml.sax.DTDHandler: DTD treatment
     • org.xml.sax.EntityResolver: external entities

     We can also write our own classes implementing ContentHandler and ErrorHandler to treat the events we are interested in:

     • Data: implement ContentHandler and associate it with the reader (parser) using the method setContentHandler().
     • Errors: implement ErrorHandler and associate it with the reader (parser) using the method setErrorHandler().

     The most important methods in the interface ContentHandler (implemented by DefaultHandler, which is extended by our class MySAXSearcher) are:

     • startDocument(): receive notification of the beginning of a document.
     • endDocument(): receive notification of the end of a document.
     • startElement(): receive notification of the beginning of an element.
     • endElement(): receive notification of the end of an element.
     • characters(): receive notification of character data.

     See more about ContentHandler at http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/ContentHandler.html.

     The next slides show MySAXSearcher. I've written my own ContentHandler and ErrorHandler classes, which is much cleaner than overriding the interesting ContentHandler and ErrorHandler methods in our class that extends DefaultHandler.
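As a small aside before the full listing, the order in which the ContentHandler callbacks fire can be made visible with a minimal sketch. It is not part of the author's example: the class names SaxEventTrace and TraceHandler and the in-memory test document are assumptions introduced here for illustration, and it extends DefaultHandler only to avoid writing the empty callbacks by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    // Hypothetical helper: records every callback it receives as a short string.
    public static class TraceHandler extends DefaultHandler {
        public final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String local, String raw, Attributes attrs) {
            events.add("startElement:" + local);
        }

        @Override
        public void endElement(String uri, String local, String raw) {
            events.add("endElement:" + local);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks; blank chunks are skipped here.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("characters:" + text);
            }
        }
    }

    // Parses an in-memory XML string and returns the recorded events in order.
    public static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that the local name is filled in
        XMLReader reader = factory.newSAXParser().getXMLReader();
        TraceHandler handler = new TraceHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book isbn=\"0000000001\"><name>Book 1</name></book></books>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Running it prints one line per event, starting with startDocument and ending with endDocument, with each element's start and end events nested in between. That nesting is the reason a handler has to remember where it currently is in the document.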
  8. MySAXSearcher.java:

         import java.io.IOException;
         import javax.xml.parsers.ParserConfigurationException;
         import javax.xml.parsers.SAXParser;
         import javax.xml.parsers.SAXParserFactory;
         import org.xml.sax.SAXException;
         import org.xml.sax.XMLReader;
         import org.xml.sax.helpers.DefaultHandler;

         public class MySAXSearcher extends DefaultHandler {

             public static void main(String[] args) {
                 // args[0]: the XML file, args[1]: the isbn to search
                 MySAXSearcher searcher = new MySAXSearcher();
                 searcher.searchBook(args[0], args[1]);
             }

             private void searchBook(String xml, String isbn) {
                 try {
                     SAXParserFactory factory = SAXParserFactory.newInstance();
                     factory.setNamespaceAware( true );
                     factory.setValidating( true );
                     SAXParser saxParser = factory.newSAXParser();
                     XMLReader xr = saxParser.getXMLReader();
                     // Assigning my own ContentHandler to my XMLReader.
                     MyContentHandler ch = new MyContentHandler();
                     ch.isbnSearched = isbn;
                     xr.setContentHandler( ch );
                     // Assigning my own ErrorHandler to my XMLReader.
                     xr.setErrorHandler( new MyErrorHandler() );
                     // books.xml declares no DTD, so validation is switched off on the reader.
                     xr.setFeature( "http://xml.org/sax/features/validation", false );
                     xr.setFeature( "http://xml.org/sax/features/namespaces", true );
                     long before = System.currentTimeMillis();
                     xr.parse( xml );
                     long after = System.currentTimeMillis();
                     printResult( xml, ch, after - before );
                 } catch ( IOException ioe ) {
                     System.out.println( "Error: " + ioe.getMessage() );
                 } catch ( SAXException saxe ) {
                     System.out.println( "Error: " + saxe.getMessage() );
                 } catch ( ParserConfigurationException pce ) {
                     System.out.println( "Error: " + pce.getMessage() );
                 }
             }

             public void printResult(String xml, MyContentHandler ch, long time) {
                 System.out.println("Document " + xml + ". Parsed in : " + time + "ms");
                 if (ch.book != null) {
                     System.out.println("Book found:");
                     System.out.println(" Isbn: " + ch.book.getIsbn());
                     System.out.println(" Name: " + ch.book.getName());
                     System.out.println(" Author: " + ch.book.getAuthor());
                     System.out.println(" Price: " + ch.book.getPrice());
                     System.out.println(" Editorial: " + ch.book.getEditorial());
                 } else {
                     System.out.println("Book not found");
                 }
             }
         }

  9. MyContentHandler.java:

         import org.xml.sax.Attributes;
         import org.xml.sax.ContentHandler;
         import org.xml.sax.Locator;
         import org.xml.sax.SAXException;

         public class MyContentHandler implements ContentHandler {

             boolean isBookFound = false;   // true while we are inside the searched <book>
             String isbnSearched = "";
             String currentNode = "";       // local name of the element opened most recently
             MyBook book = null;

             @Override
             public void startDocument() throws SAXException {
                 System.out.println("***Start document***");
             }

             @Override
             public void endDocument() throws SAXException {
                 System.out.println("***End document***");
             }

             @Override
             public void startElement(String uri, String local, String raw, Attributes attrs) {
                 currentNode = local;
                 if ("book".equals(local) && !isBookFound) {
                     // The book node only has one attribute (isbn)
                     if (isbnSearched.equals(attrs.getValue("isbn"))) {
                         isBookFound = true;
                         book = new MyBook();
                         book.setIsbn(isbnSearched);
                     }
                 }
             }

             @Override
             public void characters(char[] ch, int start, int length) {
                 // I get the text value
                 String value = new String(ch, start, length).trim();
                 if (!value.isEmpty() && isBookFound) {
                     if ("name".equals(currentNode)) {
                         book.setName(value);
                     } else if ("author".equals(currentNode)) {
                         book.setAuthor(value);
                     } else if ("price".equals(currentNode)) {
                         book.setPrice(value);
                     } else if ("editorial".equals(currentNode)) {
                         // editorial is the last child, so the book is complete
                         book.setEditorial(value);
                         isBookFound = false;
                     }
                 }
             }

             // The remaining ContentHandler callbacks are not needed for this example.

             @Override
             public void endElement(String uri, String local, String raw) throws SAXException { }

             @Override
             public void endPrefixMapping(String prefix) throws SAXException { }

             @Override
             public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException { }

             @Override
             public void processingInstruction(String target, String data) throws SAXException { }

             @Override
             public void setDocumentLocator(Locator locator) { }

             @Override
             public void skippedEntity(String name) throws SAXException { }

             @Override
             public void startPrefixMapping(String prefix, String uri) throws SAXException { }
         }

  10. MyErrorHandler.java:

         import org.xml.sax.ErrorHandler;
         import org.xml.sax.SAXException;
         import org.xml.sax.SAXParseException;

         public class MyErrorHandler implements ErrorHandler {

             @Override
             public void warning(SAXParseException ex) {
                 System.err.println("[Warning] : " + ex.getMessage());
             }

             @Override
             public void error(SAXParseException ex) {
                 System.err.println("[Error] : " + ex.getMessage());
             }

             @Override
             public void fatalError(SAXParseException ex) throws SAXException {
                 System.err.println("[Error!] : " + ex.getMessage());
             }
         }

     With our XML (books.xml) and the book code to search (0000000003), we can execute our program with:

         java MySAXSearcher "books.xml" "0000000003"
  11. The result must be the following:

         ***Start document***
         ***End document***
         Document books.xml. Parsed in : 141ms
         Book found:
          Isbn: 0000000003
          Name: Book 3
          Author: Author name 3
          Price: 29.45
          Editorial: Editorial 3

     3. DOM

     DOM (Document Object Model): while SAX hands us the elements of the document one at a time as events, DOM delivers the result of parsing as a tree that can be traversed and transformed. DOM has some disadvantages and advantages with regard to SAX:

     Disadvantages:

     • The data can be accessed only once the entire document has been parsed.
     • The tree is an object loaded in memory; this is problematic for big and complex documents.

     Advantages:

     • With DOM we can manipulate the XML document (update, delete and add elements). We can also create a new XML document.

     To manipulate an XML document, we need a Document object (Document is an interface that extends the interface Node). We use the classes javax.xml.parsers.DocumentBuilder and javax.xml.parsers.DocumentBuilderFactory, and invoke the method parse() to obtain a Document object.

     To manipulate XML with DOM, some interfaces are important: org.w3c.dom.Document (the entire XML document), org.w3c.dom.Element (an element in the XML document), org.w3c.dom.Node (a single node of the tree; elements, attributes and text are all nodes) and org.w3c.dom.Attr (an attribute of an element).

     OK, now let's talk in Java code. As DTO (Data Transfer Object), I use the same MyBook object.
  12. MyDOMSearcher.java:

         import java.io.File;
         import java.io.IOException;
         import javax.xml.parsers.DocumentBuilder;
         import javax.xml.parsers.DocumentBuilderFactory;
         import javax.xml.parsers.ParserConfigurationException;
         import org.w3c.dom.Document;
         import org.w3c.dom.Node;
         import org.w3c.dom.NodeList;
         import org.xml.sax.SAXException;

         public class MyDOMSearcher {

             public static void main(String[] args) {
                 // args[0]: the XML file, args[1]: the isbn to search
                 MyDOMSearcher searcher = new MyDOMSearcher();
                 searcher.searchBook(args[0], args[1]);
             }

             private void searchBook(String xml, String isbn) {
                 long before = System.currentTimeMillis();
                 MyBook book = null;
                 try {
                     DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                     factory.setNamespaceAware(true);
                     // books.xml declares no DTD, so validation is left off
                     // (with validation on, the ErrorHandler reports the missing grammar).
                     factory.setValidating(false);
                     DocumentBuilder parser = factory.newDocumentBuilder();
                     // I assign my own ErrorHandler to my parser
                     parser.setErrorHandler(new MyErrorHandler());
                     File file = new File(xml);
                     Document doc = parser.parse(file);
                     // I obtain all the <book> elements.
                     // NodeList is an interface that has 2 methods:
                     // 1. item(int): returns the Node at the given position.
                     // 2. getLength(): returns the length of the list.
                     NodeList booksNodes = doc.getElementsByTagName("book");
                     for (int i = 0; i < booksNodes.getLength(); i++) {
                         Node node = booksNodes.item(i);
                         if (node != null && node.hasAttributes()) {
                             String isbnAttribute =
                                 node.getAttributes().getNamedItem("isbn").getNodeValue();
                             if (isbnAttribute.equals(isbn)) {
                                 // I've caught the isbn searched
                                 if (book == null) {
                                     book = new MyBook();
                                     book.setIsbn(isbn);
                                 }
                                 if (node.hasChildNodes()) {
                                     NodeList bookChildNodes = node.getChildNodes();
                                     for (int j = 0; j < bookChildNodes.getLength(); j++) {
                                         Node child = bookChildNodes.item(j);
                                         if ("name".equals(child.getNodeName())) {
                                             book.setName(child.getTextContent());
                                         } else if ("author".equals(child.getNodeName())) {
                                             book.setAuthor(child.getTextContent());
                                         } else if ("price".equals(child.getNodeName())) {
                                             book.setPrice(child.getTextContent());
                                         } else if ("editorial".equals(child.getNodeName())) {
                                             book.setEditorial(child.getTextContent());
                                             // I've found my book. Ending the inner for iteration
                                             break;
                                         }
                                     }
                                 }
                             }
                         }
                     }
                 } catch (IOException ioe) {
                     System.err.println("[Error] : " + ioe.getMessage());
                 } catch (ParserConfigurationException pce) {
                     System.err.println("[Error] : " + pce.getMessage());
                 } catch (SAXException se) {
                     System.err.println("[Error] : " + se.getMessage());
                 }
                 long after = System.currentTimeMillis();
                 printResults(xml, book, after - before);
             }

  13. MyDOMSearcher.java (continued):

             public void printResults(String xml, MyBook book, long time) {
                 System.out.println("Document " + xml + ". Parsed in : " + time + "ms");
                 if (book != null) {
                     System.out.println("Book found:");
                     System.out.println(" Isbn: " + book.getIsbn());
                     System.out.println(" Name: " + book.getName());
                     System.out.println(" Author: " + book.getAuthor());
                     System.out.println(" Price: " + book.getPrice());
                     System.out.println(" Editorial: " + book.getEditorial());
                 } else {
                     System.out.println("Book not found");
                 }
             }
         }
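The searcher above only reads the tree. The DOM advantage mentioned earlier (updating and adding elements, then writing the document out again) can be sketched roughly as follows. This is an illustration under assumptions, not the author's code: the class name DomUpdateSketch, the method updateAndAppend, and the extra book with isbn 0000000006 are all invented here, and the document is parsed from a string so the sketch stays self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomUpdateSketch {

    // Changes the price of the book with the given isbn, appends one new
    // (hypothetical) book, and returns the modified document as a string.
    public static String updateAndAppend(String xml, String isbn, String newPrice) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Update: find the matching <book> and overwrite the text of its <price>.
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (isbn.equals(book.getAttribute("isbn"))) {
                NodeList prices = book.getElementsByTagName("price");
                if (prices.getLength() > 0) {
                    prices.item(0).setTextContent(newPrice);
                }
            }
        }

        // Add: build a new <book> element and attach it to the root element.
        Element added = doc.createElement("book");
        added.setAttribute("isbn", "0000000006"); // illustrative value, not from books.xml
        Element name = doc.createElement("name");
        name.setTextContent("Book 6");
        added.appendChild(name);
        doc.getDocumentElement().appendChild(added);

        // Serialize the in-memory tree back to text with an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book isbn=\"0000000003\"><name>Book 3</name>"
                   + "<price>29.45</price></book></books>";
        System.out.println(updateAndAppend(xml, "0000000003", "31.00"));
    }
}
```

Nothing comparable is possible with plain SAX, which only reports events while reading. The price of this convenience is that the whole tree has to be held in memory.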
  14. 4. JDOM

     All the preceding APIs are available for many programming languages, but using them from Java is laborious. JDOM is an API made specifically for Java; it uses Java's own capabilities and features, which makes XML parsing easier. We can find related information at www.jdom.org.

     Now, let's build the same example (searching for a book in our XML) with JDOM. Be sure that the jar is on your classpath; you can download it from http://www.jdom.org/dist/binary/.

     MyJDOMSearcher.java:

         import java.io.IOException;
         import java.util.Iterator;
         import java.util.List;
         import org.jdom.Document;
         import org.jdom.Element;
         import org.jdom.JDOMException;
         import org.jdom.input.SAXBuilder;

         public class MyJDOMSearcher {

             private String isbn;
             private MyBook book;
             private boolean noSearchMore = false;

             public static void main(String[] args) {
                 try {
                     long before = System.currentTimeMillis();
                     MyJDOMSearcher searcher = new MyJDOMSearcher();
                     // The second parameter is the isbn to search
                     searcher.isbn = args[1];
                     SAXBuilder saxBuilder = new SAXBuilder();
                     Document document = saxBuilder.build(args[0]);
                     searcher.searchBook(document.getRootElement());
                     long after = System.currentTimeMillis();
                     searcher.printResults(args[0], after - before);
                 } catch (JDOMException jde) {
                     System.err.println("[Error] JDOMException: " + jde.getMessage());
                 } catch (IOException ioe) {
                     System.err.println("[Error] IOException: " + ioe.getMessage());
                 }
             }

  15. MyJDOMSearcher.java (continued):

             // Recursively descend the tree, starting at the root element <books>
             private void searchBook(Element element) {
                 inspect(element);
                 List content = element.getContent();
                 Iterator iterator = content.iterator();
                 while (iterator.hasNext()) {
                     Object object = iterator.next();
                     // Only Element children are descended into; text nodes are skipped
                     if (object instanceof Element) {
                         searchBook((Element) object); // Casting from Object to Element
                     }
                 }
             }

             // Inspect a single element and copy its data into the book if it belongs to it
             public void inspect(Element element) {
                 if (!noSearchMore) { // If I already have the whole book, I do nothing more
                     if ("book".equals(element.getQualifiedName()) && book == null) {
                         if (isbn.equals(element.getAttributeValue("isbn"))) {
                             book = new MyBook();
                             book.setIsbn(isbn);
                         }
                     }
                     if (book != null) {
                         if ("name".equals(element.getQualifiedName())) {
                             book.setName(element.getValue());
                         }
                         if ("author".equals(element.getQualifiedName())) {
                             book.setAuthor(element.getValue());
                         }
                         if ("price".equals(element.getQualifiedName())) {
                             book.setPrice(element.getValue());
                         }
                         if ("editorial".equals(element.getQualifiedName())) {
                             // editorial is the last child, so the search can stop
                             book.setEditorial(element.getValue());
                             noSearchMore = true;
                         }
                     }
                 }
             }

             private void printResults(String xml, long time) {
                 System.out.println("Document " + xml + ". Parsed in : " + time + "ms");
                 if (book != null) {
                     System.out.println("Book found:");
                     System.out.println(" Isbn: " + book.getIsbn());
                     System.out.println(" Name: " + book.getName());
                     System.out.println(" Author: " + book.getAuthor());
                     System.out.println(" Price: " + book.getPrice());
                     System.out.println(" Editorial: " + book.getEditorial());
                 } else {
                     System.out.println("Book not found");
                 }
             }
         }
  16. 5. Conclusion

     Executing the same example with the three APIs (MySAXSearcher, MyDOMSearcher and MyJDOMSearcher), each receiving as parameters the same XML file and the same isbn to search ("0000000003"), the times obtained are the following:

         MySAXSearcher    MyDOMSearcher    MyJDOMSearcher
         93 ms            750 ms           609 ms

     In this run, the SAX API is faster than DOM and JDOM (but it's more laborious to program).
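The figures above come from single runs, so they also include one-off costs such as class loading. A rough way to get steadier numbers is to parse the same input several times and average. The sketch below does this for SAX and DOM only, since both ship with the JDK while JDOM needs its own jar. The class name ParseTimingSketch, the generated sample document and the run counts are assumptions made for illustration, and no particular timings are implied.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimingSketch {

    // Builds an in-memory document shaped like books.xml (name element only).
    public static String sampleXml(int books) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 1; i <= books; i++) {
            sb.append("<book isbn=\"").append(String.format("%010d", i)).append("\">")
              .append("<name>Book ").append(i).append("</name>")
              .append("</book>");
        }
        return sb.append("</books>").toString();
    }

    // Average time in nanoseconds of one SAX parse with a do-nothing handler.
    public static long averageSaxNanos(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long before = System.nanoTime();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            total += System.nanoTime() - before;
        }
        return total / runs;
    }

    // Average time in nanoseconds of building one complete DOM tree.
    public static long averageDomNanos(String xml, int runs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long before = System.nanoTime();
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            total += System.nanoTime() - before;
        }
        return total / runs;
    }

    public static void main(String[] args) throws Exception {
        String xml = sampleXml(1000);
        // Warm-up runs, discarded, so that class loading is not part of the measurement.
        averageSaxNanos(xml, 20);
        averageDomNanos(xml, 20);
        System.out.println("SAX average: " + averageSaxNanos(xml, 50) / 1000 + " us");
        System.out.println("DOM average: " + averageDomNanos(xml, 50) / 1000 + " us");
    }
}
```

This is still a simple stopwatch rather than a rigorous benchmark, so the results are best read as a relative comparison on one machine.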
