XML Prepared By Srinivasan Jayakumar
Java XML Parsing

  1. 1. XML Prepared By Srinivasan Jayakumar
  2. 2. Briefly: The Power of XML <ul><li>XML is the Extensible Markup Language </li></ul><ul><ul><li>Text-based representation for describing data structure </li></ul></ul><ul><ul><ul><li>Both human and machine readable </li></ul></ul></ul><ul><ul><li>Originated from Standard Generalized Markup Language (SGML) </li></ul></ul><ul><ul><li>Became a World Wide Web Consortium (W3C) standard in 1998 </li></ul></ul><ul><li>XML is a great choice for exchanging data between disparate systems </li></ul>
  3. 3. Synergy between Java and XML <ul><li>Java+XML=Portable language+Portable Data </li></ul><ul><li>Java can be used to generate and process XML data </li></ul><ul><ul><li>Use Java to access SQL databases </li></ul></ul><ul><ul><li>Use Java to format data in XML </li></ul></ul><ul><ul><li>Use Java to parse data </li></ul></ul><ul><ul><li>Use Java to validate data </li></ul></ul><ul><ul><li>Use Java to transform data </li></ul></ul>
  4. 4. HTML and XML <ul><li>HTML and XML look similar, because they are both SGML languages </li></ul><ul><ul><li>use elements enclosed in tags (e.g. <body>This is an element</body> ) </li></ul></ul><ul><ul><li>use tag attributes (e.g., <font face=&quot;Verdana&quot; size=&quot;+1&quot; color=&quot;red&quot;> ) </li></ul></ul><ul><li>More precisely, </li></ul><ul><ul><li>HTML is defined in SGML </li></ul></ul><ul><ul><li>XML is a (very small) subset of SGML </li></ul></ul>
  5. 5. HTML and XML <ul><li>HTML is for humans </li></ul><ul><ul><li>HTML describes web pages </li></ul></ul><ul><ul><li>Browsers ignore and/or correct many HTML errors, so HTML is often sloppy </li></ul></ul><ul><li>XML is for computers </li></ul><ul><ul><li>XML describes data </li></ul></ul><ul><ul><li>The rules are strict and errors are not allowed </li></ul></ul><ul><ul><ul><li>In this way, XML is like a programming language </li></ul></ul></ul><ul><ul><li>Current versions of most browsers display XML </li></ul></ul>
  6. 6. Example XML document <?xml version=&quot;1.0&quot;?> <weatherReport> <date>7/14/97</date> <city>North Place</city>, <state>NX</state> <country>USA</country> High Temp: <high scale=&quot;F&quot;>103</high> Low Temp: <low scale=&quot;F&quot;>70</low> Morning: <morning>Partly cloudy, Hazy</morning> Afternoon: <afternoon>Sunny &amp; hot</afternoon> Evening: <evening>Clear and Cooler</evening> </weatherReport>
  7. 7. Overall structure <ul><li>An XML document may start with one or more processing instructions or directives: </li></ul><ul><ul><li><?xml version=&quot;1.0&quot;?> <?xml-stylesheet type=&quot;text/css&quot; href=&quot;ss.css&quot;?> </li></ul></ul><ul><li>Following the directives, there must be exactly one root element containing all the rest of the XML: </li></ul><ul><ul><li><weatherReport> ... </weatherReport> </li></ul></ul>
  8. 8. XML building blocks <ul><li>Aside from the directives, an XML document is built from: </li></ul><ul><ul><li>elements: high in <high scale=&quot;F&quot;>103</high> </li></ul></ul><ul><ul><li>tags, in pairs: <high scale=&quot;F&quot;> and </high> </li></ul></ul><ul><ul><li>attributes: scale=&quot;F&quot; in <high scale=&quot;F&quot;>103</high> </li></ul></ul><ul><ul><li>entities: &amp; in <afternoon>Sunny &amp; hot</afternoon> </li></ul></ul><ul><ul><li>data: 103 in <high scale=&quot;F&quot;>103</high> </li></ul></ul>
  9. 9. Elements and attributes <ul><li>Attributes and elements can often represent the same data </li></ul><ul><li>Example, as elements: </li></ul><ul><ul><li><name> <first>David</first> <last>Smith</last> </name> </li></ul></ul><ul><li>The same data as attributes: </li></ul><ul><ul><li><name first=&quot;David&quot; last=&quot;Smith&quot;> </name> </li></ul></ul><ul><li>Elements are easier to use from Java </li></ul><ul><li>Attributes may contain elaborate metadata, such as unique IDs </li></ul>
  10. 10. Well-formed XML <ul><li>In XML, every element must have both a start tag and an end tag, e.g. <name> ... </name> </li></ul><ul><ul><li>Empty elements can be abbreviated: <break /> . </li></ul></ul><ul><ul><li>XML tags are case sensitive and may not begin with the letters xml , in any combination of cases </li></ul></ul><ul><li>Elements must be properly nested </li></ul><ul><ul><li>e.g. not <b><i>bold and italic</b></i> </li></ul></ul><ul><li>XML document must have one and only one root element </li></ul><ul><li>The values of attributes must be enclosed in quotes </li></ul><ul><ul><li>e.g. <time unit=&quot;days&quot;> </li></ul></ul>
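The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming the JAXP `DocumentBuilder` that ships with the JDK; the class name `WellFormedCheck` and the method `isWellFormed` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
class WellFormedCheck {
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler.fatalError rethrows the exception instead of printing it.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A fatal error means the text broke a well-formedness rule.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<time unit=\"days\">3</time>"));
        System.out.println(isWellFormed("<b><i>bold and italic</b></i>"));
        System.out.println(isWellFormed("<a/><b/>"));
    }
}
```

The three inputs in `main` exercise a valid element, improper nesting, and a document with two root elements.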
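  11. 11. XML as a tree <ul><li>An XML document represents a hierarchy </li></ul><ul><li>A hierarchy is a tree </li></ul><ul><li>Example tree: </li></ul><ul><ul><li>novel </li></ul></ul><ul><ul><ul><li>foreword </li></ul></ul></ul><ul><ul><ul><ul><li>paragraph: This is the great American novel. </li></ul></ul></ul></ul><ul><ul><ul><li>chapter number=&quot;1&quot; </li></ul></ul></ul><ul><ul><ul><ul><li>paragraph: It was a dark and stormy night. </li></ul></ul></ul></ul><ul><ul><ul><ul><li>paragraph: Suddenly, a shot rang out! </li></ul></ul></ul></ul>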
  12. 12. Viewing XML <ul><li>XML is designed to be processed by computer programs, not to be displayed to humans </li></ul><ul><li>Nevertheless, almost all current Web browsers can display XML documents </li></ul><ul><ul><li>They do not all display it the same way </li></ul></ul><ul><ul><li>They may not display it at all if it has errors </li></ul></ul><ul><li>This is just an added value. Remember: HTML is designed to be viewed, XML is designed to be used </li></ul>
  13. 13. XML Parsers
  14. 14. Stream Model <ul><li>Stream seen by parser is a sequence of elements </li></ul><ul><li>As each XML element is seen, an event occurs </li></ul><ul><ul><li>Some code registered with the parser (the event handler) is executed </li></ul></ul><ul><li>This approach is popularized by the Simple API for XML (SAX) </li></ul><ul><li>Problem: </li></ul><ul><ul><li>Hard to get a global view of the document </li></ul></ul><ul><ul><li>Parsing state represented by global variables set by the event handlers </li></ul></ul>
  15. 15. Data Model <ul><li>The XML data is transformed into a navigable data structure in memory </li></ul><ul><ul><li>Because of the nesting of XML elements, a tree data structure is used </li></ul></ul><ul><ul><li>The tree is navigated to discover the XML document </li></ul></ul><ul><li>This approach is popularized by the Document Object Model (DOM) </li></ul><ul><li>Problem: </li></ul><ul><ul><li>May require large amounts of memory </li></ul></ul><ul><ul><li>May not be as fast as stream approach </li></ul></ul><ul><ul><ul><li>Some DOM parsers use SAX to build the tree </li></ul></ul></ul>
  16. 16. SAX and DOM <ul><li>SAX and DOM are standards for XML parsers </li></ul><ul><ul><li>DOM is a W3C standard </li></ul></ul><ul><ul><li>SAX is an ad-hoc (but very popular) standard </li></ul></ul><ul><li>There are various implementations available </li></ul><ul><li>Java implementations are provided as part of JAXP ( Java API for XML Processing ) </li></ul><ul><li>JAXP package is included in JDK starting from JDK 1.4 </li></ul><ul><ul><li>Is available separately for Java 1.3 </li></ul></ul>
  17. 17. Difference between SAX and DOM <ul><li>DOM reads the entire document into memory and stores it as a tree data structure </li></ul><ul><li>SAX reads the document and calls handler methods for each element or block of text that it encounters </li></ul><ul><li>Consequences: </li></ul><ul><ul><li>DOM provides &quot;random access&quot; into the document </li></ul></ul><ul><ul><li>SAX provides only sequential access to the document </li></ul></ul><ul><ul><li>DOM is slower and needs memory that grows with the size of the document, so it is impractical for very large documents </li></ul></ul><ul><ul><li>SAX is fast and requires very little memory, so it can be used for huge documents </li></ul></ul><ul><ul><ul><li>This makes SAX much more popular for web sites </li></ul></ul></ul>
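To make the contrast concrete, here is a small sketch that answers the same question (how many elements have a given name?) both ways. It assumes the JAXP parsers bundled with the JDK; the class `CountBothWays` and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CountBothWays {
    // SAX: count start tags as they stream past; nothing is kept in memory.
    static int countWithSax(String xml, String name) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (qName.equals(name)) count[0]++;
                }
            });
        return count[0];
    }

    // DOM: build the whole tree first, then query it.
    static int countWithDom(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(name).getLength();
    }
}
```

Both methods return the same count; the difference is that the DOM version holds the entire tree in memory while it answers, whereas the SAX version only keeps a counter.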
  18. 18. SAX Parsing
  19. 19. Parsing with SAX <ul><li>SAX uses the source-listener-delegate model for parsing XML documents </li></ul><ul><ul><li>The source is the XML data, consisting of XML elements </li></ul></ul><ul><ul><li>A listener written in Java is attached to the parser and listens for events </li></ul></ul><ul><ul><li>When an event is fired, handling is delegated to a method of the listener </li></ul></ul>
  20. 20. SAX Parsing: process XML as Stream
  21. 21. Simple SAX program <ul><li>The program consists of two classes: </li></ul><ul><ul><li>Sample -- This class contains the main method; it </li></ul></ul><ul><ul><ul><li>Gets a factory to make parsers </li></ul></ul></ul><ul><ul><ul><li>Gets a parser from the factory </li></ul></ul></ul><ul><ul><ul><li>Creates a Handler object to handle callbacks from the parser </li></ul></ul></ul><ul><ul><ul><li>Tells the parser which handler to send its callbacks to </li></ul></ul></ul><ul><ul><ul><li>Reads and parses the input XML file </li></ul></ul></ul><ul><ul><li>Handler -- This class contains handlers for three kinds of callbacks: </li></ul></ul><ul><ul><ul><li>startElement callbacks, generated when a start tag is seen </li></ul></ul></ul><ul><ul><ul><li>endElement callbacks, generated when an end tag is seen </li></ul></ul></ul><ul><ul><ul><li>characters callbacks, generated for the contents of an element </li></ul></ul></ul>
  22. 22. The Sample class <ul><li>import javax.xml.parsers.*; // for both SAX and DOM import org.xml.sax.*; import org.xml.sax.helpers.*; </li></ul><ul><li>// For simplicity, we let the operating system handle exceptions // In &quot;real life&quot; this is poor programming practice public class Sample { public static void main(String args[]) throws Exception { </li></ul><ul><li>// Create a parser factory SAXParserFactory factory = SAXParserFactory.newInstance(); </li></ul><ul><li>// Tell factory that the parser must understand namespaces factory.setNamespaceAware(true); </li></ul><ul><li>// Make the parser SAXParser saxParser = factory.newSAXParser(); XMLReader parser = saxParser.getXMLReader(); </li></ul>
  23. 23. The Sample class <ul><li> // Create a handler Handler handler = new Handler(); </li></ul><ul><ul><li>// Tell the parser to use this handler parser.setContentHandler(handler); </li></ul></ul><ul><ul><li>// Finally, read and parse the document parser.parse(&quot;hello.xml&quot;); </li></ul></ul><ul><ul><li>} // end of main </li></ul></ul><ul><ul><li>} // end of Sample class </li></ul></ul><ul><li>The parser reads the file hello.xml </li></ul><ul><li>The argument to parse is a system identifier (URI); a relative name such as hello.xml is typically resolved against the current working directory, not the classpath </li></ul>
  24. 24. The Handler class <ul><li>public class Handler extends DefaultHandler { </li></ul><ul><ul><li>DefaultHandler is an adapter class that defines empty methods to be overridden </li></ul></ul><ul><li>We define 3 methods to handle (1) start tags, (2) contents, and (3) end tags. </li></ul><ul><ul><li>The methods will just print a line </li></ul></ul><ul><ul><li>Each of these 3 methods throws a SAXException </li></ul></ul><ul><li>// SAX calls this when it encounters a start tag public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes attributes) throws SAXException { System.out.println(&quot;startElement: &quot; + qualifiedName); } </li></ul>
  25. 25. The Handler class <ul><li>// SAX calls this method to pass in character data public void characters(char ch[ ], int start, int length) throws SAXException { System.out.println(&quot;characters: \&quot;&quot; + new String(ch, start, length) + &quot;\&quot;&quot;); } </li></ul><ul><li>// SAX calls this method when it encounters an end tag public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException { System.out.println(&quot;endElement: /&quot; + qualifiedName); } } // End of Handler class </li></ul>
  26. 26. Results <ul><li>If the file hello.xml contains: <?xml version=&quot;1.0&quot;?> <display>Hello World!</display> </li></ul><ul><li>Then the output from running java Sample will be: startElement: display characters: &quot;Hello World!&quot; endElement: /display </li></ul>
  27. 27. More results <ul><li>Now suppose the file hello.xml contains: </li></ul><ul><ul><li><?xml version=&quot;1.0&quot;?> <display> <i>Hello</i> World! </display> </li></ul></ul><ul><li>Notice that the root element, <display> , contains a nested element <i> and whitespace (including newlines) </li></ul><ul><li>The result will be: </li></ul><ul><ul><li>startElement: display </li></ul></ul><ul><ul><li>characters: &quot;&quot; // empty string </li></ul></ul><ul><ul><li>characters: &quot; &quot; // newline </li></ul></ul><ul><ul><li>characters: &quot; &quot; // spaces </li></ul></ul><ul><ul><li>startElement: i </li></ul></ul><ul><ul><li>characters: &quot;Hello&quot; </li></ul></ul><ul><ul><li>endElement: /i </li></ul></ul><ul><ul><li>characters: &quot;World!&quot; </li></ul></ul><ul><ul><li>characters: &quot; &quot; // another newline </li></ul></ul><ul><ul><li>endElement: /display </li></ul></ul>
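Because a SAX parser may deliver the text of one element in several `characters` calls, handlers commonly buffer the pieces and look at them only when the next tag arrives. The sketch below shows one way to do that; the class name `TextCollector` and its `collect` method are hypothetical, and trimming whitespace-only runs is a choice made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers text between tags, ignoring pure whitespace.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        flush(); // text seen before this start tag is complete
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called many times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // text seen before this end tag is complete
    }

    // Record the buffered text if it is more than whitespace, then reset.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) texts.add(text);
        buffer.setLength(0);
    }

    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```

For the second hello.xml above, `collect` yields the two text runs `Hello` and `World!`, however the parser happened to split the whitespace.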
  28. 28. Factories <ul><li>SAX uses a parser factory </li></ul><ul><ul><li>A factory is a design pattern alternative to constructors </li></ul></ul><ul><li>Factories allow the programmer to: </li></ul><ul><ul><li>Decide whether or not to create a new object </li></ul></ul><ul><ul><li>Decide what kind of object to create </li></ul></ul><ul><ul><li>class TrustMe { private TrustMe() { } /* private constructor */ public static TrustMe makeTrust() { /* factory method */ if ( /* test of some sort */ ) return new TrustMe(); return null; /* otherwise no object is created */ } } </li></ul></ul>
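The two decisions a factory can make (whether to create an object, and which kind) might look like this in a complete class. This is only an illustrative sketch; `Greeter`, `PlainGreeter`, and `LoudGreeter` are hypothetical names and have nothing to do with the XML APIs.

```java
// Hypothetical example of a static factory method.
abstract class Greeter {
    private static Greeter cachedPlain; // reused instead of creating a new object

    Greeter() { }

    abstract String greet(String name);

    // Factory method: callers never use "new" directly.
    static Greeter create(boolean loud) {
        if (loud) {
            return new LoudGreeter();         // decides what kind of object to create
        }
        if (cachedPlain == null) {
            cachedPlain = new PlainGreeter(); // decides whether to create one at all
        }
        return cachedPlain;
    }
}

class PlainGreeter extends Greeter {
    String greet(String name) { return "Hello, " + name; }
}

class LoudGreeter extends Greeter {
    String greet(String name) { return "HELLO, " + name.toUpperCase() + "!"; }
}
```

`SAXParserFactory.newInstance()` plays the same role as `create` here: the caller asks for a parser factory and the library decides which implementation class to hand back.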
  29. 29. Parser factories <ul><li>To create a SAX parser factory, call static method: SAXParserFactory.newInstance() </li></ul><ul><ul><li>Returns an object of type SAXParserFactory </li></ul></ul><ul><ul><li>It may throw a FactoryConfigurationError </li></ul></ul><ul><li>Then, the factory can be configured before it creates parsers: </li></ul><ul><ul><li>public void setNamespaceAware(boolean awareness) </li></ul></ul><ul><ul><ul><li>Call this with true if you are using namespaces </li></ul></ul></ul><ul><ul><ul><li>The default (if you don’t call this method) is false </li></ul></ul></ul><ul><ul><li>public void setValidating(boolean validating) </li></ul></ul><ul><ul><ul><li>Call this with true if you want to validate against a DTD </li></ul></ul></ul><ul><ul><ul><li>The default (if you don’t call this method) is false </li></ul></ul></ul><ul><ul><ul><li>Validation will give an error if you do not have a DTD </li></ul></ul></ul>
  30. 30. Getting a parser <ul><li>Once a SAXParserFactory factory has been set up, parsers can be created with: SAXParser saxParser = factory.newSAXParser(); XMLReader parser = saxParser.getXMLReader(); </li></ul><ul><li>Note: SAXParser is not thread-safe </li></ul><ul><li>If parsing will be done in multiple threads, create a separate SAXParser object for each thread </li></ul>
  31. 31. Declaring which handler to use <ul><li>Since the SAX parser will call the handlers, we need to supply these methods </li></ul><ul><li>Binding the parser with a handler: Handler handler = new Handler(); parser.setContentHandler(handler); </li></ul><ul><li>These statements could be combined: parser.setContentHandler(new Handler()); </li></ul><ul><li>Finally, the parser is invoked on the file to parse: parser.parse(&quot;hello.xml&quot;); </li></ul><ul><li>Everything else is done in the handler methods </li></ul>
  32. 32. SAX handlers <ul><li>SAX defines 4 handler interfaces that a callback handler can implement: </li></ul><ul><ul><li>interface ContentHandler </li></ul></ul><ul><ul><ul><li>Handles basic parsing callbacks, e.g., element starts and ends </li></ul></ul></ul><ul><ul><li>interface DTDHandler </li></ul></ul><ul><ul><ul><li>Handles only notation and unparsed entity declarations </li></ul></ul></ul><ul><ul><li>interface EntityResolver </li></ul></ul><ul><ul><ul><li>Does customized handling for external entities </li></ul></ul></ul><ul><ul><li>interface ErrorHandler </li></ul></ul><ul><ul><ul><li>Should be registered, or many parsing errors will go unreported! </li></ul></ul></ul><ul><li>Implementing all these interfaces is a lot of work </li></ul><ul><ul><li>It is easier to use an adapter class </li></ul></ul>
  33. 33. Class DefaultHandler <ul><li>DefaultHandler is an adapter class in package org.xml.sax.helpers </li></ul><ul><li>DefaultHandler implements ContentHandler , DTDHandler , EntityResolver , and ErrorHandler </li></ul><ul><li>DefaultHandler provides a default (mostly empty) implementation for every method declared in each of the interfaces </li></ul><ul><li>To use this class, extend it and override the methods that are important to the application </li></ul>
  34. 34. ContentHandler methods <ul><li>public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException </li></ul><ul><li>This method is called at the beginning of elements </li></ul><ul><li>When SAX calls startElement , it passes in a parameter of type Attributes </li></ul><ul><li>The following methods look up attributes by name rather than by index: </li></ul><ul><ul><li>public int getIndex(String qualifiedName) </li></ul></ul><ul><ul><li>public int getIndex(String uri, String localName) </li></ul></ul><ul><ul><li>public String getValue(String qualifiedName) </li></ul></ul><ul><ul><li>public String getValue(String uri, String localName) </li></ul></ul>
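As a sketch of the by-name lookup, the handler below records the `scale` attribute of every element that has one, using the weather report example from earlier. The class `ScaleReader` and method `readScales` are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ScaleReader {
    // Maps element name -> value of its "scale" attribute, in document order.
    static Map<String, String> readScales(String xml) throws Exception {
        Map<String, String> scales = new LinkedHashMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // getValue returns null when the attribute is absent.
                    String scale = atts.getValue("scale");
                    if (scale != null) scales.put(qName, scale);
                }
            });
        return scales;
    }
}
```

The parser here is not namespace-aware (the factory default), so the qualified name is the one to use for both the element and the attribute lookup.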
  35. 35. ContentHandler methods <ul><li>public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException </li></ul><ul><li>The parameters to endElement are the same as those to startElement , except that the Attributes parameter is omitted </li></ul><ul><li>public void characters(char[] ch, int start, int length) throws SAXException </li></ul><ul><li>ch is an array of characters </li></ul><ul><ul><li>Only length characters, starting from ch[start] , are the contents of the element </li></ul></ul>
  36. 36. Error Handling <ul><li>SAX error handling is unusual </li></ul><ul><li>Most errors are ignored unless an error handler ( org.xml.sax.ErrorHandler ) is registered </li></ul><ul><ul><li>Ignored errors can cause unexpected behavior </li></ul></ul><ul><li>The ErrorHandler interface declares: </li></ul><ul><ul><li>public void fatalError (SAXParseException exception) throws SAXException // XML not well structured </li></ul></ul><ul><ul><li>public void error (SAXParseException exception) throws SAXException // XML validation error </li></ul></ul><ul><ul><li>public void warning (SAXParseException exception) throws SAXException // minor problem </li></ul></ul>
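A registered error handler might look like the following sketch, which turns on DTD validation and collects whatever the parser reports. The class `ValidationErrors` and the method `validate` are hypothetical; the sample documents used with it carry an inline DTD so no external file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidationErrors {
    // Parses with DTD validation and returns the reported warnings and errors.
    static List<String> validate(String xml) throws Exception {
        List<String> problems = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the document's DTD
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                problems.add("error: " + e.getMessage()); // validation error
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // not well-formed: stop parsing
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return problems;
    }
}
```

Recording errors rather than throwing lets the parser continue and report every validation problem in one pass; rethrowing from `error` would stop at the first one.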
  37. 37. External parsers <ul><li>Alternatively, you can use an existing parser: </li></ul><ul><ul><li>Xerces, Electric XML, Expat, MSXML, CMarkup </li></ul></ul><ul><li>Stages of the parsing </li></ul><ul><ul><li>Get the URL object for the source </li></ul></ul><ul><ul><li>Create InputSource object encapsulating the data source </li></ul></ul><ul><ul><li>Create the parser </li></ul></ul><ul><ul><li>Launch the parser on the data source </li></ul></ul>
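The four stages listed above can be sketched with the JDK's bundled parser standing in for an external one. The class `ParseFromUrl` and method `rootName` are hypothetical, and the URL can point at any XML resource, for example a local file converted with `toUri().toURL()`.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ParseFromUrl {
    // Returns the name of the root element of the document at the given URL.
    static String rootName(URL url) throws Exception {           // stage 1: the URL
        try (InputStream in = url.openStream()) {
            InputSource source = new InputSource(in);            // stage 2: the InputSource
            source.setSystemId(url.toString());                  // lets relative references resolve
            final String[] root = {null};
            XMLReader parser =                                    // stage 3: the parser
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            parser.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) root[0] = qName;        // first start tag is the root
                }
            });
            parser.parse(source);                                 // stage 4: launch
            return root[0];
        }
    }
}
```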
  38. 38. Problems with SAX <ul><li>SAX provides only sequential access to the document being processed </li></ul><ul><li>SAX has only a local view of the current element being processed </li></ul><ul><ul><li>Global knowledge of parsing must be stored in global variables </li></ul></ul><ul><ul><li>A single startElement() method for all elements </li></ul></ul><ul><ul><ul><li>In startElement() there are many “if-then-else” tests for checking a specific element </li></ul></ul></ul><ul><ul><ul><li>When an element is seen, a global flag is set </li></ul></ul></ul><ul><ul><ul><li>When finished with the element global flag must be set to false </li></ul></ul></ul>
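One common way to manage the state described above, instead of one flag per element, is to keep a stack of the currently open elements as fields of the handler. The sketch below is an illustration under that assumption; `PathTracker` and `paths` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers which elements are open at any moment.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.addLast(qName);              // entering an element
        seen.add(String.join("/", open)); // full path from the root
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();                // leaving the element
    }

    static List<String> paths(String xml) throws Exception {
        PathTracker handler = new PathTracker();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }
}
```

With the path available, a handler can tell a `paragraph` inside `foreword` from one inside `chapter` without a separate boolean for each case.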
  39. 39. DOM Parsing
  40. 40. DOM <ul><li>DOM represents the XML document as a tree </li></ul><ul><ul><li>Hierarchical nature of tree maps well to hierarchical nesting of XML elements </li></ul></ul><ul><ul><li>Tree contains a global view of the document </li></ul></ul><ul><ul><ul><li>Makes navigation of document easy </li></ul></ul></ul><ul><ul><ul><li>Allows any subtree to be modified </li></ul></ul></ul><ul><ul><ul><li>Easier processing than SAX but memory intensive! </li></ul></ul></ul><ul><li>Like SAX, DOM is only an API </li></ul><ul><ul><li>Does not specify a parser </li></ul></ul><ul><ul><li>Lists the API and requirements for the parser </li></ul></ul><ul><li>DOM parsers typically use SAX parsing </li></ul>
  41. 41. DOM Parsing: process entire document
  42. 42. Simple DOM program <ul><li>First we need to create a DOM parser, called a DocumentBuilder </li></ul><ul><li>The parser is created, not by a constructor, but by a factory, which is itself obtained from a static factory method </li></ul><ul><li>DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); </li></ul><ul><li>DocumentBuilder builder = factory.newDocumentBuilder(); </li></ul>
  43. 43. Simple DOM program <ul><li>An XML file hello.xml will be parsed <?xml version=&quot;1.0&quot;?> <display>Hello World!</display> </li></ul><ul><li>To read this file, we add the following line: Document document = builder.parse(&quot;hello.xml&quot;); </li></ul><ul><li>document contains the entire XML file as a tree </li></ul><ul><li>The following code finds the content of the root element and prints it </li></ul><ul><li> Element root = document.getDocumentElement(); Node textNode = root.getFirstChild(); System.out.println(textNode.getNodeValue()); </li></ul><ul><li>The output of the program is: Hello World! </li></ul>
  44. 44. Reading in the tree <ul><li>The parse method reads in the entire XML document and represents it as a tree in memory </li></ul><ul><ul><li>For a large document, parsing could take a while </li></ul></ul><ul><ul><li>If you want to interact with your program while it is parsing, you need to run the parser in a separate thread </li></ul></ul><ul><li>In practice, an XML parse tree may require up to 10 times as much memory as the original XML document </li></ul><ul><ul><li>If you have a lot of tree manipulation to do, DOM is much more convenient than SAX </li></ul></ul><ul><ul><li>If you do not have a lot of tree manipulation to do, consider using SAX instead </li></ul></ul>
  45. 45. Structure of the DOM tree <ul><li>The DOM tree is composed of Node objects </li></ul><ul><li>Node is an interface </li></ul><ul><ul><li>Some of the more important sub-interfaces are Element , Attr , and Text </li></ul></ul><ul><ul><ul><li>An Element node may have children </li></ul></ul></ul><ul><ul><ul><li>Text nodes are leaves of the tree; Attr nodes belong to their Element and are reached through getAttributes() , not through the child list </li></ul></ul></ul><ul><li>Node objects can be downcast into specific types if needed </li></ul>
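  46. 46. Operations on Nodes <ul><li>The results returned by getNodeName() , getNodeValue() , getNodeType() and getAttributes() depend on the subtype of the node, as follows: </li></ul><ul><ul><li>Element: getNodeName() returns the tag name; getNodeValue() returns null; getNodeType() returns ELEMENT_NODE; getAttributes() returns a NamedNodeMap </li></ul></ul><ul><ul><li>Text: getNodeName() returns &quot;#text&quot;; getNodeValue() returns the text contents; getNodeType() returns TEXT_NODE; getAttributes() returns null </li></ul></ul><ul><ul><li>Attr: getNodeName() returns the name of the attribute; getNodeValue() returns the value of the attribute; getNodeType() returns ATTRIBUTE_NODE; getAttributes() returns null </li></ul></ul>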
  47. 47. Distinguishing Node types <ul><li>An easy way to handle different types of nodes: </li></ul><ul><ul><li>switch(node.getNodeType()) { </li></ul></ul><ul><ul><ul><li>case Node.ELEMENT_NODE: </li></ul></ul></ul><ul><ul><ul><ul><li>Element element = (Element)node; ...; break; </li></ul></ul></ul></ul><ul><ul><ul><li>case Node.TEXT_NODE: </li></ul></ul></ul><ul><ul><ul><ul><li>Text text = (Text)node; ... break; </li></ul></ul></ul></ul><ul><ul><ul><li>case Node.ATTRIBUTE_NODE: </li></ul></ul></ul><ul><ul><ul><ul><li>Attr attr = (Attr)node; ... break; </li></ul></ul></ul></ul><ul><ul><ul><li>default: ... </li></ul></ul></ul><ul><ul><li>} </li></ul></ul>
  48. 48. Operations on Node s <ul><li>Tree-walking methods that return a Node : </li></ul><ul><ul><li>getParentNode() </li></ul></ul><ul><ul><li>getFirstChild() </li></ul></ul><ul><ul><li>getNextSibling() </li></ul></ul><ul><ul><li>getPreviousSibling() </li></ul></ul><ul><ul><li>getLastChild() </li></ul></ul><ul><li>Test methods that return a boolean : </li></ul><ul><ul><li>hasAttributes() </li></ul></ul><ul><ul><li>hasChildNodes() </li></ul></ul>
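The tree-walking methods combine naturally with the node-type switch into a recursive traversal. The following is a minimal sketch; `DomWalker`, `walk`, and `outline` are hypothetical names, and skipping whitespace-only text nodes is a choice made for readability.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalker {
    // Visits node and its descendants, adding one indented line per node.
    static void walk(Node node, String indent, List<String> out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                out.add(indent + "<" + node.getNodeName() + ">");
                break;
            }
            case Node.TEXT_NODE: {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) out.add(indent + "\"" + text + "\"");
                break;
            }
            default:
                break; // comments, processing instructions, etc. are skipped
        }
        // Children are linked as a list: first child, then next siblings.
        for (Node child = node.getFirstChild(); child != null;
                 child = child.getNextSibling()) {
            walk(child, indent + "  ", out);
        }
    }

    static List<String> outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), "", out);
        return out;
    }
}
```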
  49. 49. Operations for Element s <ul><li>String getTagName() </li></ul><ul><ul><li>Returns the name of the tag </li></ul></ul><ul><li>boolean hasAttribute(String name) </li></ul><ul><ul><li>Returns true if this Element has the named attribute </li></ul></ul><ul><li>String getAttribute(String name) </li></ul><ul><ul><li>Returns the value of the named attribute </li></ul></ul><ul><li>boolean hasAttributes() </li></ul><ul><ul><li>Returns true if this Element has any attributes </li></ul></ul><ul><li>NamedNodeMap getAttributes() </li></ul><ul><ul><li>Returns a NamedNodeMap of all the Element’s attributes </li></ul></ul>
  50. 50. Operations on Text s <ul><li>Text is a subinterface of CharacterData and inherits the following operations (among others): </li></ul><ul><ul><li>public String getData() throws DOMException </li></ul></ul><ul><ul><ul><li>Returns the text contents of this Text node </li></ul></ul></ul><ul><ul><li>public int getLength() </li></ul></ul><ul><ul><ul><li>Returns the number of Unicode characters in the text </li></ul></ul></ul><ul><ul><li>public String substringData(int offset, int count) throws DOMException </li></ul></ul><ul><ul><ul><li>Returns a substring of the text contents </li></ul></ul></ul>
  51. 51. Operations on Attribute s <ul><li>String getName() </li></ul><ul><ul><li>Returns the name of this attribute. </li></ul></ul><ul><li>Element getOwnerElement() </li></ul><ul><ul><li>Returns the Element node this attribute is attached to </li></ul></ul><ul><li>String getValue() </li></ul><ul><ul><li>Returns the value of the attribute as a String </li></ul></ul>
  52. 52. Overview <ul><li>DOM, unlike SAX, allows you to create and modify XML trees </li></ul><ul><li>There are three basic kinds of operations: </li></ul><ul><ul><li>Creating a new DOM </li></ul></ul><ul><ul><li>Modifying the structure of a DOM </li></ul></ul><ul><ul><li>Modifying the content of a DOM </li></ul></ul><ul><li>Creating a new DOM requires a few extra methods just to get started </li></ul><ul><ul><li>Afterwards, you can add elements by modifying its structure and contents </li></ul></ul>
  53. 53. Creating a new DOM import javax.xml.parsers.*; import org.w3c.dom.Document; … try { DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); DocumentBuilder builder = factory.newDocumentBuilder(); Document doc = builder.newDocument(); } catch (ParserConfigurationException e) { ... }
  54. 54. Creating structure <ul><li>The following are instance methods of Document : </li></ul><ul><ul><li>public Element createElement(String tagName) </li></ul></ul><ul><ul><li>public Element createElementNS(String namespaceURI, String qualifiedName) </li></ul></ul><ul><ul><li>public Attr createAttribute(String name) </li></ul></ul><ul><ul><li>public Attr createAttributeNS(String namespaceURI, String qualifiedName) </li></ul></ul><ul><ul><li>public ProcessingInstruction createProcessingInstruction (String target, String data) </li></ul></ul><ul><ul><li>public EntityReference createEntityReference(String name) </li></ul></ul><ul><ul><li>public Text createTextNode(String data) </li></ul></ul><ul><ul><li>public Comment createComment(String data) </li></ul></ul>
  55. 55. Methods of Node <ul><li>public Node appendChild(Node newChild) </li></ul><ul><li>public Node insertBefore(Node newChild, Node refChild) </li></ul><ul><li>public Node removeChild(Node oldChild) </li></ul><ul><li>public Node replaceChild(Node newChild, Node oldChild) </li></ul><ul><li>setNodeValue(String nodeValue) </li></ul><ul><ul><li>Functionality depends on the type of the node </li></ul></ul>
  56. 56. Methods of Element <ul><li>public void setAttribute(String name, String value) </li></ul><ul><li>public Attr setAttributeNode(Attr newAttr) </li></ul><ul><li>public void setAttributeNS(String namespaceURI, String qualifiedName, String value) </li></ul><ul><li>public Attr setAttributeNodeNS(Attr newAttr) </li></ul><ul><li>public void removeAttribute(String name) </li></ul><ul><li>public void removeAttributeNS(String namespaceURI, String localName) </li></ul><ul><li>public Attr removeAttributeNode(Attr oldAttr) </li></ul>
  57. 57. Method of Attribute <ul><li>public void setValue(String value) </li></ul><ul><li>This is the only method that modifies an Attribute </li></ul><ul><ul><li>The rest just retrieve information </li></ul></ul>
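Putting the creation and modification methods together, the sketch below builds a small tree from scratch and writes it out as text. Serializing with `javax.xml.transform.Transformer` is an assumption added here (the slides do not cover output); the class `BuildReport` and method `build` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildReport {
    static String build() throws Exception {
        // Create an empty DOM.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        // Create structure: a root element with one child.
        Element root = doc.createElement("weatherReport");
        doc.appendChild(root);
        Element high = doc.createElement("high");
        root.appendChild(high);

        // Add content: an attribute and a text node.
        high.setAttribute("scale", "F");
        high.appendChild(doc.createTextNode("103"));

        // Serialize the tree (assumption: identity Transformer from JAXP).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```

Note that nodes must be created by the `Document` that will own them; `appendChild` then decides where in the tree they go.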
  58. 58. Queries ?