Simple API for XML

Published in: Technology, Education
  • Simple API for XML

    1. Simple API for XML (SAX). XML. http://yht4ever.blogspot.com [email_address] B070066 - NIIT Quang Trung 08/2007
    2. Contents
         • Introduction
         • DOM vs. SAX
         • SAX-based Parsers
         • Events
    3. Introduction
         • SAX stands for Simple API for XML
         • Another method for accessing an XML document's contents
         • Developed by members of the XML-DEV mailing list
         • Uses an event-based model
         • Notifications (events) are raised as the document is parsed
         • Originally designed as a Java API
             • Other languages (C++, Python, Perl) are now supported
    4. DOM vs. SAX
         • DOM
             • Tree-based model
                 • Stores document data in a node hierarchy
             • Data is accessed quickly
             • Provides facilities for adding and removing nodes
         • SAX
             • Invokes methods when markup (e.g., a specific tag) is encountered
             • Generally faster than DOM, because no tree is built
             • Less memory overhead than DOM
             • Typically used for reading documents (not modifying them)
         • (see more on slide note)
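To make the contrast concrete, the following sketch counts the elements of one small document twice: once by loading it into a DOM tree and querying the tree, and once by reacting to SAX start-tag events. The class name `DomVsSax` and the sample document are made up for this illustration; the parser calls are the standard JAXP ones.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSax {
    // Hypothetical sample document, used only for this illustration.
    public static final String XML =
        "<test><example><object>World</object></example></test>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: the whole document is loaded into a node tree, then queried.
    public static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(stream(xml));
        return doc.getElementsByTagName("*").getLength();
    }

    // SAX: no tree is built; the handler counts start-tag events as they arrive.
    public static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(stream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM count: " + countWithDom(XML));
        System.out.println("SAX count: " + countWithSax(XML));
    }
}
```

Both methods report the same count. The difference is that the DOM version keeps the whole tree in memory and could go on to modify it, while the SAX version keeps only a counter.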
    5. SAX-based Parsers
         • Available for a variety of programming languages
             • e.g., Java, Python, etc.
         • Some SAX-based parsers
    6. SAX-based Parsers
         • SAX parser
             • Invokes certain methods when events occur
                 • Programmers override these methods to process data
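As a minimal sketch of "override these methods", the handler below records each callback the parser makes, in the order it makes them. It uses the SAX 2 `DefaultHandler` base class from the JDK; the class name `EventLog` and the `parse` helper are illustrative names, not part of any API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventLog extends DefaultHandler {
    // Callbacks are appended here in the order the parser invokes them.
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    // Illustrative helper: parse a string and return the recorded events.
    public static List<String> parse(String xml) throws Exception {
        EventLog handler = new EventLog();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<greeting>Hello</greeting>")) {
            System.out.println(event);
        }
    }
}
```

Methods that are not overridden keep the empty default behaviour of `DefaultHandler`, so a handler only needs to implement the events it cares about.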
    7. Example: Tree Diagram

         // Fig. 9.3 : Tree.java
         // Using the SAX Parser to generate a tree diagram.

         import java.io.*;
         import org.xml.sax.*;  // for HandlerBase class (SAX 1; deprecated in SAX 2 in favor of DefaultHandler)
         import javax.xml.parsers.SAXParserFactory;
         import javax.xml.parsers.ParserConfigurationException;
         import javax.xml.parsers.SAXParser;

         public class Tree extends HandlerBase {
            private int indent = 0;  // indentation counter

            // returns the spaces needed for indenting
            private String spacer( int count )
            {
               String temp = "";

               for ( int i = 0; i < count; i++ )
                  temp += " ";

               return temp;
            }

            // method called before parsing
            // it provides the document location
            public void setDocumentLocator( Locator loc )
            {
               System.out.println( "URL: " + loc.getSystemId() );
            }

         • import specifies the location of classes needed by the application
         • spacer assists in formatting
         • setDocumentLocator is overridden to output the parsed document's URL
    8. Example: Tree Diagram (continued)

            // method called at the beginning of a document
            public void startDocument() throws SAXException
            {
               System.out.println( "[ document root ]" );
            }

            // method called at the end of the document
            public void endDocument() throws SAXException
            {
               System.out.println( "[ document end ]" );
            }

            // method called at the start tag of an element
            public void startElement( String name,
               AttributeList attributes ) throws SAXException
            {
               System.out.println( spacer( indent++ ) +
                  "+-[ element : " + name + " ]" );

               if ( attributes != null )

                  for ( int i = 0; i < attributes.getLength(); i++ )
                     System.out.println( spacer( indent ) +
                        "+-[ attribute : " + attributes.getName( i ) +
                        " ] \"" + attributes.getValue( i ) + "\"" );
            }

         • startDocument: overridden method called when the start of the document is encountered
         • endDocument: overridden method called when the end of the document is encountered
         • startElement: overridden method called when a start tag is encountered
         • The loop outputs each attribute's name and value (if any)
    9. Example: Tree Diagram (continued)

            // method called at the end tag of an element
            public void endElement( String name ) throws SAXException
            {
               indent--;
            }

            // method called when a processing instruction is found
            public void processingInstruction( String target,
               String value ) throws SAXException
            {
               System.out.println( spacer( indent ) +
                  "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
            }

            // method called when characters are found
            public void characters( char buffer[], int offset,
               int length ) throws SAXException
            {
               if ( length > 0 ) {
                  String temp = new String( buffer, offset, length );

                  System.out.println( spacer( indent ) +
                     "+-[ text ] \"" + temp + "\"" );
               }
            }

         • endElement: overridden method called when the end of an element is encountered
         • processingInstruction: overridden method called when a processing instruction is encountered
         • characters: overridden method called when character data is encountered
    10. Example: Tree Diagram (continued)

            // method called when ignorable whitespace is found
            public void ignorableWhitespace( char buffer[],
               int offset, int length )
            {
               if ( length > 0 ) {
                  System.out.println( spacer( indent ) + "+-[ ignorable ]" );
               }
            }

            // method called on a non-fatal (validation) error
            public void error( SAXParseException spe )
               throws SAXParseException
            {
               // treat non-fatal errors as fatal errors
               throw spe;
            }

            // method called on a parsing warning
            public void warning( SAXParseException spe )
               throws SAXParseException
            {
               System.err.println( "Warning: " + spe.getMessage() );
            }

            // main method
            public static void main( String args[] )
            {
               boolean validate = false;

         • ignorableWhitespace: overridden method called when ignorable whitespace is encountered
         • error: overridden method called when an error (usually a validation error) occurs
         • warning: overridden method called when a problem is detected that is not considered an error
         • Method main starts the application
    11. Example: Tree Diagram (continued)

               if ( args.length != 2 ) {
                  System.err.println( "Usage: java Tree [validate] " +
                     "[filename] " );
                  System.err.println( "Options:" );
                  System.err.println( "   validate [yes|no] : " +
                     "DTD validation" );
                  System.exit( 1 );
               }

               if ( args[ 0 ].equals( "yes" ) )
                  validate = true;

               SAXParserFactory saxFactory =
                  SAXParserFactory.newInstance();

               saxFactory.setValidating( validate );

         • Command-line arguments select whether the document is validated
         • SAXParserFactory can instantiate a SAX-based parser
    12. Example: Tree Diagram (continued)

               try {
                  SAXParser saxParser = saxFactory.newSAXParser();
                  saxParser.parse( new File( args[ 1 ] ), new Tree() );
               }
               catch ( SAXParseException spe ) {
                  System.err.println( "Parse Error: " + spe.getMessage() );
               }
               catch ( SAXException se ) {
                  se.printStackTrace();
               }
               catch ( ParserConfigurationException pce ) {
                  pce.printStackTrace();
               }
               catch ( IOException ioe ) {
                  ioe.printStackTrace();
               }

               System.exit( 0 );
            }
         }

         • The try block instantiates the SAX-based parser and parses the file
         • The catch blocks handle errors (if any)
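The Tree class extends `HandlerBase`, which belongs to SAX 1 and is deprecated in current JDKs. As a hedged sketch only, this is roughly how the same tree printer might look against the SAX 2 `DefaultHandler` API. The class name `SaxTwoTree`, the `render` helper and the three-space indent unit are choices made for this illustration; it also builds a string instead of printing, so the result is easy to inspect.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTwoTree extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int indent = 0; // current nesting depth

    // Returns the indentation for the given depth (three spaces per level,
    // an arbitrary choice for this sketch).
    private static String spacer(int count) {
        StringBuilder spaces = new StringBuilder();
        for (int i = 0; i < count; i++) {
            spaces.append("   ");
        }
        return spaces.toString();
    }

    // SAX 2 passes namespace URI, local name and qualified name,
    // and an Attributes object instead of the SAX 1 AttributeList.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        out.append(spacer(indent++))
           .append("+-[ element : ").append(qName).append(" ]\n");
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(spacer(indent))
               .append("+-[ attribute : ").append(attributes.getQName(i))
               .append(" ] \"").append(attributes.getValue(i)).append("\"\n");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        indent--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (length > 0) {
            out.append(spacer(indent)).append("+-[ text ] \"")
               .append(ch, start, length).append("\"\n");
        }
    }

    // Illustrative helper: parse a string and return the rendered tree.
    public static String render(String xml) throws Exception {
        SaxTwoTree handler = new SaxTwoTree();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render(
            "<test name=\"spacing\"><example><object>World</object></example></test>"));
    }
}
```

The main differences from the SAX 1 version are the extra namespace parameters on `startElement` and `endElement`, and `Attributes.getQName` in place of `AttributeList.getName`.");
if (!tree.startsWith("+-[ element : test ]\n")) throw new AssertionError(tree);
if (!tree.contains("   +-[ attribute : name ] \"spacing\"\n")) throw new AssertionError(tree);
if (!tree.contains("      +-[ element : object ]\n")) throw new AssertionError(tree);
if (!tree.contains("         +-[ text ] \"World\"\n")) throw new AssertionError(tree);
</test>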
    13. Output and source: spacing1.xml

         URL: file:C:/Tree/spacing1.xml
         [ document root ]
         +-[ element : test ]
          +-[ attribute : name ] " spacing 1 "
          +-[ text ] " "
          +-[ text ] " "
          +-[ element : example ]
           +-[ element : object ]
            +-[ text ] "World"
          +-[ text ] " "
         [ document end ]

         1 <?xml version = "1.0"?>
         2
         3 <!-- Fig. 9.4 : spacing1.xml -->
         4 <!-- Whitespaces in nonvalidating parsing -->
         5 <!-- XML document without DTD -->
         6
         7 <test name = " spacing 1 ">
         8    <example><object>World</object></example>
         9 </test>

         • Root element test contains attribute name with value " spacing 1 "
         • XML document with elements test, example and object
         • XML document does not reference a DTD
         • Note that whitespace is preserved: attribute value (line 7), line feed (end of line 7), indentation (line 8) and line feed (end of line 8)
    14. Output and source: spacing2.xml

         URL: file:C:/Tree/spacing2.xml
         [ document root ]
         +-[ element : test ]
          +-[ attribute : name ] " spacing 2 "
          +-[ ignorable ]
          +-[ ignorable ]
          +-[ element : example ]
           +-[ element : object ]
            +-[ text ] "World"
          +-[ ignorable ]
         [ document end ]

          1 <?xml version = "1.0"?>
          2
          3 <!-- Fig. 9.5 : spacing2.xml -->
          4 <!-- Whitespace and nonvalidated parsing -->
          5 <!-- XML document with DTD -->
          6
          7 <!DOCTYPE test [
          8 <!ELEMENT test (example)>
          9 <!ATTLIST test name CDATA #IMPLIED>
         10 <!ELEMENT example (object*)>
         11 <!ELEMENT object (#PCDATA)>
         12 ]>
         13
         14 <test name = " spacing 2 ">
         15    <example><object>World</object></example>
         16 </test>

         • The DTD declares that test contains only child elements, so any "removable" whitespace between them is reported as ignorable
         • The line feed at the end of line 14, the spaces at the beginning of line 15 and the line feed at the end of line 15 are reported as ignorable whitespace
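The difference between the two spacing examples can be sketched as a small experiment that parses the same body with and without a DTD and counts how many whitespace characters arrive through `characters` and how many through `ignorableWhitespace`. The class name `WhitespaceEvents`, its fields and the sample strings are assumptions made for this illustration. Whether a non-validating parser reports ignorable whitespace at all is up to the parser; the parser bundled with the JDK is expected to do so when a DTD declares element-only content, but treat that as an observation rather than a guarantee.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceEvents extends DefaultHandler {
    // Whitespace characters delivered through characters().
    public int textWhitespaceChars = 0;
    // Characters delivered through ignorableWhitespace().
    public int ignorableChars = 0;

    // Hypothetical sample data, modelled on the spacing examples.
    public static final String BODY =
        "<test>\n  <example><object>World</object></example>\n</test>";
    public static final String DTD =
        "<!DOCTYPE test [" +
        "<!ELEMENT test (example)>" +
        "<!ELEMENT example (object*)>" +
        "<!ELEMENT object (#PCDATA)>" +
        "]>";

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (Character.isWhitespace(ch[i])) {
                textWhitespaceChars++;
            }
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    // Parses without validation and returns the handler holding the counts.
    public static WhitespaceEvents parse(String xml) throws Exception {
        WhitespaceEvents handler = new WhitespaceEvents();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        WhitespaceEvents withoutDtd = parse(BODY);
        WhitespaceEvents withDtd = parse(DTD + BODY);
        System.out.println("no DTD:   text=" + withoutDtd.textWhitespaceChars
            + " ignorable=" + withoutDtd.ignorableChars);
        System.out.println("with DTD: text=" + withDtd.textWhitespaceChars
            + " ignorable=" + withDtd.ignorableChars);
    }
}
```

Without a DTD the parser has no way to know that the whitespace is insignificant, so all of it is delivered as ordinary character data.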
    15. Output and source: notvalid.xml (validation disabled)

         URL: file:C:/Tree/notvalid.xml
         [ document root ]
         +-[ element : test ]
          +-[ ignorable ]
          +-[ ignorable ]
          +-[ proc-inst : test ] "message"
          +-[ ignorable ]
          +-[ ignorable ]
          +-[ element : example ]
           +-[ element : item ]
            +-[ text ] "Hello & Welcome!"
          +-[ ignorable ]
         [ document end ]

         <?xml version = "1.0"?>

         <!-- Fig. 9.6 : notvalid.xml -->
         <!-- Validation and non-validation -->

         <!DOCTYPE test [
         <!ELEMENT test (example)>
         <!ELEMENT example (#PCDATA)>
         ]>

         <test>
            <?test message?>
            <example><item><![CDATA[Hello & Welcome!]]></item></example>
         </test>

         • The document is invalid because element example cannot contain element item
         • Validation is disabled, so the document parses successfully
         • The parser does not interpret markup inside the CDATA section; its contents are returned as character data
    16. Output: notvalid.xml (validation enabled)

         URL: file:C:/Tree/notvalid.xml
         [ document root ]
         +-[ element : test ]
          +-[ ignorable ]
          +-[ ignorable ]
          +-[ proc-inst : test ] "message"
          +-[ ignorable ]
          +-[ ignorable ]
          +-[ element : example ]
         Parse Error: Element "example" does not allow "item"

         • Validation is enabled
         • Parsing terminates at element item: the validation error is non-fatal by itself, but Tree's error method rethrows it and so treats it as fatal
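The Tree example stops at the first validation error because its `error` method rethrows the exception. An alternative, sketched below under the assumption that the caller wants to see every problem, is to collect the messages and let parsing continue. The class name `ValidationDemo`, the `validate` helper and the sample documents are illustrative; `setValidating` and the `error` callback are the standard JAXP and SAX ones. Exact error messages differ between parsers, so the sketch only reports how many were raised.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationDemo {
    // Hypothetical DTD and documents, modelled on notvalid.xml.
    public static final String DTD =
        "<!DOCTYPE test [" +
        "<!ELEMENT test (example)>" +
        "<!ELEMENT example (#PCDATA)>" +
        "]>";
    public static final String VALID =
        DTD + "<test><example>Hello</example></test>";
    public static final String INVALID =
        DTD + "<test><example><item>Hello</item></example></test>";

    // Parses with DTD validation enabled and returns the messages of all
    // non-fatal (validation) errors instead of stopping at the first one.
    public static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);

        final List<String> errors = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Record the problem; not rethrowing lets parsing continue.
                    errors.add(e.getMessage());
                }
            });
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document errors:   " + validate(VALID).size());
        System.out.println("invalid document errors: " + validate(INVALID).size());
    }
}
```

Well-formedness problems are still fatal: `DefaultHandler.fatalError` throws by default, so only validity errors are collected here.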
    17. Output and source: valid.xml

         Output with validation disabled:

         URL: file:C:/Tree/valid.xml
         [ document root ]
         +-[ element : test ]
          +-[ text ] " "
          +-[ text ] " "
          +-[ element : example ]
           +-[ text ] "Hello "
           +-[ text ] "&"
           +-[ text ] " Welcome!"
          +-[ text ] " "
         [ document end ]

         Output with validation enabled:

         URL: file:C:/Tree/valid.xml
         [ document root ]
         Warning: Valid documents must have a <!DOCTYPE declaration.
         Parse Error: Element type "test" is not declared.

         <?xml version = "1.0"?>

         <!-- Fig. 9.7 : valid.xml -->
         <!-- DTD-less document -->

         <test>
            <example>Hello &amp; Welcome!</example>
         </test>

         • Validation is disabled in the first output, so the document parses successfully
         • Validation is enabled in the second output, and parsing fails because no DTD exists
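In the first valid.xml output the content of example arrives as three separate text events ("Hello ", "&" and " Welcome!"), which shows that one `characters` call is not guaranteed to carry an element's whole text. A common way to deal with this, sketched here with the illustrative class name `TextCollector` and helper `textOf`, is to append every chunk to a buffer and read the buffer only when the end tag arrives.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    private final String target;                 // element whose text is wanted
    private final StringBuilder buffer = new StringBuilder();
    private String result;                       // set when the end tag is seen

    public TextCollector(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equals(target)) {
            buffer.setLength(0);                 // start a fresh buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);        // may be called several times
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(target)) {
            result = buffer.toString();          // text is complete only now
        }
    }

    // Illustrative helper: returns the text of the last element with the
    // given name, or null if the element does not occur.
    public static String textOf(String xml, String elementName) throws Exception {
        TextCollector handler = new TextCollector(elementName);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<test><example>Hello &amp; Welcome!</example></test>";
        System.out.println("[" + textOf(xml, "example") + "]");
        // prints "[Hello & Welcome!]"
    }
}
```

The sketch assumes the target element contains only text; nested child elements would need a stack of buffers instead of a single one.", "example");
if (!"Hello & Welcome!".equals(text)) throw new AssertionError(text);
if (TextCollector.textOf("<test/>", "example") != null) throw new AssertionError("expected null for missing element");
</test>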
    18. To be continued…
    19. Reference
         • XML How to Program
         • Sang Sin Presentation (sang.sin@sun.com)
    20. Q&A
         • Feel free to post questions at http://yht4ever.blogspot.com
         • or email to: [email_address] or [email_address]
    21. http://yht4ever.blogspot.com Thank You!