Simple API for XML (SAX)
http://yht4ever.blogspot.com [email_address]
B070066 - NIIT Quang Trung, 08/2007

Contents
Introduction
DOM vs. SAX
SAX-based Parsers
Events
Introduction
Simple API for XML (SAX) is another method for accessing an XML document's contents.
Developed by members of the XML-DEV mailing list.
Uses an event-based model: notifications (events) are raised as the document is parsed.
Originally designed as a Java API; other languages (C++, Python, Perl) are now supported.
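To make the event-based model concrete, here is a minimal sketch that records the order in which a parser raises events. It assumes the SAX2 DefaultHandler class bundled with the JDK rather than the SAX1 HandlerBase used later in these slides; the class name EventLog and the method eventsFor are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records each SAX event in the order it is raised.
class EventLog extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses the given XML text and returns the recorded event names.
    static List<String> eventsFor(String xml) {
        EventLog log = new EventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), log);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        // prints [startDocument, start:a, start:b, end:b, end:a, endDocument]
        System.out.println(eventsFor("<a><b/></a>"));
    }
}
```

The handler never sees the document as a whole; it only reacts to each notification as the parser reaches the corresponding markup.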
DOM vs. SAX
DOM
  Tree-based model
  Stores document data in a node hierarchy
  Data is accessed quickly once the tree is built
  Provides facilities for adding and removing nodes
SAX
  Invokes methods when markup (a specific tag) is encountered
  Greater performance than DOM, because no tree is built
  Less memory overhead than DOM
  Typically used for reading documents (not modifying them)
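The contrast can be sketched by counting elements both ways. This is only an illustration under stated assumptions: it uses the JDK's built-in DOM and SAX2 parsers, and the names ElementCount, withDom and withSax are hypothetical. The DOM version builds the whole tree before asking a question of it; the SAX version keeps nothing but a counter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison of the two models on the same task.
class ElementCount {

    // DOM: parse into a node hierarchy, then query the finished tree.
    static int withDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // SAX: increment a counter on every start-tag event; no tree is stored.
    static int withSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<test><example><object>World</object></example></test>";
        System.out.println(withDom(xml) + " " + withSax(xml));  // prints 3 3
    }
}
```

Both calls return the same number; the difference is that withDom could go on to modify the tree, while withSax has already discarded everything it saw.";
assert ElementCount.withDom(xml) == 3;
assert ElementCount.withSax(xml) == 3;
</test>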
SAX-based Parsers
Available for a variety of programming languages (e.g., Java, Python).
[Table: Some SAX-based parsers]

SAX-based Parsers
A SAX parser invokes certain methods when events occur.
Programmers override these methods to process data.
Example: Tree Diagram

    // Fig. 9.3 : Tree.java
    // Using the SAX Parser to generate a tree diagram.

    import java.io.*;
    import org.xml.sax.*;  // for HandlerBase class (SAX1; deprecated in SAX2 in favor of DefaultHandler)
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Tree extends HandlerBase {
        private int indent = 0;  // indentation counter

        // returns the spaces needed for indenting
        private String spacer( int count )
        {
            String temp = "";

            for ( int i = 0; i < count; i++ )
                temp += "  ";

            return temp;
        }

        // method called before parsing
        // it provides the document location
        public void setDocumentLocator( Locator loc )
        {
            System.out.println( "URL: " + loc.getSystemId() );
        }

The import statements specify the location of classes needed by the application.
Method spacer assists in formatting.
Method setDocumentLocator is overridden to output the parsed document's URL.
        // method called at the beginning of a document
        public void startDocument() throws SAXException
        {
            System.out.println( "[ document root ]" );
        }

        // method called at the end of the document
        public void endDocument() throws SAXException
        {
            System.out.println( "[ document end ]" );
        }

        // method called at the start tag of an element
        public void startElement( String name,
            AttributeList attributes ) throws SAXException
        {
            System.out.println( spacer( indent++ ) +
                "+-[ element : " + name + " ]" );

            if ( attributes != null )

                for ( int i = 0; i < attributes.getLength(); i++ )
                    System.out.println( spacer( indent ) +
                        "+-[ attribute : " + attributes.getName( i ) +
                        " ] \"" + attributes.getValue( i ) + "\"" );
        }

Overridden method startDocument is called when the start of the document is encountered.
Overridden method endDocument is called when the end of the document is encountered.
Overridden method startElement is called when a start tag is encountered; it outputs each attribute's name and value (if any).
        // method called at the end tag of an element
        public void endElement( String name ) throws SAXException
        {
            indent--;
        }

        // method called when a processing instruction is found
        public void processingInstruction( String target,
            String value ) throws SAXException
        {
            System.out.println( spacer( indent ) +
                "+-[ proc-inst : " + target + " ] \"" + value + "\"" );
        }

        // method called when characters are found
        public void characters( char buffer[], int offset,
            int length ) throws SAXException
        {
            if ( length > 0 ) {
                String temp = new String( buffer, offset, length );

                System.out.println( spacer( indent ) +
                    "+-[ text ] \"" + temp + "\"" );
            }
        }

Overridden method endElement is called when the end of an element is encountered.
Overridden method processingInstruction is called when a processing instruction is encountered.
Overridden method characters is called when character data is encountered.
        // method called when ignorable whitespace is found
        public void ignorableWhitespace( char buffer[],
            int offset, int length )
        {
            if ( length > 0 ) {
                System.out.println( spacer( indent ) + "+-[ ignorable ]" );
            }
        }

        // method called on a non-fatal (validation) error
        public void error( SAXParseException spe )
            throws SAXParseException
        {
            // treat non-fatal errors as fatal errors
            throw spe;
        }

        // method called on a parsing warning
        public void warning( SAXParseException spe )
            throws SAXParseException
        {
            System.err.println( "Warning: " + spe.getMessage() );
        }

        // main method
        public static void main( String args[] )
        {
            boolean validate = false;

Overridden method ignorableWhitespace is called when ignorable whitespace is encountered.
Overridden method error is called when an error (usually a validation error) occurs.
Overridden method warning is called when a problem is detected that is not considered an error.
Method main starts the application.
            if ( args.length != 2 ) {
                System.err.println( "Usage: java Tree [validate] " +
                    "[filename]\n" );
                System.err.println( "Options:" );
                System.err.println( "   validate [yes|no] : " +
                    "DTD validation" );
                System.exit( 1 );
            }

            if ( args[ 0 ].equals( "yes" ) )
                validate = true;

            SAXParserFactory saxFactory =
                SAXParserFactory.newInstance();

            saxFactory.setValidating( validate );

The command-line arguments select whether the document is validated and which file is parsed.
SAXParserFactory can instantiate a SAX-based parser.
            try {
                SAXParser saxParser = saxFactory.newSAXParser();
                saxParser.parse( new File( args[ 1 ] ), new Tree() );
            }
            catch ( SAXParseException spe ) {
                System.err.println( "Parse Error: " + spe.getMessage() );
            }
            catch ( SAXException se ) {
                se.printStackTrace();
            }
            catch ( ParserConfigurationException pce ) {
                pce.printStackTrace();
            }
            catch ( IOException ioe ) {
                ioe.printStackTrace();
            }

            System.exit( 0 );
        }
    }

Method newSAXParser instantiates the SAX-based parser.
The catch blocks handle errors (if any).
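Tree.java is written against the SAX1 HandlerBase class. For readers on a current JDK, the following is a rough sketch of how the same idea might look with the SAX2 DefaultHandler, whose startElement and endElement take namespace-aware parameters and an Attributes object. It is not the original figure: the class name Sax2Tree and the method render are hypothetical, the output is collected in a string instead of being printed, and whitespace-only text is skipped to keep the result short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 variant of the tree printer (assumption: not part of the original slides).
class Sax2Tree extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int indent = 0;  // current nesting depth

    // Appends one line, indented two spaces per nesting level.
    private void line(String text) {
        for (int i = 0; i < indent; i++) {
            out.append("  ");
        }
        out.append(text).append('\n');
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        line("+-[ element : " + qName + " ]");
        indent++;
        for (int i = 0; i < attributes.getLength(); i++) {
            line("+-[ attribute : " + attributes.getQName(i)
                + " ] \"" + attributes.getValue(i) + "\"");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        indent--;
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        String text = new String(buffer, offset, length);
        // Unlike Tree.java, this sketch ignores whitespace-only chunks.
        if (!text.trim().isEmpty()) {
            line("+-[ text ] \"" + text + "\"");
        }
    }

    // Parses the XML text and returns the rendered tree diagram.
    static String render(String xml) {
        Sax2Tree handler = new Sax2Tree();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("<test name=\"x\"><example>Hi</example></test>"));
    }
}
```

The structure mirrors Tree.java: the indentation counter goes up in startElement and down in endElement, so the depth of each line follows the nesting of the document.").equals(expected);
</test>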
Output for spacing1.xml:

    URL: file:C:/Tree/spacing1.xml
    [ document root ]
    +-[ element : test ]
      +-[ attribute : name ] "  spacing 1  "
      +-[ text ] " "
      +-[ text ] "  "
      +-[ element : example ]
        +-[ element : object ]
          +-[ text ] "World"
      +-[ text ] " "
    [ document end ]

spacing1.xml:

    1 <?xml version = "1.0"?>
    2
    3 <!-- Fig. 9.4 : spacing1.xml -->
    4 <!-- Whitespaces in nonvalidating parsing -->
    5 <!-- XML document without DTD -->
    6
    7 <test name = "  spacing 1  ">
    8    <example><object>World</object></example>
    9 </test>

The XML document contains elements test, example and object.
Root element test contains attribute name with value "  spacing 1  ".
The XML document does not reference a DTD.
Note that whitespace is preserved: the attribute value (line 7), the line feed (end of line 7), the indentation (line 8) and the line feed (end of line 8).
Output for spacing2.xml:

    URL: file:C:/Tree/spacing2.xml
    [ document root ]
    +-[ element : test ]
      +-[ attribute : name ] "  spacing 2  "
      +-[ ignorable ]
      +-[ ignorable ]
      +-[ element : example ]
        +-[ element : object ]
          +-[ text ] "World"
      +-[ ignorable ]
    [ document end ]

spacing2.xml:

    1 <?xml version = "1.0"?>
    2
    3 <!-- Fig. 9.5 : spacing2.xml -->
    4 <!-- Whitespace and nonvalidated parsing -->
    5 <!-- XML document with DTD -->
    6
    7 <!DOCTYPE test [
    8 <!ELEMENT test (example)>
    9 <!ATTLIST test name CDATA #IMPLIED>
    10 <!ELEMENT example (object*)>
    11 <!ELEMENT object (#PCDATA)>
    12 ]>
    13
    14 <test name = "  spacing 2  ">
    15    <example><object>World</object></example>
    16 </test>

The DTD declares that test contains only element content, so whitespace between its child elements is "removable" and is reported as ignorable.
The line feed at the end of line 14, the spaces at the beginning of line 15 and the line feed at the end of line 15 are ignored.
Output for notvalid.xml (validation disabled):

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
      +-[ ignorable ]
      +-[ ignorable ]
      +-[ proc-inst : test ] "message"
      +-[ ignorable ]
      +-[ ignorable ]
      +-[ element : example ]
        +-[ element : item ]
          +-[ text ] "Hello & Welcome!"
      +-[ ignorable ]
    [ document end ]

notvalid.xml:

    <?xml version = "1.0"?>

    <!-- Fig. 9.6 : notvalid.xml -->
    <!-- Validation and non-validation -->

    <!DOCTYPE test [
    <!ELEMENT test (example)>
    <!ELEMENT example (#PCDATA)>
    ]>

    <test>
       <?test message?>
       <example><item><![CDATA[Hello & Welcome!]]></item></example>
    </test>

The document is invalid because element example cannot contain element item.
Validation is disabled, so the document parses successfully.
The parser does not process the text in the CDATA section; it returns it as character data.
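A handler that only overrides characters cannot tell CDATA text apart from ordinary text. If that distinction matters, SAX2 offers the optional LexicalHandler interface with startCDATA and endCDATA callbacks. The sketch below assumes the JDK's bundled parser, which accepts the standard lexical-handler property; the class name CdataText and the method cdataOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects only the character data found inside CDATA sections.
class CdataText extends DefaultHandler2 {
    private final StringBuilder cdata = new StringBuilder();
    private boolean inCdata = false;

    @Override
    public void startCDATA() {
        inCdata = true;
    }

    @Override
    public void endCDATA() {
        inCdata = false;
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        // The contents arrive unparsed, so '&' and '<' appear literally.
        if (inCdata) {
            cdata.append(buffer, offset, length);
        }
    }

    // Parses the XML text and returns the concatenated CDATA contents.
    static String cdataOf(String xml) {
        CdataText handler = new CdataText();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Register the same object as the lexical handler (standard SAX2 property name).
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.cdata.toString();
    }

    public static void main(String[] args) {
        System.out.println(cdataOf("<item><![CDATA[Hello & Welcome!]]></item>"));
    }
}
```

Text outside a CDATA section is ignored by this sketch, so calling cdataOf on a document without CDATA returns an empty string.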
Output for notvalid.xml (validation enabled):

    URL: file:C:/Tree/notvalid.xml
    [ document root ]
    +-[ element : test ]
      +-[ ignorable ]
      +-[ ignorable ]
      +-[ proc-inst : test ] "message"
      +-[ ignorable ]
      +-[ ignorable ]
      +-[ element : example ]
    Parse Error: Element "example" does not allow "item"

Validation is enabled.
Parsing terminates at element item: the validity error is passed to method error, which rethrows it, so it is treated as a fatal error.
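Tree.java stops at the first validity error because its error method rethrows the exception. An alternative, sketched here under the assumption of the JDK's bundled validating parser and the SAX2 DefaultHandler, is to record each error and let parsing continue. The names ValidationCheck and validationErrors are hypothetical, and the exact wording of the messages depends on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates against the document's DTD and collects error messages.
class ValidationCheck {

    static List<String> validationErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException spe) {
                // Record the validity error instead of rethrowing it.
                errors.add(spe.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);  // same switch Tree.java sets from the command line
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE test [<!ELEMENT test (example)>"
            + "<!ELEMENT example (#PCDATA)>]>";
        // The second document nests an undeclared item element inside example.
        System.out.println(validationErrors(dtd + "<test><example>Hi</example></test>"));
        System.out.println(validationErrors(
            dtd + "<test><example><item>Hi</item></example></test>"));
    }
}
```

Whether to stop or to collect is a design choice: rethrowing gives a quick failure, while collecting lets a tool report every problem in one pass.").isEmpty();
assert ValidationCheck.validationErrors(dtd + "<test><example><item>Hi</item></example></test>").size() > 0;
</test>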
Output for valid.xml (validation disabled):

    URL: file:C:/Tree/valid.xml
    [ document root ]
    +-[ element : test ]
      +-[ text ] " "
      +-[ text ] "  "
      +-[ element : example ]
        +-[ text ] "Hello "
        +-[ text ] "&"
        +-[ text ] " Welcome!"
      +-[ text ] " "
    [ document end ]

Output for valid.xml (validation enabled):

    URL: file:C:/Tree/valid.xml
    [ document root ]
    Warning: Valid documents must have a <!DOCTYPE declaration.
    Parse Error: Element type "test" is not declared.

valid.xml:

    <?xml version = "1.0"?>

    <!-- Fig. 9.7 : valid.xml -->
    <!-- DTD-less document -->

    <test>
       <example>Hello &amp; Welcome!</example>
    </test>

Validation is disabled in the first output, so the document parses successfully.
Validation is enabled in the second output, and parsing fails because the DTD does not exist.
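In the first output the text of example arrives as three separate characters events ("Hello ", "&" and " Welcome!"), because a parser is free to deliver character data in chunks. A handler that needs the complete text therefore has to accumulate it. The following is a minimal sketch using the SAX2 DefaultHandler; the class name ElementText and the method textOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: joins all character chunks of a document into one string.
class ElementText extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] buffer, int offset, int length) {
        // Each call may carry only part of the text, so append rather than replace.
        text.append(buffer, offset, length);
    }

    // Parses the XML text and returns all of its character data, concatenated.
    static String textOf(String xml) {
        ElementText handler = new ElementText();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        // prints Hello & Welcome!
        System.out.println(textOf("<example>Hello &amp; Welcome!</example>"));
    }
}
```

However many chunks the parser chooses to deliver, the accumulated result is the same string, with the entity reference already replaced by the ampersand.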
To be continued…
Reference
XML How to Program
Sang Sin presentation (sang.sin@sun.com)
Q&A
Feel free to post questions at http://yht4ever.blogspot.com or email to: [email_address] or [email_address]

Thank you!
http://yht4ever.blogspot.com
