SlideShare a Scribd company logo
SAX
SAX and DOM
• SAX and DOM are standards for XML parsers--program
APIs to read and interpret XML files
– DOM is a W3C standard
– SAX is an ad-hoc (but very popular) standard
– SAX was developed by David Megginson and is open source

• There are various implementations available
• Java implementations are provided as part of JAXP (Java
API for XML Processing)
• JAXP is included as a package in Java 1.4
– JAXP is available separately for Java 1.3

• Unlike many XML technologies, SAX and DOM are
relatively easy
Difference between SAX and DOM
• DOM reads the entire XML document into memory and
stores it as a tree data structure
• SAX reads the XML document and calls one of your methods
for each element or block of text that it encounters
• Consequences:
– DOM provides “random access” into the XML document
– SAX provides only sequential access to the XML document
– DOM is slow and requires huge amounts of memory, so it cannot be
used for large XML documents
– SAX is fast and requires very little memory, so it can be used for
huge documents (or large numbers of documents)
• This makes SAX much more popular for web sites

– Some DOM implementations have methods for changing the XML
document in memory; SAX implementations do not
Callbacks
• SAX works through callbacks: you call the parser,
it calls methods that you supply
Your program
startDocument(...)
main(...)

The SAX parser
parse(...)

startElement(...)
characters(...)
endElement( )
endDocument( )
Simple SAX program
• The following program is adapted from CodeNotes® for
XML by Gregory Brill, pages 158-159
• The program consists of two classes:
– Sample -- This class contains the main method; it
•
•
•
•
•

Gets a factory to make parsers
Gets a parser from the factory
Creates a Handler object to handle callbacks from the parser
Tells the parser which handler to send its callbacks to
Reads and parses the input XML file

– Handler -- This class contains handlers for three kinds of callbacks:
• startElement callbacks, generated when a start tag is seen
• endElement callbacks, generated when an end tag is seen
• characters callbacks, generated for the contents of an element
The Sample class, I
import javax.xml.parsers.*; // for both SAX and DOM
import org.xml.sax.*;
import org.xml.sax.helpers.*;
// For simplicity, we let the operating system handle exceptions
// In "real life" this is poor programming practice
public class Sample {
public static void main(String args[]) throws Exception {
// Create a parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
// Tell factory that the parser must understand namespaces
factory.setNamespaceAware(true);
// Make the parser
SAXParser saxParser = factory.newSAXParser();
XMLReader parser = saxParser.getXMLReader();
The Sample class, II
• In the previous slide we made a parser, of type XMLReader
• Now we continue with:
// Create a handler
Handler handler = new Handler();
// Tell the parser to use this handler
parser.setContentHandler(handler);
// Finally, read and parse the document
parser.parse("hello.xml");
} // end of Sample class

• You will need to put the file hello.xml :
– In the same directory, if you run the program from the command line
– Or where it can be found by the particular IDE you are using
The Handler class, I
• public class Handler extends DefaultHandler {
– DefaultHandler is an adapter class that defines these methods and
others as do-nothing methods, to be overridden as desired
– We will define three very similar methods to handle (1) start tags,
(2) contents, and (3) end tags--our methods will just print a line
– Each of these three methods could throw a SAXException
•

// SAX calls this method when it encounters a start tag
public void startElement(String namespaceURI,
String localName,
String qualifiedName,
Attributes attributes)
throws SAXException {
System.out.println("startElement: " + qualifiedName);
}
The Handler class, II
•

// SAX calls this method to pass in character data
public void characters(char ch[], int start, int length)
throws SAXException {
System.out.println("characters: " +
new String(ch, start, length));
}

•

// SAX call this method when it encounters an end tag
public void endElement(String namespaceURI,
String localName,
String qualifiedName)
throws SAXException {
System.out.println("Element: /" + qualifiedName);
}
} // End of Handler class
Results
• If the file hello.xml contains:
<?xml version="1.0"?>
<display>Hello World!</display>
• Then the output from running java Sample will be:
startElement: display
characters: Hello World!
Element: /display
Parser factories
• A factory is an alternative to constructors
• To create a SAX parser factory, call this method:
SAXParserFactory.newInstance()
– This returns an object of type SAXParserFactory
– It may throw a FactoryConfigurationError

• You can then say what kind of parser you want:
– public void setNamespaceAware(boolean awareness)
• Call this with true if you are using namespaces
• The default (if you don’t call this method) is false

– public void setValidating(boolean validating)
• Call this with true if you want to validate against a DTD
• The default (if you don’t call this method) is false
• Validation will give an error if you don’t have a DTD
Getting a parser
• Once you have a SAXParserFactory set up (say it’s named
factory), you can create a parser with:
SAXParser saxParser = factory.newSAXParser();
XMLReader parser = saxParser.getXMLReader();
• Note: older texts may use Parser in place of XMLReader
– Parser is SAX1, not SAX2, and is now deprecated
– SAX2 supports namespaces and some new parser properties

• Note: SAXParser is not thread-safe; to use it in multiple
threads, create a separate SAXParser for each thread
– This is unlikely to be a problem in class projects
Declaring which handler to use
• Since the SAX parser will be calling our methods, we need
to supply these methods
• In the example these are in a separate class, Handler
• We need to tell the parser where to find the methods:
Handler handler = new Handler();
parser.setContentHandler(handler);

• These statements could be combined:
parser.setContentHandler(new Handler());

• Finally, we call the parser and tell it what file to parse:
parser.parse("hello.xml");

• Everything else will be done in the handler methods
SAX handlers
• A callback handler for SAX must implement these four
interfaces:
– interface ContentHandler
• This is the most important interface--it handles basic parsing
callbacks, such as element starts and ends

– interface DTDHandler
• Handles only notation and unparsed entity declarations

– interface EntityResolver
• Does customized handling for external entities

– interface ErrorHandler
• Must be implemented or parsing errors will be ignored!

• You could implement all these interfaces yourself, but
that’s a lot of work--it’s easier to use an adapter class
Class DefaultHandler
• DefaultHandler is in package org.xml.sax.helpers
• DefaultHandler implements ContentHandler, DTDHandler,
EntityResolver, and ErrorHandler
• DefaultHandler is an adapter class--it provides empty
methods for every method declared in each of the four
interfaces
– Empty methods don’t do anything

• To use this class, extend it and override the methods that are
important to your application
– We will cover some of the methods in the ContentHandler and
ErrorHandler interfaces
ContentHandler methods, I
• public void setDocumentLocator(Locator loc)
– This method is called once, when parsing first starts
– The Locator contains either a URL or a URN, or both, that
specifies where the document is located
– You may need this information if you need to find a document
whose position is specified relative to this XML document
– Locator methods include:
• public String getPublicId() returns the public identifier for the
current document
• public String getSystemId() returns the system identifier for the
current document

– Every ContentHandler method except this one may throw a
SAXException
ContentHandler methods, II
• public void processingInstruction(String target,
String data)
throws SAXException
• This method is called once for each processing instruction
(PI) that is encountered
• The PI is presented as two strings: <?target data?>
• According to XML rules, PIs may occur anywhere in the
document after the initial <?xml ...?> line
– This means calls to processingInstruction do not
necessarily occur before startElement is called with
the document root
ContentHandler methods, III
• public void startDocument()
throws SAXException
– This is called just once, at the beginning of parsing

• public void endDocument()
throws SAXException
– This is called just once, and is the last method called by the parser
• Remember: when you override a method, you can throw fewer kinds of
exceptions, but you can’t throw any new kinds
– In other words: your methods don’t have to throw a SAXException
– But if they do throw an exception, it can only be a SAXException
ContentHandler methods, IV
• public void startElement(String namespaceURI,
String localName,
String qualifiedName,
Attributes atts)
throws SAXException
• This method is called at the beginning of every element
• If the parser is namespace-aware,
– namespaceURI will hold the prefix (before the colon)
– localName will hold the element name (without a prefix)
– qualifiedName will be the empty string

• If the parser is not using namespaces,
– namespaceURI and localName will be empty strings
– qualifiedName will hold the element name (possibly with prefix)
Attributes, I
• When SAX calls startElement, it passes in a parameter of
type Attributes
• Attributes is an interface that defines a number of useful
methods; here are a few of them:
–
–
–
–
–

getLength() returns the number of attributes
getLocalName(index) returns the attribute’s local name
getQName(index) returns the attribute’s qualified name
getValue(index) returns the attribute’s value
getType(index) returns the attribute’s type, which will be one of
the Strings "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN",
"NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION”

• As with elements, if the local name is the empty string, then
the attribute’s name is in the qualified name
Attributes, II
• SAX does not guarantee that the attributes will be returned
in the same order they are written
– After all, the order is irrelevant in XML

• The following methods look up attributes by name rather
than by index:
– public String getValue(String qualifiedName)
– public String getValue(String uri, String localName)

• However, some methods of Attributes require an index, so:
– public int getIndex(String qualifiedName)
– public int getIndex(String uri, String localName)

• An Attributes object is valid only during the call to
startElement
– If you need to remember attributes longer, use:
AttributesImpl attrImpl = new AttributesImpl(attributes);
ContentHandler methods, V
• endElement(String namespaceURI,
String localName,
String qualifiedName)
throws SAXException
• The parameters to endElement are the same as those to
startElement, except that the Attributes parameter is
omitted
ContentHandler methods, VI
• public void characters(char[] ch,
int start,
int length) throws SAXException

• ch is an array of characters
– Only length characters, starting from ch[start], are the
contents of the element
• The String constructor new String(ch, start, length) is an
easy way to extract the relevant characters from the char array
• characters may be called multiple times for one element
– Newlines and entities may break the data characters into separate calls
– characters may be called with length = 0
– All data characters of the element will eventually be given to characters
Example
• If hello.xml contains:
– <?xml version="1.0"?>
<display>
Hello World!
</display>

• Then the sample program we started with gives:
– startElement: display
characters:
characters:
characters:
characters:

<-- zero length string
<-- LF character (ASCII 10)

Hello World! <-- spaces are preserved
<-- LF character (ASCII 10)

Element: /display
Whitespace
• Whitespace is a major nuisance
– Whitespace is characters; characters are PCDATA
– IF you are validating, the parser will ignore whitespace
where PCDATA is not allowed by the DTD
– If you are not validating, the parser cannot ignore whitespace
– If you ignore whitespace, you lose your indentation

• To ignore whitespace when validating:
– Happens automatically

• To ignore whitespace when not validating:
– Use the String function trim() to remove whitespace
– Check the result to see if it is the empty string
Handling ignorable whitespace
• A nonvalidating parser cannot ignore whitespace, because it
cannot distinguish it from real data
• A validating parser can, and does, ignore whitespace where
character data is not allowed
– For processing XML, this is usually what you want
– However, if you are manipulating and writing out XML, discarding
whitespace ruins your indentation
– To capture ignorable whitespace, you can override this method
(defined in DefaultHandler):
public void ignorableWhitespace(char[] ch,
int start,
int length)
throws SAXException
• Parameters are the same as those for characters
Error Handling, I
• SAX error handling is unusual
• Errors are ignored unless you register an error
handler (org.xml.sax.ErrorHandler)
– Ignored errors can cause bizarre behavior
– Failing to provide an error handler is unwise

• The ErrorHandler interface declares:
– public void fatalError (SAXParseException exception)
throws SAXException // XML not well structured
– public void error (SAXParseException exception)
throws SAXException // XML validation error
– public void warning (SAXParseException exception)
throws SAXException // minor problem
Error Handling, II
• If you are extending DefaultHandler, it implements
ErrorHandler and registers itself
– DefaultHandler’s version of fatalError() throws a
SAXException, but...
– its error() and warning() methods do nothing!

• You can (and should) override these methods
• Note that the only kind of exception your override
methods can throw is a SAXException
– When you override a method, you cannot add exception types
– If you need to throw another kind of exception, say an
IOException, you can encapsulate it in a SAXException:
• catch (IOException ioException) {
throw new SAXException("I/O error: ", ioException)
}
Error Handling, III
• If you are not extending DefaultHandler:
– Create a new class (say, MyErrorHandler) that
implements ErrorHandler (by supplying the three
methods fatalError, error, and warning)
– Create a new object of this class
– Tell your XMLReader object about it by sending it the
following message:
setErrorHandler(ErrorHandler handler)

• Example:
XMLReader parser = saxParser.getXMLReader();
parser.setErrorHandler(new MyErrorHandler());
The End

More Related Content

What's hot

SAX, DOM & JDOM parsers for beginners
SAX, DOM & JDOM parsers for beginnersSAX, DOM & JDOM parsers for beginners
SAX, DOM & JDOM parsers for beginners
Hicham QAISSI
 
XML
XMLXML
Tool Development 05 - XML Schema, INI, JSON, YAML
Tool Development 05 - XML Schema, INI, JSON, YAMLTool Development 05 - XML Schema, INI, JSON, YAML
Tool Development 05 - XML Schema, INI, JSON, YAML
Nick Pruehs
 
Tool Development 04 - XML
Tool Development 04 - XMLTool Development 04 - XML
Tool Development 04 - XML
Nick Pruehs
 
JAXB
JAXBJAXB
Java Programming - 06 java file io
Java Programming - 06 java file ioJava Programming - 06 java file io
Java Programming - 06 java file io
Danairat Thanabodithammachari
 
Introductionto xslt
Introductionto xsltIntroductionto xslt
Introductionto xslt
Kumar
 
Streams in Java 8
Streams in Java 8Streams in Java 8
Streams in Java 8
Tobias Coetzee
 
Helberg acl-final
Helberg acl-finalHelberg acl-final
Helberg acl-final
Clay Helberg
 
Chelberg ptcuser 2010
Chelberg ptcuser 2010Chelberg ptcuser 2010
Chelberg ptcuser 2010
Clay Helberg
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
Erik Hatcher
 
06 xml processing-in-.net
06 xml processing-in-.net06 xml processing-in-.net
06 xml processing-in-.net
glubox
 
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
Rathod Shukar
 
XML parsing using jaxb
XML parsing using jaxbXML parsing using jaxb
XML parsing using jaxb
Malintha Adikari
 
Xml writers
Xml writersXml writers
Xml writers
Raghu nath
 
Io streams
Io streamsIo streams
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
Paul Borgermans
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
Martin Odersky
 
Namespace in C++ Programming Language
Namespace in C++ Programming LanguageNamespace in C++ Programming Language
Namespace in C++ Programming Language
Himanshu Choudhary
 
Understanding C# in .NET
Understanding C# in .NETUnderstanding C# in .NET
Understanding C# in .NET
mentorrbuddy
 

What's hot (20)

SAX, DOM & JDOM parsers for beginners
SAX, DOM & JDOM parsers for beginnersSAX, DOM & JDOM parsers for beginners
SAX, DOM & JDOM parsers for beginners
 
XML
XMLXML
XML
 
Tool Development 05 - XML Schema, INI, JSON, YAML
Tool Development 05 - XML Schema, INI, JSON, YAMLTool Development 05 - XML Schema, INI, JSON, YAML
Tool Development 05 - XML Schema, INI, JSON, YAML
 
Tool Development 04 - XML
Tool Development 04 - XMLTool Development 04 - XML
Tool Development 04 - XML
 
JAXB
JAXBJAXB
JAXB
 
Java Programming - 06 java file io
Java Programming - 06 java file ioJava Programming - 06 java file io
Java Programming - 06 java file io
 
Introductionto xslt
Introductionto xsltIntroductionto xslt
Introductionto xslt
 
Streams in Java 8
Streams in Java 8Streams in Java 8
Streams in Java 8
 
Helberg acl-final
Helberg acl-finalHelberg acl-final
Helberg acl-final
 
Chelberg ptcuser 2010
Chelberg ptcuser 2010Chelberg ptcuser 2010
Chelberg ptcuser 2010
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
06 xml processing-in-.net
06 xml processing-in-.net06 xml processing-in-.net
06 xml processing-in-.net
 
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
Wordpress (class,property,visibility,const,destr,inheritence,mysql etc)
 
XML parsing using jaxb
XML parsing using jaxbXML parsing using jaxb
XML parsing using jaxb
 
Xml writers
Xml writersXml writers
Xml writers
 
Io streams
Io streamsIo streams
Io streams
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 
Namespace in C++ Programming Language
Namespace in C++ Programming LanguageNamespace in C++ Programming Language
Namespace in C++ Programming Language
 
Understanding C# in .NET
Understanding C# in .NETUnderstanding C# in .NET
Understanding C# in .NET
 

Viewers also liked

XML Document Object Model (DOM)
XML Document Object Model (DOM)XML Document Object Model (DOM)
XML Document Object Model (DOM)
BOSS Webtech
 
DOM and SAX
DOM and SAXDOM and SAX
DOM and SAX
Jussi Pohjolainen
 
Xml parsers
Xml parsersXml parsers
Xml parsers
Manav Prasad
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
vikram singh
 
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
Wojciech Podgórski
 
Xml processors
Xml processorsXml processors
Xml processors
Saurav Mawandia
 
Xml3
Xml3Xml3
Session 5
Session 5Session 5
Semantic web xml-rdf-dom parser
Semantic web xml-rdf-dom parserSemantic web xml-rdf-dom parser
Semantic web xml-rdf-dom parser
Serdar Sönmez
 
Dino's DEV Project
Dino's DEV ProjectDino's DEV Project
Dino's DEV Project
dinoppc40sw07
 
Introduce to XML
Introduce to XMLIntroduce to XML
Introduce to XML
videde_group
 
XSLT
XSLTXSLT
XML DOM
XML DOMXML DOM
XML DOM
Hoang Nguyen
 
java API for XML DOM
java API for XML DOMjava API for XML DOM
java API for XML DOM
Surinder Kaur
 
DOM ( Document Object Model )
DOM ( Document Object Model )DOM ( Document Object Model )
DOM ( Document Object Model )
ITSTB
 
An Introduction to the DOM
An Introduction to the DOMAn Introduction to the DOM
An Introduction to the DOM
Mindy McAdams
 
Document Object Model
Document Object ModelDocument Object Model
Document Object Model
chomas kandar
 
XML/XSLT
XML/XSLTXML/XSLT
XML/XSLT
thinkahead.net
 
Dom parser
Dom parserDom parser
Dom parser
sana mateen
 
Difference Between DOM and SAX parser in java with examples
Difference Between DOM and SAX parser in java with examplesDifference Between DOM and SAX parser in java with examples
Difference Between DOM and SAX parser in java with examples
Said Benaissa
 

Viewers also liked (20)

XML Document Object Model (DOM)
XML Document Object Model (DOM)XML Document Object Model (DOM)
XML Document Object Model (DOM)
 
DOM and SAX
DOM and SAXDOM and SAX
DOM and SAX
 
Xml parsers
Xml parsersXml parsers
Xml parsers
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
 
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
eXtensible Markup Language APIs in Java 1.6 - Simple and efficient XML parsin...
 
Xml processors
Xml processorsXml processors
Xml processors
 
Xml3
Xml3Xml3
Xml3
 
Session 5
Session 5Session 5
Session 5
 
Semantic web xml-rdf-dom parser
Semantic web xml-rdf-dom parserSemantic web xml-rdf-dom parser
Semantic web xml-rdf-dom parser
 
Dino's DEV Project
Dino's DEV ProjectDino's DEV Project
Dino's DEV Project
 
Introduce to XML
Introduce to XMLIntroduce to XML
Introduce to XML
 
XSLT
XSLTXSLT
XSLT
 
XML DOM
XML DOMXML DOM
XML DOM
 
java API for XML DOM
java API for XML DOMjava API for XML DOM
java API for XML DOM
 
DOM ( Document Object Model )
DOM ( Document Object Model )DOM ( Document Object Model )
DOM ( Document Object Model )
 
An Introduction to the DOM
An Introduction to the DOMAn Introduction to the DOM
An Introduction to the DOM
 
Document Object Model
Document Object ModelDocument Object Model
Document Object Model
 
XML/XSLT
XML/XSLTXML/XSLT
XML/XSLT
 
Dom parser
Dom parserDom parser
Dom parser
 
Difference Between DOM and SAX parser in java with examples
Difference Between DOM and SAX parser in java with examplesDifference Between DOM and SAX parser in java with examples
Difference Between DOM and SAX parser in java with examples
 

Similar to XML SAX PARSING

24sax
24sax24sax
24sax
Adil Jafri
 
Java
JavaJava
Learning from "Effective Scala"
Learning from "Effective Scala"Learning from "Effective Scala"
Learning from "Effective Scala"
Kazuhiro Sera
 
Javasession6
Javasession6Javasession6
Javasession6
Rajeev Kumar
 
c++ Unit III - PPT.pptx
c++ Unit III - PPT.pptxc++ Unit III - PPT.pptx
Java 8 lambdas expressions
Java 8 lambdas expressionsJava 8 lambdas expressions
Java 8 lambdas expressions
Lars Lemos
 
PROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part IIPROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part II
SivaSankari36
 
advanced java ppt
advanced java pptadvanced java ppt
advanced java ppt
PreetiDixit22
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
Sasha Kravchuk
 
Java8
Java8Java8
Java and the JVM
Java and the JVMJava and the JVM
Java and the JVM
Manish Pandit
 
Unit3 packages &amp; interfaces
Unit3 packages &amp; interfacesUnit3 packages &amp; interfaces
Unit3 packages &amp; interfaces
Kalai Selvi
 
7 Methods and Functional Programming
7  Methods and Functional Programming7  Methods and Functional Programming
7 Methods and Functional Programming
Deepak Hagadur Bheemaraju
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
Alexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Lucidworks
 
Unit3 part3-packages and interfaces
Unit3 part3-packages and interfacesUnit3 part3-packages and interfaces
Unit3 part3-packages and interfaces
DevaKumari Vijay
 
Threads in java, Multitasking and Multithreading
Threads in java, Multitasking and MultithreadingThreads in java, Multitasking and Multithreading
Threads in java, Multitasking and Multithreading
ssusere538f7
 
Rust All Hands Winter 2011
Rust All Hands Winter 2011Rust All Hands Winter 2011
Rust All Hands Winter 2011
Patrick Walton
 
Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8
Simon Ritter
 
Beginning Java for .NET developers
Beginning Java for .NET developersBeginning Java for .NET developers
Beginning Java for .NET developers
Andrei Rinea
 

Similar to XML SAX PARSING (20)

24sax
24sax24sax
24sax
 
Java
JavaJava
Java
 
Learning from "Effective Scala"
Learning from "Effective Scala"Learning from "Effective Scala"
Learning from "Effective Scala"
 
Javasession6
Javasession6Javasession6
Javasession6
 
c++ Unit III - PPT.pptx
c++ Unit III - PPT.pptxc++ Unit III - PPT.pptx
c++ Unit III - PPT.pptx
 
Java 8 lambdas expressions
Java 8 lambdas expressionsJava 8 lambdas expressions
Java 8 lambdas expressions
 
PROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part IIPROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part II
 
advanced java ppt
advanced java pptadvanced java ppt
advanced java ppt
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
Java8
Java8Java8
Java8
 
Java and the JVM
Java and the JVMJava and the JVM
Java and the JVM
 
Unit3 packages &amp; interfaces
Unit3 packages &amp; interfacesUnit3 packages &amp; interfaces
Unit3 packages &amp; interfaces
 
7 Methods and Functional Programming
7  Methods and Functional Programming7  Methods and Functional Programming
7 Methods and Functional Programming
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Unit3 part3-packages and interfaces
Unit3 part3-packages and interfacesUnit3 part3-packages and interfaces
Unit3 part3-packages and interfaces
 
Threads in java, Multitasking and Multithreading
Threads in java, Multitasking and MultithreadingThreads in java, Multitasking and Multithreading
Threads in java, Multitasking and Multithreading
 
Rust All Hands Winter 2011
Rust All Hands Winter 2011Rust All Hands Winter 2011
Rust All Hands Winter 2011
 
Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8
 
Beginning Java for .NET developers
Beginning Java for .NET developersBeginning Java for .NET developers
Beginning Java for .NET developers
 

Recently uploaded

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 

Recently uploaded (20)

20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 

XML SAX PARSING

  • 1. SAX
  • 2. SAX and DOM • SAX and DOM are standards for XML parsers--program APIs to read and interpret XML files – DOM is a W3C standard – SAX is an ad-hoc (but very popular) standard – SAX was developed by David Megginson and is open source • There are various implementations available • Java implementations are provided as part of JAXP (Java API for XML Processing) • JAXP is included as a package in Java 1.4 – JAXP is available separately for Java 1.3 • Unlike many XML technologies, SAX and DOM are relatively easy
  • 3. Difference between SAX and DOM • DOM reads the entire XML document into memory and stores it as a tree data structure • SAX reads the XML document and calls one of your methods for each element or block of text that it encounters • Consequences: – DOM provides “random access” into the XML document – SAX provides only sequential access to the XML document – DOM is slow and requires huge amounts of memory, so it cannot be used for large XML documents – SAX is fast and requires very little memory, so it can be used for huge documents (or large numbers of documents) • This makes SAX much more popular for web sites – Some DOM implementations have methods for changing the XML document in memory; SAX implementations do not
  • 4. Callbacks • SAX works through callbacks: you call the parser, it calls methods that you supply Your program startDocument(...) main(...) The SAX parser parse(...) startElement(...) characters(...) endElement( ) endDocument( )
  • 5. Simple SAX program • The following program is adapted from CodeNotes® for XML by Gregory Brill, pages 158-159 • The program consists of two classes: – Sample -- This class contains the main method; it • • • • • Gets a factory to make parsers Gets a parser from the factory Creates a Handler object to handle callbacks from the parser Tells the parser which handler to send its callbacks to Reads and parses the input XML file – Handler -- This class contains handlers for three kinds of callbacks: • startElement callbacks, generated when a start tag is seen • endElement callbacks, generated when an end tag is seen • characters callbacks, generated for the contents of an element
  • 6. The Sample class, I import javax.xml.parsers.*; // for both SAX and DOM import org.xml.sax.*; import org.xml.sax.helpers.*; // For simplicity, we let the operating system handle exceptions // In "real life" this is poor programming practice public class Sample { public static void main(String args[]) throws Exception { // Create a parser factory SAXParserFactory factory = SAXParserFactory.newInstance(); // Tell factory that the parser must understand namespaces factory.setNamespaceAware(true); // Make the parser SAXParser saxParser = factory.newSAXParser(); XMLReader parser = saxParser.getXMLReader();
  • 7. The Sample class, II • In the previous slide we made a parser, of type XMLReader • Now we continue with: // Create a handler Handler handler = new Handler(); // Tell the parser to use this handler parser.setContentHandler(handler); // Finally, read and parse the document parser.parse("hello.xml"); } // end of Sample class • You will need to put the file hello.xml : – In the same directory, if you run the program from the command line – Or where it can be found by the particular IDE you are using
  • 8. The Handler class, I • public class Handler extends DefaultHandler { – DefaultHandler is an adapter class that defines these methods and others as do-nothing methods, to be overridden as desired – We will define three very similar methods to handle (1) start tags, (2) contents, and (3) end tags--our methods will just print a line – Each of these three methods could throw a SAXException • // SAX calls this method when it encounters a start tag public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes attributes) throws SAXException { System.out.println("startElement: " + qualifiedName); }
  • 9. The Handler class, II • // SAX calls this method to pass in character data public void characters(char ch[], int start, int length) throws SAXException { System.out.println("characters: " + new String(ch, start, length)); } • // SAX call this method when it encounters an end tag public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException { System.out.println("Element: /" + qualifiedName); } } // End of Handler class
  • 10. Results • If the file hello.xml contains: <?xml version="1.0"?> <display>Hello World!</display> • Then the output from running java Sample will be: startElement: display characters: Hello World! Element: /display
  • 11. Parser factories • A factory is an alternative to constructors • To create a SAX parser factory, call this method: SAXParserFactory.newInstance() – This returns an object of type SAXParserFactory – It may throw a FactoryConfigurationError • You can then say what kind of parser you want: – public void setNamespaceAware(boolean awareness) • Call this with true if you are using namespaces • The default (if you don’t call this method) is false – public void setValidating(boolean validating) • Call this with true if you want to validate against a DTD • The default (if you don’t call this method) is false • Validation will give an error if you don’t have a DTD
  • 12. Getting a parser • Once you have a SAXParserFactory set up (say it’s named factory), you can create a parser with: SAXParser saxParser = factory.newSAXParser(); XMLReader parser = saxParser.getXMLReader(); • Note: older texts may use Parser in place of XMLReader – Parser is SAX1, not SAX2, and is now deprecated – SAX2 supports namespaces and some new parser properties • Note: SAXParser is not thread-safe; to use it in multiple threads, create a separate SAXParser for each thread – This is unlikely to be a problem in class projects
  • 13. Declaring which handler to use • Since the SAX parser will be calling our methods, we need to supply these methods • In the example these are in a separate class, Handler • We need to tell the parser where to find the methods: Handler handler = new Handler(); parser.setContentHandler(handler); • These statements could be combined: parser.setContentHandler(new Handler()); • Finally, we call the parser and tell it what file to parse: parser.parse("hello.xml"); • Everything else will be done in the handler methods
  • 14. SAX handlers • A callback handler for SAX must implement these four interfaces: – interface ContentHandler • This is the most important interface--it handles basic parsing callbacks, such as element starts and ends – interface DTDHandler • Handles only notation and unparsed entity declarations – interface EntityResolver • Does customized handling for external entities – interface ErrorHandler • Must be implemented or parsing errors will be ignored! • You could implement all these interfaces yourself, but that’s a lot of work--it’s easier to use an adapter class
  • 15. Class DefaultHandler • DefaultHandler is in package org.xml.sax.helpers • DefaultHandler implements ContentHandler, DTDHandler, EntityResolver, and ErrorHandler • DefaultHandler is an adapter class--it provides empty methods for every method declared in each of the four interfaces – Empty methods don’t do anything • To use this class, extend it and override the methods that are important to your application – We will cover some of the methods in the ContentHandler and ErrorHandler interfaces
  • 16. ContentHandler methods, I • public void setDocumentLocator(Locator loc) – This method is called once, when parsing first starts – The Locator contains either a URL or a URN, or both, that specifies where the document is located – You may need this information if you need to find a document whose position is specified relative to this XML document – Locator methods include: • public String getPublicId() returns the public identifier for the current document • public String getSystemId() returns the system identifier for the current document – Every ContentHandler method except this one may throw a SAXException
  • 17. ContentHandler methods, II • public void processingInstruction(String target, String data) throws SAXException • This method is called once for each processing instruction (PI) that is encountered • The PI is presented as two strings: <?target data?> • According to XML rules, PIs may occur anywhere in the document after the initial <?xml ...?> line – This means calls to processingInstruction do not necessarily occur before startElement is called with the document root
  • 18. ContentHandler methods, III • public void startDocument() throws SAXException – This is called just once, at the beginning of parsing • public void endDocument() throws SAXException – This is called just once, and is the last method called by the parser • Remember: when you override a method, you can throw fewer kinds of exceptions, but you can’t throw any new kinds – In other words: your methods don’t have to throw a SAXException – But if they do throw an exception, it can only be a SAXException
  • 19. ContentHandler methods, IV • public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException • This method is called at the beginning of every element • If the parser is namespace-aware, – namespaceURI will hold the prefix (before the colon) – localName will hold the element name (without a prefix) – qualifiedName will be the empty string • If the parser is not using namespaces, – namespaceURI and localName will be empty strings – qualifiedName will hold the element name (possibly with prefix)
  • 20. Attributes, I • When SAX calls startElement, it passes in a parameter of type Attributes • Attributes is an interface that defines a number of useful methods; here are a few of them: – – – – – getLength() returns the number of attributes getLocalName(index) returns the attribute’s local name getQName(index) returns the attribute’s qualified name getValue(index) returns the attribute’s value getType(index) returns the attribute’s type, which will be one of the Strings "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN", "NMTOKENS", "ENTITY", "ENTITIES", or "NOTATION” • As with elements, if the local name is the empty string, then the attribute’s name is in the qualified name
  • 21. Attributes, II • SAX does not guarantee that the attributes will be returned in the same order they are written – After all, the order is irrelevant in XML • The following methods look up attributes by name rather than by index: – public String getValue(String qualifiedName) – public String getValue(String uri, String localName) • However, some methods of Attributes require an index, so: – public int getIndex(String qualifiedName) – public int getIndex(String uri, String localName) • An Attributes object is valid only during the call to startElement – If you need to remember attributes longer, use: AttributesImpl attrImpl = new AttributesImpl(attributes);
  • 22. ContentHandler methods, V • endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException • The parameters to endElement are the same as those to startElement, except that the Attributes parameter is omitted
  • 23. ContentHandler methods, VI • public void characters(char[] ch, int start, int length) throws SAXException • ch is an array of characters – Only length characters, starting from ch[start], are the contents of the element • The String constructor new String(ch, start, length) is an easy way to extract the relevant characters from the char array • characters may be called multiple times for one element – Newlines and entities may break the data characters into separate calls – characters may be called with length = 0 – All data characters of the element will eventually be given to characters
  • 24. Example • If hello.xml contains: – <?xml version="1.0"?> <display> Hello World! </display> • Then the sample program we started with gives: – startElement: display characters: characters: characters: characters: <-- zero length string <-- LF character (ASCII 10) Hello World! <-- spaces are preserved <-- LF character (ASCII 10) Element: /display
  • 25. Whitespace • Whitespace is a major nuisance – Whitespace is characters; characters are PCDATA – IF you are validating, the parser will ignore whitespace where PCDATA is not allowed by the DTD – If you are not validating, the parser cannot ignore whitespace – If you ignore whitespace, you lose your indentation • To ignore whitespace when validating: – Happens automatically • To ignore whitespace when not validating: – Use the String function trim() to remove whitespace – Check the result to see if it is the empty string
  • 26. Handling ignorable whitespace • A nonvalidating parser cannot ignore whitespace, because it cannot distinguish it from real data • A validating parser can, and does, ignore whitespace where character data is not allowed – For processing XML, this is usually what you want – However, if you are manipulating and writing out XML, discarding whitespace ruins your indentation – To capture ignorable whitespace, you can override this method (defined in DefaultHandler): public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException • Parameters are the same as those for characters
  • 27. Error Handling, I • SAX error handling is unusual • Errors are ignored unless you register an error handler (org.xml.sax.ErrorHandler) – Ignored errors can cause bizarre behavior – Failing to provide an error handler is unwise • The ErrorHandler interface declares: – public void fatalError (SAXParseException exception) throws SAXException // XML not well structured – public void error (SAXParseException exception) throws SAXException // XML validation error – public void warning (SAXParseException exception) throws SAXException // minor problem
  • 28. Error Handling, II • If you are extending DefaultHandler, it implements ErrorHandler and registers itself – DefaultHandler’s version of fatalError() throws a SAXException, but... – its error() and warning() methods do nothing! • You can (and should) override these methods • Note that the only kind of exception your override methods can throw is a SAXException – When you override a method, you cannot add exception types – If you need to throw another kind of exception, say an IOException, you can encapsulate it in a SAXException: • catch (IOException ioException) { throw new SAXException("I/O error: ", ioException) }
  • 29. Error Handling, III • If you are not extending DefaultHandler: – Create a new class (say, MyErrorHandler) that implements ErrorHandler (by supplying the three methods fatalError, error, and warning) – Create a new object of this class – Tell your XMLReader object about it by sending it the following message: setErrorHandler(ErrorHandler handler) • Example: XMLReader parser = saxParser.getXMLReader(); parser.setErrorHandler(new MyErrorHandler());