Service Oriented Architecture - Unit II - Sax

SAX
Dr.S.Roselin Mary
HOD/CSE
ANAND INSTITUTE OF HIGHER TECHNOLOGY

• SAX (Simple API for XML) is an event-based parser for XML
documents.
• Unlike a DOM parser, a SAX parser creates no parse tree.
• SAX is a streaming interface for XML, which means that
applications using SAX receive event notifications about the
XML document being processed, one element (with its attributes)
at a time, in sequential order, starting at the top of the
document and ending after the root element is closed.
– Reads an XML document from top to bottom, recognizing the
tokens that make up a well-formed XML document.
– Tokens are processed in the same order that they appear in the
document.
– Reports to the application program the nature of the tokens that
the parser has encountered, as they occur.
– The application program provides an "event" handler that must
be registered with the parser.
– As the tokens are identified, callback methods in the handler are
invoked with the relevant information.
7/16/2019 Dr.S.ROSELIN MARY HOD/CSE, ANAND INSTITUTE OF HIGHER TECHNOLOGY

Why Do I Need SAX?
• To pull out the text from a document or to look for attributes of
specific tags, we might be able to do some of the work using a tool
or maybe XSLT, but these solutions have their limitations.
• When writing a tool or a standalone program to process XML, SAX
is a good way to do it.
• Many applications today can be customized using an XML file.
These files have replaced the traditional “properties” files for
reasons of uniformity and richness of expression.
• Instead of spending a lot of your time writing a parser to read XML
files, you might as well use SAX.
• SAX is completely free, so it can be embedded in a larger
application without royalty fees or even copyright notices.
• Some SAX parsers can validate a document against a Document
Type Definition (DTD).
• Validating parsers can also tell specifically where validation has
failed.

SAX Parser
• When should I use it?
– Large documents
– Memory constrained devices
• When should I use something else?
– If you need to modify the document
– If you need to look back at earlier parts of the document:
SAX doesn't remember previous events unless you
write explicit code to do so.
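As a minimal sketch of what "explicit code" for remembering previous events can look like, the handler below keeps its own stack of open elements so that it always knows the path to the current element. The class name PathTracker and the helper pathsOf are hypothetical names chosen for this illustration; the parser is the one bundled with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: SAX keeps no history, so the handler tracks open elements itself.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();  // elements currently open
    final List<String> paths = new ArrayList<>();           // path of every element seen

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Parses the given XML text and returns the path of each element in document order.
    static List<String> pathsOf(String xml) {
        PathTracker tracker = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tracker.paths;
    }

    public static void main(String[] args) {
        System.out.println(pathsOf("<fiction><book>Moby Dick</book></fiction>"));
    }
}
```

Only the names of the open elements are held in memory, so the cost grows with the nesting depth rather than with the size of the document.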

SAX Parser
• Which languages are supported?
– Java
– Perl
– C++
– Python

SAX Parser
• Versions
– SAX 1 introduced in May 1998
– SAX 2.0 introduced in May 2000 and adds support
for
• namespaces
• filter chains
• querying and setting properties in the parser
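A filter chain can be sketched with XMLFilterImpl from org.xml.sax.helpers: the filter sits between the parser and the application's handler and may change events as they pass through. The filter below, UpperCaseFilter, is a hypothetical example that upper-cases element names; it is one possible illustration, not the only way to build a chain.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: upper-cases element names before passing the events on.
class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT));
    }

    // Parses the XML text through the filter and returns the element names the handler saw.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader filter = new UpperCaseFilter(reader);   // parser -> filter -> handler
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    names.add(q);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<fiction><book>Moby Dick</book></fiction>"));
    }
}
```

Because a filter is itself an XMLReader, several filters can be stacked by passing one filter as the parent of the next.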

SAX Parser
• Some popular SAX parsers
– Apache XML Project Xerces Java Parser
http://xml.apache.org/xerces-j/index.html
– IBM’s XML for Java (XML4J)
http://www.alphaworks.ibm.com/formula/xml
– For a complete list, see
http://www.megginson.com/SAX

DOM | SAX
Tree model parser (object based; tree of nodes). | Event-based parser (sequence of events).
DOM loads the file into memory and then parses it. | SAX parses the file as it reads it, i.e. node by node.
Has memory constraints since it loads the whole XML file before parsing. | No memory constraints, as it does not store the XML content in memory.
DOM is read and write (can insert or delete nodes). | SAX is read only, i.e. it can't insert or delete nodes.
If the XML content is small, prefer the DOM parser. | Use the SAX parser when the XML content is large.
Backward and forward search is possible for finding tags and evaluating the information inside them, which eases navigation. | SAX reads the XML file from top to bottom; backward navigation is not possible.
Slower at run time. | Faster at run time.

SAX Basics
• Build a content handler by creating a Java
class that implements the ContentHandler
interface in the org.xml.sax package.
• Once we have a content handler, simply
register it with a SAX XMLReader.
• Set up the input source and start the parser.
• The methods in our content handler will be
called when the parser encounters elements,
text, and other data.

<?xml version="1.0" encoding="UTF-8"?>
<fiction>
<book author="Herman Melville">Moby Dick</book>
</fiction>
Generated Events:
start document
start element: fiction
start element: book (including attributes)
characters: Moby Dick
end element: book
end element: fiction
end document
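The event sequence above can be reproduced with a short program. This is a minimal sketch, assuming the parser bundled with the JDK; the class name EventTrace and the wording of each recorded line are choices made for this illustration. Whitespace between elements also arrives through characters(), so the sketch buffers text and drops whitespace-only runs to match the listing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each callback as one line of text.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Character data may arrive in pieces, so it is buffered and reported once.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            events.add("characters: " + s);
        }
        text.setLength(0);
    }

    @Override
    public void startDocument() { events.add("start document"); }

    @Override
    public void endDocument() { events.add("end document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        StringBuilder line = new StringBuilder("start element: " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            line.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(line.toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Registers the handler with an XMLReader, sets up the input source, and parses.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<fiction>\n"
                + "<book author=\"Herman Melville\">Moby Dick</book>\n"
                + "</fiction>";
        trace(xml).forEach(System.out::println);
    }
}
```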

SAX Packages
• The SAX 2.0 API consists of two standard
packages and one extension package.
• The standard packages
– org.xml.sax
– org.xml.sax.helpers
• The org.xml.sax package contains the basic
classes, interfaces, and exceptions needed for
parsing documents.

The org.xml.sax Package

Interfaces
Name                        Description
AttributeList               Deprecated. This interface has been replaced by the SAX2 Attributes interface, which includes namespace support.
Attributes                  Interface for a list of XML attributes.
ContentHandler              Receives notification of the logical content of a document.
DocumentHandler             Deprecated. This interface has been replaced by the SAX2 ContentHandler interface, which includes namespace support.
DTDHandler                  Receives notification of basic DTD-related events.
EntityResolver              Basic interface for resolving entities.
ErrorHandler                Basic interface for SAX error handlers.
Locator                     Interface for associating a SAX event with a document location.
Parser                      Deprecated. This interface has been replaced by the SAX2 XMLReader interface, which includes namespace support.
XMLFilter                   Interface for an XML filter.
XMLReader                   Interface for reading an XML document using callbacks.

Classes
Name                        Description
HandlerBase                 Deprecated. This class works with the deprecated DocumentHandler interface.
InputSource                 A single input source for an XML entity.

Exceptions
Name                        Description
SAXException                Encapsulates a general SAX error or warning.
SAXNotRecognizedException   Exception class for an unrecognized identifier.
SAXNotSupportedException    Exception class for an unsupported operation.
SAXParseException           Encapsulates an XML parse error or warning.

ContentHandler Interface
This interface specifies the callback methods that the SAX parser uses to notify an
application program of the components of the XML document that it has seen.
void startDocument() − Called at the beginning of a document.
void endDocument() − Called at the end of a document.
void startElement(String uri, String localName, String qName, Attributes atts) − Called at
the beginning of an element.
void endElement(String uri, String localName,String qName) − Called at the end of an
element.
void characters(char[] ch, int start, int length) − Called when character data is
encountered.
void ignorableWhitespace( char[] ch, int start, int length) − Called when a DTD is present
and ignorable whitespace is encountered.
void processingInstruction(String target, String data) − Called when a processing
instruction is recognized.
void setDocumentLocator(Locator locator) − Provides a Locator that can be used to
identify positions in the document.
void skippedEntity(String name) − Called when an unresolved entity is encountered.
void startPrefixMapping(String prefix, String uri) − Called when a new namespace
mapping is defined.
void endPrefixMapping(String prefix) − Called when a namespace definition ends its scope.
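The namespace-related callbacks are only meaningful when the parser is namespace aware. The sketch below, assuming the JDK's bundled parser with setNamespaceAware(true), records startPrefixMapping, startElement, and endPrefixMapping for a small document. The class name NamespaceEvents and the URI urn:example:lib are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records namespace-related ContentHandler callbacks.
class NamespaceEvents extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        events.add("map " + prefix + " -> " + uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        events.add("unmap " + prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With namespace processing on, uri and localName are filled in.
        events.add("start {" + uri + "}" + localName);
    }

    // Parses the XML text with a namespace-aware parser and returns the recorded events.
    static List<String> eventsOf(String xml) {
        NamespaceEvents handler = new NamespaceEvents();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        eventsOf("<lib:book xmlns:lib=\"urn:example:lib\"/>").forEach(System.out::println);
    }
}
```

A factory obtained from SAXParserFactory.newInstance() is not namespace aware by default, which is why the earlier listings in this unit rely on qName instead of localName.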

What is SAX Parsing?
• Callback Methods
– The SAX API has a default handler class built in so
you don’t have to re-implement the interfaces every
time (org.xml.sax.helpers.DefaultHandler)
– The five most common methods to override are:
• startElement(String uri, String lname, String qname,
Attributes atts)
• endElement(String uri, String lname, String qname)
• characters(char text[], int start, int length)
• startDocument()
• endDocument()

• startElement()
– Four parameters:
• String uri = the namespace URI (Uniform Resource Identifier)
• String lname = the local name of the element
• String qname = the qualified name of the element
• Attributes atts = list of attributes for this element
– If the current element is a complex element, an object of
the appropriate type is created and pushed on to the stack
– If the element is simple, a StringBuffer is pushed on to the
stack, ready to accept character data

• endElement()
– Three parameters:
• String uri = the namespace URI (Uniform Resource Identifier)
• String lname = the local name of the element
• String qname = the qualified name of the element
– The topmost element on the stack is popped,
converted to the proper type, and inserted into its
parent, which now occupies the top of the stack
(unless this is the root element – special handling
required)

• characters()
– Three parameters:
• char text[] = character array (a parser buffer) that holds the
current character data
• int start = starting index of current data in text[]
• int length = number of characters of current data in text[]
– When the parser encounters raw text, it passes a char
array containing the actual data, the starting position,
and the length of data to be read from the array
– The implementation of the callback method inserts
the data into the StringBuffer located on the top of the
stack
– Can lead to confusion because:
• There is no guarantee that a single stretch of characters results in
one call to characters()
• All character data, including whitespace, is reported by
the parser
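The stack technique described for startElement(), endElement(), and characters() can be sketched as follows. This is a simplified illustration, not the original author's code: it pushes a StringBuilder for every element, appends every characters() chunk to the buffer on top of the stack, and stores the trimmed text of each element when it closes. The class name TextCollector is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one text buffer per open element, kept on a stack.
class TextCollector extends DefaultHandler {
    private final Deque<StringBuilder> stack = new ArrayDeque<>();
    final Map<String, String> values = new LinkedHashMap<>();  // element name -> text

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one stretch of text, so always append.
        if (!stack.isEmpty()) {
            stack.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = stack.pop().toString().trim();
        if (!text.isEmpty()) {
            values.put(qName, text);
        }
    }

    // Parses the XML text and returns the text content of each element that has any.
    static Map<String, String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        System.out.println(collect("<fiction><book>Moby &amp; Dick</book></fiction>"));
    }
}
```

Appending instead of assigning is the important design choice: a parser is free to split text around an entity reference such as &amp; into several characters() calls.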

Attributes Interface
This interface specifies methods for processing
the attributes connected to an element.
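• int getLength() − Returns the number of attributes.
• String getQName(int index) − Returns the qualified name of the
attribute at the given index.
• String getValue(int index) − Returns the value of the attribute at
the given index.
• String getValue(String qname) − Returns the value of the attribute
with the given qualified name, or null if there is none.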
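A short sketch of these methods in use: the handler below walks the Attributes object by index inside startElement() and stores each attribute under a key of the form element/@attribute. The class name AttributeDump, the key format, and the year attribute in the sample are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that copies every attribute it sees into a map.
class AttributeDump extends DefaultHandler {
    final Map<String, String> seen = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The Attributes object is only valid during this call, so copy what is needed.
        for (int i = 0; i < atts.getLength(); i++) {
            seen.put(qName + "/@" + atts.getQName(i), atts.getValue(i));
        }
    }

    // Parses the XML text and returns all attributes keyed by element and attribute name.
    static Map<String, String> attributesOf(String xml) {
        AttributeDump handler = new AttributeDump();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(
                attributesOf("<book author=\"Herman Melville\" year=\"1851\">Moby Dick</book>"));
    }
}
```

SAX does not promise any particular attribute order, so code that needs one specific attribute is usually better served by getValue(String qname).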

• The org.xml.sax.helpers package contains
additional classes that can simplify some of
the coding and make it more portable.
• It includes a number of adapters that
implement many of the handler interfaces, so
we don't need to fill in all the methods
defined in the interfaces.

• The org.xml.sax.ext package is an extension
that is not shipped with all implementations.
• It contains two handler interfaces, DeclHandler and
LexicalHandler, for capturing declaration and lexical events.

Listing the details of every student element, including its attribute and child elements
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAXParserDemo
{
    public static void main(String[] args)
    {
        try
        {
            // Parse src/students.xml and send every event to a UserHandler.
            File inputFile = new File("src/students.xml");
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            UserHandler userhandler = new UserHandler();
            saxParser.parse(inputFile, userhandler);
        }
        catch (Exception e) { e.printStackTrace(); }
    }
}

class UserHandler extends DefaultHandler
{
    // Flags that record which child element the parser has just entered.
    boolean bname = false;
    boolean bgender = false;
    boolean bmarks = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes)
            throws SAXException
    {
        if (qName.equalsIgnoreCase("student"))
        {
            String rollNo = attributes.getValue("id");
            System.out.println("id : " + rollNo);
        }
        else if (qName.equalsIgnoreCase("name"))
        {
            bname = true;
        }
        else if (qName.equalsIgnoreCase("gender"))
        {
            bgender = true;
        }
        else if (qName.equalsIgnoreCase("marks"))
        {
            bmarks = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException
    {
        System.out.println("End Element :" + qName);
    }

    @Override
    public void characters(char ch[], int start, int length)
            throws SAXException
    {
        // Print the text of the element just entered, then clear its flag.
        if (bname)
        {
            System.out.println("Name: " + new String(ch, start, length));
            bname = false;
        }
        else if (bgender)
        {
            System.out.println("Gender: " + new String(ch, start, length));
            bgender = false;
        }
        else if (bmarks)
        {
            System.out.println("Marks: " + new String(ch, start, length));
            bmarks = false;
        }
    }
}
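The listing reads src/students.xml, which is not shown in these slides. The sketch below is a self-contained variant that parses an in-memory document and collects each student into a map instead of printing. The element and attribute names (student, id, name, gender, marks) follow the listing; the root element class, the sample values, and the class name StudentCollector are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that gathers each student's id and child values.
class StudentCollector extends DefaultHandler {
    // Made-up sample data standing in for src/students.xml.
    static final String SAMPLE =
            "<class>"
            + "<student id=\"001\"><name>Asha</name><gender>F</gender><marks>85</marks></student>"
            + "<student id=\"002\"><name>Ravi</name><gender>M</gender><marks>90</marks></student>"
            + "</class>";

    final List<Map<String, String>> students = new ArrayList<>();
    private Map<String, String> current;                 // student being read, if any
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                               // start a fresh text buffer
        if ("student".equals(qName)) {
            current = new LinkedHashMap<>();
            current.put("id", atts.getValue("id"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                  // text may arrive in pieces
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("student".equals(qName)) {
            students.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());  // name, gender, or marks
        }
    }

    // Parses the XML text and returns one map per student element.
    static List<Map<String, String>> read(String xml) {
        StudentCollector handler = new StudentCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.students;
    }

    public static void main(String[] args) {
        read(SAMPLE).forEach(System.out::println);
    }
}
```

Buffering the text and storing it in endElement() avoids the boolean flags of the listing and copes with characters() being called more than once per element.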

SAXQueryDemo.java
package com.tutorialspoint.xml;
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAXQueryDemo
{
    public static void main(String[] args)
    {
        try
        {
            File inputFile = new File("src/students.xml");
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            UserHandler userhandler = new UserHandler();
            saxParser.parse(inputFile, userhandler);
        }
        catch (Exception e) { e.printStackTrace(); }
    }
}

class UserHandler extends DefaultHandler
{
    boolean bname = false;
    boolean bgender = false;
    boolean bmarks = false;
    String rollNo = null;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException
    {
        // Remember the id of the enclosing student; child elements carry no id attribute.
        if (qName.equalsIgnoreCase("student"))
        { rollNo = attributes.getValue("id"); }
        if (("001").equals(rollNo) && qName.equalsIgnoreCase("student"))
        { System.out.println("Start Element :" + qName); }
        if (qName.equalsIgnoreCase("name")) { bname = true; }
        else if (qName.equalsIgnoreCase("gender")) { bgender = true; }
        else if (qName.equalsIgnoreCase("marks")) { bmarks = true; }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException
    {
        if (("001").equals(rollNo) && qName.equalsIgnoreCase("student"))
        { System.out.println("End Element :" + qName); }
    }

    @Override
    public void characters(char ch[], int start, int length)
            throws SAXException
    {
        // Only the student whose id is 001 is reported.
        if (bname && ("001").equals(rollNo))
        {
            System.out.println(" Name: " + new String(ch, start, length));
            bname = false;
        }
        else if (bgender && ("001").equals(rollNo))
        {
            System.out.println("Gender: " + new String(ch, start, length));
            bgender = false;
        }
        else if (bmarks && ("001").equals(rollNo))
        {
            System.out.println("Marks: " + new String(ch, start, length));
            bmarks = false;
        }
    }
}

Validation
• SAX parsers come in two varieties: validating and
nonvalidating.
– Validating parsers can determine whether an XML document is
valid based on a Document Type Definition (DTD) or Schema.
– The SAX parser shipped with Apache Xerces is a validating
parser.
• In order to use validation, we must turn it on by setting the
validation feature to true.
• If we try to turn on validation with a nonvalidating parser, a
SAXNotSupportedException will be thrown.
• If the parser does not recognize the feature, a
SAXNotRecognizedException will be thrown. This helps in
determining whether we mistyped the feature name.
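Turning a feature on and telling the two exceptions apart can be sketched as below. The validation feature URI is the standard SAX2 one; the class name ValidationFeature, the helper names, and the deliberately unknown URI are assumptions for this illustration, and the reader comes from the JDK's bundled parser.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper that reports what happened when a feature was switched on.
class ValidationFeature {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Tries to set the feature to true and describes the outcome.
    static String tryFeature(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, true);
            return "enabled";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";   // unknown name, possibly a typo
        } catch (SAXNotSupportedException e) {
            return "not supported";    // known name, but this parser cannot do it
        }
    }

    // Obtains an XMLReader from the default JAXP factory.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        System.out.println(VALIDATION + ": " + tryFeature(reader, VALIDATION));
        // A made-up feature name, used only to trigger SAXNotRecognizedException.
        String unknown = "http://example.invalid/no-such-feature";
        System.out.println(unknown + ": " + tryFeature(reader, unknown));
    }
}
```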

• ErrorHandler contains three methods that can
be used to determine whether a document is
well formed and valid.
• error() will be called if the document is well
formed but not valid (that is, it violates the
rules of the DTD), warning() reports less
serious conditions, and fatalError() will be
called if the document is not well formed.

Validator
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
public class SAXValidator extends DefaultHandler
{
    private boolean valid;
    private boolean wellFormed;
    public SAXValidator() { valid = true; wellFormed = true; }
    public void startDocument() { System.out.println("***Start of Document***"); }
    public void endDocument() { System.out.println("***End of Document***"); }
    // Validity violations are reported here.
    public void error(SAXParseException e) { valid = false; }
    // Well-formedness violations are reported here.
    public void fatalError(SAXParseException e) { wellFormed = false; }
    // This example also counts warnings against validity.
    public void warning(SAXParseException e) { valid = false; }
    public boolean isValid() { return valid; }
    public boolean isWellFormed() { return wellFormed; }
    public static void main(String args[]) throws Exception
    {
        if (args.length != 1)
        {
            System.err.println("Usage: java SAXValidator <xml-file>");
            System.exit(1);
        }
        // Requires Xerces on the classpath.
        XMLReader parser =
            XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        parser.setFeature("http://xml.org/sax/features/validation", true);
        SAXValidator handler = new SAXValidator();
        parser.setContentHandler(handler);
        parser.setErrorHandler(handler);
        try
        {
            parser.parse(new InputSource(new FileReader(args[0])));
        }
        catch (SAXParseException e)
        {
            // A parser may stop after a fatal error by throwing it;
            // record the problem so the report below is still printed.
            handler.wellFormed = false;
        }
        if (!handler.isWellFormed()) { System.out.println("Document is NOT well formed."); }
        if (!handler.isValid()) { System.out.println("Document is NOT valid."); }
        if (handler.isWellFormed() && handler.isValid())
        { System.out.println("Document is well formed and valid."); }
    }
}
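The validator above expects Xerces on the classpath and a file name on the command line. As a self-contained sketch of the same idea, the variant below uses the parser bundled with the JDK through SAXParserFactory.setValidating(true) and checks in-memory documents against an inline DTD. The class name ValidityCheck and the DTD itself are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker that separates "not valid" from "not well formed".
class ValidityCheck extends DefaultHandler {
    // Made-up inline DTD: a fiction element holding one or more book elements.
    static final String DTD =
            "<!DOCTYPE fiction ["
            + "<!ELEMENT fiction (book+)>"
            + "<!ELEMENT book (#PCDATA)>"
            + "<!ATTLIST book author CDATA #REQUIRED>"
            + "]>";

    boolean valid = true;
    boolean wellFormed = true;

    @Override
    public void error(SAXParseException e) { valid = false; }            // validity problem

    @Override
    public void fatalError(SAXParseException e) { wellFormed = false; }  // broken XML

    // Parses the XML text with DTD validation switched on and returns the findings.
    static ValidityCheck check(String xml) {
        ValidityCheck handler = new ValidityCheck();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            handler.wellFormed = false;   // the parser gave up after a fatal error
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ValidityCheck good = check(DTD
                + "<fiction><book author=\"Herman Melville\">Moby Dick</book></fiction>");
        ValidityCheck invalid = check(DTD + "<fiction><poem/></fiction>");
        ValidityCheck broken = check("<fiction><book></fiction>");
        System.out.println("good:    wellFormed=" + good.wellFormed + " valid=" + good.valid);
        System.out.println("invalid: wellFormed=" + invalid.wellFormed + " valid=" + invalid.valid);
        System.out.println("broken:  wellFormed=" + broken.wellFormed);
    }
}
```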

Lexical Events
• We can receive events for comments, CDATA sections,
and DTD references using an extension
interface called LexicalHandler.
• LexicalHandler is part of the org.xml.sax.ext
package, which is not necessarily supported
by all SAX implementations.
• Xerces provides support for the extension
package.

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.*;
public class SAXLexical extends DefaultHandler implements LexicalHandler
{
    public SAXLexical() { }
    public void startDocument() { System.out.println("***Start of Document***"); }
    public void endDocument() { System.out.println("***End of Document***"); }
    public void startElement(String uri, String localName, String qName, Attributes attributes)
    {
        // Print the start tag together with all of its attributes.
        System.out.print("<" + qName);
        int n = attributes.getLength();
        for (int i = 0; i < n; i += 1)
        { System.out.print(" " + attributes.getQName(i) + "='" + attributes.getValue(i) + "'"); }
        System.out.println(">");
    }
    public void characters(char[] ch, int start, int length)
    { System.out.println(new String(ch, start, length).trim()); }
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException
    { System.out.println("</" + qName + ">"); }
    // The remaining methods are the LexicalHandler callbacks.
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException
    {
        System.out.print("*** Start DTD, name " + name);
        if (publicId != null) { System.out.print(" PUBLIC " + publicId); }
        if (systemId != null) { System.out.print(" SYSTEM " + systemId); }
        System.out.println(" ***");
    }
    public void endDTD() throws SAXException
    { System.out.println("*** End DTD ***"); }
    public void startEntity(String name) throws SAXException
    { System.out.println("*** Start Entity " + name + " ***"); }
    public void endEntity(String name) throws SAXException
    { System.out.println("*** End Entity " + name + " ***"); }
    public void startCDATA() throws SAXException
    { System.out.println("*** Start CDATA ***"); }
    public void endCDATA() throws SAXException
    { System.out.println("*** End CDATA ***"); }
    public void comment(char[] ch, int start, int length) throws SAXException
    { System.out.println("<!-- " + new String(ch, start, length) + " -->"); }
    public static void main(String args[]) throws Exception
    {
        if (args.length != 1)
        {
            System.err.println("Usage: java SAXLexical <xml-file>");
            System.exit(1);
        }
        // Requires Xerces on the classpath.
        XMLReader parser =
            XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
        SAXLexical handler = new SAXLexical();
        parser.setContentHandler(handler);
        // A LexicalHandler is registered as a property, not through a setter method.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new FileReader(args[0])));
    }
}
