XML: Extensible Mark-up Language T.Srinath Vidya Sagar Seema
XML : XML stands for Extensible Markup Language. It is a standard markup language specified by the W3C; the most widely used version is XML 1.0. XML is a platform-independent, language-independent, firewall-friendly text format for data. An XML document is a text representation of data enclosed between markup tags. The file extension of an XML document is .xml.
XML documents are of two types: 1. Well-formed XML documents. 2. Valid (validated) XML documents.
1. Well-formed XML document : An XML document is called well-formed if it satisfies the following rules: *It contains exactly one root element. *Every start tag has a matching end tag, and elements are properly nested. *Names are case sensitive, so <Note> and <note> are different tags.
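The rules above can be tried out with a small Java sketch. It assumes the JAXP parser that ships with the JDK; the class name WellFormedCheck and the sample documents are hypothetical and not from the original slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class used only for illustration.
class WellFormedCheck {

    // Returns true when the parser reads the whole text without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content; only success or failure matters here.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // the parser reported a well-formedness error
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><to>Ann</to></note>")); // one root, proper nesting
        System.out.println(isWellFormed("<a/><b/>"));                  // two root elements
        System.out.println(isWellFormed("<note><to>Ann</note></to>")); // improper nesting
        System.out.println(isWellFormed("<Note></note>"));             // case mismatch
    }
}
```

Only the first document satisfies all three rules; the other three each break one of them.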
2. Valid XML documents : If a well-formed XML document also satisfies all the rules specified in the DTD it refers to, it is called a valid (validated) XML document.
DTD (Document Type Definition) : A DTD defines the rules for an XML document and is used to validate the document against those rules.
Rules of DTD : A DTD specifies: *Which elements an XML document can contain. *The number of occurrences of an element. *The sequence of elements. *The attributes of an element. *Custom entities that can be used in the XML document.
DTDs can be of 2 types : Internal DTD - the DTD is contained inside the XML document itself. External DTD - the XML document does not contain the DTD but refers to a separate DTD file; the extension of a DTD file is .dtd.
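As a minimal sketch of validation against an internal DTD, the following Java example turns on validation in the JDK's DOM parser and collects the reported validity errors. The note/to/body vocabulary and the class name DtdValidation are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class; the DTD below is invented for illustration.
class DtdValidation {

    // Internal DTD: a note must contain exactly one "to" followed by one "body".
    static final String DTD =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to, body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    // Parses with validation enabled and returns the validity errors found.
    static List<String> validationErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Elements in the declared order: no validity errors expected.
        System.out.println(validationErrors(DTD + "<note><to>Ann</to><body>Hi</body></note>"));
        // Well-formed, but the sequence of elements violates the DTD.
        System.out.println(validationErrors(DTD + "<note><body>Hi</body><to>Ann</to></note>"));
    }
}
```

The second document is still well-formed, which shows the difference between the two document types: it parses, but it is not valid.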
What is parsing? Parsing is the process of reading an XML document and reporting its content to a client application while checking the document for well-formedness. The parser is exposed to the application through an interface (in SAX, for example, XMLReader); the specific class that implements this interface varies from parser to parser.
Purpose of parsers : *The processor must check the basic syntax of the document for well-formedness. *The processor must replace all references to entities in an XML document with their definitions.
*DTDs and XML schemas can specify default values (for example, for attributes) that the processor must supply during processing. *When a DTD or an XML schema is specified and the processor includes a validating parser, the XML document must be checked to ensure that its structure is legitimate.
*Parsers allow applications to interpret XML documents. *Parsers allow applications to process the data in XML documents. *XML parsers are usually built according to either the DOM specification or the SAX specification.
*The DOM specification is given by the W3C; SAX was developed collaboratively by members of the XML-DEV mailing list, coordinated by David Megginson. *Parsers built according to the DOM specification are called DOM parsers, and parsers built according to the SAX specification are called SAX parsers.
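Two of the parser duties listed above, replacing entity references and supplying DTD-declared defaults, can be observed with a short Java sketch. It assumes the JDK's default DOM parser; the entity name who, the attribute priority, and the class name are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the document below is invented for illustration.
class EntityAndDefaultDemo {

    static final String XML =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (#PCDATA)>\n"
        + "  <!ATTLIST note priority CDATA \"normal\">\n" // default attribute value
        + "  <!ENTITY who \"World\">\n"                    // custom entity
        + "]>\n"
        + "<note>Hello &who;</note>";

    // Parses the text and returns the root element of the resulting DOM tree.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element note = parseRoot(XML);
        // The entity reference has been replaced by its definition.
        System.out.println(note.getTextContent());
        // The attribute was not written in the document; the parser supplied the default.
        System.out.println(note.getAttribute("priority"));
    }
}
```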
DOM Parser (Document Object Model) : The XML DOM defines a standard way of accessing and manipulating XML documents. The DOM presents an XML document as a tree structure. The DOM is a W3C (World Wide Web Consortium) standard.
Structure of the DOM tree : *The DOM tree is composed of Node objects. *Node is an interface. -Some of the more important sub-interfaces are Element, Attr, and Text. -An Element node may have children. -Attribute and Text nodes are leaves. -Additional node types include Document, Comment, Entity, CDATASection, and ProcessingInstruction.
*The XML DOM views an XML document as a tree structure called a node tree. All nodes can be accessed through the tree; their contents can be modified or deleted, and new elements can be created. *The node tree shows the set of nodes and the connections between them.
*The tree starts at the root node and branches out to the text nodes at the lowest level of the tree. -The top node of a node tree is called the root. -Every node except the root has exactly one parent node. -A node can have any number of children. -A leaf is a node with no children. -Siblings are nodes with the same parent.
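The node tree described above can be explored with the JDK's DOM API. The following is a minimal sketch, with a hypothetical library/book document and class name, that prints the tree and then adds and deletes nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the sample document is invented for illustration.
class DomTreeDemo {

    // Loads the whole document into memory as a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends one line per node, indented by its depth in the tree.
    static void outline(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(": ").append(node.getNodeValue());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            outline(children.item(i), depth + 1, out);
        }
    }

    static String outline(Node node) {
        StringBuilder out = new StringBuilder();
        outline(node, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<library><book><title>XML Basics</title></book></library>");
        Element root = doc.getDocumentElement();
        System.out.print(outline(root));

        // Create a new element subtree and attach it to the root.
        Element book = doc.createElement("book");
        Element title = doc.createElement("title");
        title.appendChild(doc.createTextNode("DOM in Practice"));
        book.appendChild(title);
        root.appendChild(book);

        // Delete the original first child of the root.
        root.removeChild(root.getFirstChild());
        System.out.print(outline(root));
    }
}
```

Because the whole tree is in memory, the program can move between parents, children, and siblings freely and can change the structure at any point.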
SAX Parser (Simple API for XML) : Usage:- *SAX is used to generate events as the content of the XML document passes through the XML parser. Performance:- *SAX provides higher performance than a DOM parser, because DOM stores the entire XML document in memory before processing, whereas SAX parses the document node by node.
Accessibility:- *SAX is a serial access parser used for reading data from (parsing) an XML document. *A SAX parser is unidirectional: once SAX has passed over the data, it cannot be read again until the parsing process is restarted.
Events generated by SAX:- *XML element nodes (start and end tags) *XML text nodes *XML processing instructions *XML comments (in Java, reported only through the optional LexicalHandler extension)
Example to demonstrate parsing of an XML document through a SAX parser (element names cannot contain spaces, so underscores are used, and attribute values must be quoted):
<?xml version="1.0"?>
<root_element param1="24">
<first_element>hello</first_element>
<second_element>world</second_element>
</root_element>
After parsing, the following events are generated (whitespace between elements is ignored here):
1. The XML declaration, named xml, with version equal to "1.0". Strictly, the declaration is not a processing instruction, and Java SAX parsers do not report it as an event.
2. XML element start tag <root_element>, along with its attribute param1, whose value is 24.
3. XML element start tag <first_element>.
4. XML text, which is 'hello'.
5. XML element end tag </first_element>.
6. XML element start tag <second_element>.
7. XML text, which is 'world'.
8. XML element end tag </second_element>.
9. XML element end tag </root_element>.
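The event sequence above can be reproduced with a small Java SAX handler. This is a sketch that assumes the JDK's default SAX parser. The class name SaxEventLog is hypothetical, and the element names are written with underscores because XML names cannot contain spaces.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each SAX event as a line of text.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    // A parser may deliver text in several chunks, so it is buffered first.
    private final StringBuilder text = new StringBuilder();

    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            events.add("text " + t);
        }
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        StringBuilder event = new StringBuilder("start " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            event.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(event.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    // Parses the text and returns the recorded events in document order.
    static List<String> eventsFor(String xml) {
        SaxEventLog log = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<root_element param1=\"24\">"
                + "<first_element>hello</first_element>"
                + "<second_element>world</second_element>"
                + "</root_element>";
        for (String event : eventsFor(xml)) {
            System.out.println(event);
        }
    }
}
```

With this handler the XML declaration produces no event, so the log begins with the start of the root element.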
Advantages :- *Reduced memory and CPU usage : SAX reduces memory and CPU usage because it processes only one section of an XML document at a time. *Streamlined and fast : SAX is streamlined, fast, and supports pipelining, which means that the parser can produce output while the document is still being parsed.
Disadvantages :- *Uni-directional parsing : The SAX parser is uni-directional. It can only parse forwards in a document, which means that some forms of navigation, including certain XPath expressions, cannot be achieved using SAX. DOM should be used in these situations.
*No structure manipulation : Because only a portion of the XML document is in memory at any one time, it is difficult to add or edit nodes using SAX. If this functionality is required, DOM should be considered.
*Only works with fully formed XML documents : SAX can only work with a fully formed XML document. It cannot be used to process partial XML documents. *No random access to the document : Because the document is not in memory, you must handle data in the order in which it is processed.
*No SAX implementation in current browsers : SAX support is not built into Microsoft Internet Explorer.
When should we use SAX :- *When your documents are large : Perhaps the biggest advantage of SAX is that it requires significantly less memory to process an XML document than the DOM. Because SAX does not hold the document in memory, the parser's memory consumption does not grow with the size of the file.
*When you need to abort parsing : Because SAX allows you to abort processing at any time, you can use it to create applications that fetch particular data. *When you want to retrieve small amounts of information : For many XML-based solutions, it is not necessary to read the entire document to achieve the desired results. Scanning only a small percentage of the document results in a significant saving of system resources.
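One common way to abort a SAX parse in Java is to throw an exception from the handler once the wanted data has been seen. The sketch below assumes that approach. The StopParsing exception, the class name, and the sample document are hypothetical and are not part of the SAX API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that fetches the text of the first matching element.
class FirstMatchFinder extends DefaultHandler {

    // Marker exception used only to stop the parse early (an assumption of this sketch).
    static class StopParsing extends SAXException {
        StopParsing() {
            super("target element found");
        }
    }

    private final String target;
    private final StringBuilder text = new StringBuilder();
    private boolean inside;
    int elementsSeen; // how many start tags were processed before stopping
    String result;    // text of the first target element, or null if none was found

    FirstMatchFinder(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementsSeen++;
        if (qName.equals(target)) {
            inside = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inside && qName.equals(target)) {
            result = text.toString();
            throw new StopParsing(); // abort: the rest of the document is not processed
        }
    }

    static FirstMatchFinder find(String xml, String elementName) {
        FirstMatchFinder finder = new FirstMatchFinder(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
        } catch (StopParsing e) {
            // Expected: parsing was aborted on purpose.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return finder;
    }

    public static void main(String[] args) {
        String xml = "<quotes><symbol>ACME</symbol><symbol>INIT</symbol><news>x</news></quotes>";
        FirstMatchFinder finder = find(xml, "symbol");
        System.out.println(finder.result + " after " + finder.elementsSeen + " start tags");
    }
}
```

The counter shows that the handler never sees the elements after the first match, which is where the saving on large documents comes from.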
*When you want to create a new document structure : In some cases, you might want to use SAX to create a data structure using only high-level objects, such as stock symbols and news, and then combine the data from this XML file with other news sources. Rather than build a DOM structure with low-level elements, attributes, and processing instructions, you can build the document structure more efficiently and quickly using SAX.
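As an illustration of building high-level objects directly from SAX events, the following sketch fills a map of stock symbols to prices without creating any DOM nodes. The quotes/quote format, the symbol attribute, and the class name QuoteLoader are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that turns <quote symbol="...">price</quote> elements into a map.
class QuoteLoader extends DefaultHandler {
    private final Map<String, Double> prices = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String symbol; // symbol of the quote currently being read, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("quote".equals(qName)) {
            symbol = atts.getValue("symbol");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (symbol != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("quote".equals(qName) && symbol != null) {
            // Store only the high-level value; no element or text nodes are kept.
            prices.put(symbol, Double.parseDouble(text.toString().trim()));
            symbol = null;
        }
    }

    // Parses the text and returns the prices in document order.
    static Map<String, Double> load(String xml) {
        QuoteLoader loader = new QuoteLoader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), loader);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return loader.prices;
    }

    public static void main(String[] args) {
        String xml = "<quotes>"
                + "<quote symbol=\"ACME\">12.5</quote>"
                + "<quote symbol=\"INIT\">7</quote>"
                + "</quotes>";
        System.out.println(load(xml));
    }
}
```

The resulting map can then be combined with data from other sources, as the text describes, without an intermediate DOM tree.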
Difference between SAX and DOM :
No. | SAX | DOM
1 | import javax.xml.parsers.*; import org.xml.sax.*; import org.xml.sax.helpers.*; | import javax.xml.parsers.*; import org.w3c.dom.*;
2 | Parses node by node. | Stores the entire XML document in memory before processing.
3 | Does not store the XML in memory. | Occupies more memory.
4 | We can't insert or delete a node. | We can insert or delete a node.
5 | Top-to-bottom traversal. | Traversal in any direction.
6 | SAX is an event-based parser. | DOM is a tree-model parser.
7 | SAX doesn't preserve comments. | DOM preserves comments.
8 | A SAX parser always serves the client application with only pieces of the document. | A DOM parser always serves the client application with the entire document.