Basics of XML


XML is everywhere. Computers, mobiles, banking systems, the Internet, TVs and microwaves all use XML as an information wrapping and information exchange system. We will tell you all the basics in the simplest possible way.

Published in: Technology

  1. eXtensible Markup Language (XML)
     A Programme Under the compumitra Series. Copyright 2010-14 © Sunmitra Education Technologies Limited, India.
     A comment by Tim Bray of Sun Microsystems on the celebration of the 10th anniversary of XML in Feb 2008: "There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes. This is a good thing, because it shows that information can be packaged and transmitted and used in a way that's independent of the kinds of computer and software that are involved. XML won't be the last neutral information wrapping system; but as the first, it's done very well."
  2. Outline
     • XML Eye-opener.
     • What is XML?
     • HTML vs. XML.
     • Basic XML Syntax.
     • Constituents.
     • Some XML Rules.
     • Element vs. Attribute.
     • Node Naming Principles.
     • Advanced Concepts related to XML.
     • Future of XML.
  3. XML Eye Opener
     • SIMPLE: So simple that you will wonder why you did not try to understand it until now.
     • SUCCESSFUL: The most successful data storage format to date. Even big brands that strongly believed in proprietary formats for commercial reasons have started using it.
     • SOLID: A solid, ageless concept that this generation will pass on to future generations, who will keep the baton moving.
  4. What is XML - 1
     • XML is an abbreviation of eXtensible Markup Language.
     • XML evolved from the more general-purpose ISO standard SGML (Standard Generalized Markup Language).
     • All data needs a description to become useful information. XML provides a neat solution.
     • XML looks like normal English, but it has been designed to be machine readable.
  5. What is XML - 2
     • XML can store data.
     • XML can help standardize the exchange of data.
     • User-defined markup tags name the data items.
     • Library functions to parse XML are available in most programming languages.
     • The syntax looks like:
         <addressbook>
           <adrrecord>
             <name>Name1</name>
             <address>Address1</address>
             <city>City1</city>
           </adrrecord>
         </addressbook>
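The point about library functions can be sketched in Python, whose standard library ships the `xml.etree.ElementTree` parser. This is only a minimal sketch: the second record (Name2) and the helper name `read_records` are illustrative additions that are not part of the slides.

```python
import xml.etree.ElementTree as ET

# First record is the one from the slide; the second is added for illustration.
ADDRESS_BOOK = """
<addressbook>
  <adrrecord>
    <name>Name1</name>
    <address>Address1</address>
    <city>City1</city>
  </adrrecord>
  <adrrecord>
    <name>Name2</name>
    <address>Address2</address>
    <city>City2</city>
  </adrrecord>
</addressbook>
"""


def read_records(xml_text):
    """Return one dict per <adrrecord>, mapping child tag names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in root.findall("adrrecord")
    ]


records = read_records(ADDRESS_BOOK)
for record in records:
    print(record["name"], "-", record["city"])
```

Because the tags name the data items, the program can look values up by name instead of by position.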
  6. Understanding Basic XML Syntax
         <?xml version="1.0" encoding="UTF-8" standalone="no"?>
         <COUNTRYLIST>
           <COUNTRY group="G20">
             <NAME>India</NAME>
             <CODE>IN</CODE>
             <ISD>91</ISD>
             <CAPITAL largestcity="No">New Delhi</CAPITAL>
             <LCITY>Mumbai</LCITY>
             <CURRENCY>Indian Rupee</CURRENCY>
             <CURCODE>INR</CURCODE>
           </COUNTRY>
           <COUNTRY group="G5">
             <NAME>Japan</NAME>
             <CODE>JP</CODE>
             <ISD>81</ISD>
             <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
             <LCITY>Tokyo</LCITY>
             <CURRENCY>Yen</CURRENCY>
             <CURCODE>JPY</CURCODE>
           </COUNTRY>
         </COUNTRYLIST>
     • XML declaration (first line):
         version: the version of XML.
         encoding: the character set used. UTF-8 (an 8-bit encoding of Unicode) is common.
         standalone: "yes" indicates that no external type definitions are used.
     • Root element node: <COUNTRYLIST>
     • Element node: for example <COUNTRY> or <NAME>
     • Attribute node: for example group="G20"
     • Element value: for example India
     • Attribute value: for example G20
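To see how a program tells these node types apart, here is a small Python sketch using the standard `xml.etree.ElementTree` module on a shortened copy of the country list. The function name `summarize` and the dictionary keys are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's document (some child elements left out).
COUNTRY_LIST = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
  </COUNTRY>
</COUNTRYLIST>
"""

# Parsing bytes lets the parser honour the encoding named in the declaration.
root = ET.fromstring(COUNTRY_LIST.encode("utf-8"))


def summarize(country):
    """Collect element values (.text) and attribute values (.get) of one COUNTRY."""
    capital = country.find("CAPITAL")
    return {
        "name": country.findtext("NAME"),        # element value
        "group": country.get("group"),           # attribute value
        "capital": capital.text,                 # element value
        "capital_is_largest": capital.get("largestcity") == "Yes",
    }


summaries = [summarize(country) for country in root.findall("COUNTRY")]
print(root.tag)  # name of the root element node
for summary in summaries:
    print(summary)
```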
  7. XML Constituents
     • Elements: parsable character data (PCDATA) sits between an element's start and end tags.
         <address><name>somename</name></address>
     • Attributes: an attribute has a name and a value in quotes.
         <Book Version="1.0"><name></name></Book>
     • Five predefined entities allow special characters in the PCDATA area:
         > as &gt;   < as &lt;   & as &amp;   ' as &apos;   " as &quot;
     • CDATA section (character data that is not to be parsed): meant for holding large amounts of code or general-purpose data. Even HTML data can be put here.
         <![CDATA[ ... ]]>
     • Processing Instructions (PI) or directives are given between <? and ?>.
         <?xml-stylesheet type="text/css" href="mySheet.css"?>
       The initial declaration below uses the same syntax, although the XML specification treats it as the XML declaration rather than as a PI.
         <?xml version="1.0" encoding="UTF-8" standalone="no"?>
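The five entities and the CDATA section can be tried out with Python's standard library. This is a rough sketch: the sample strings and the element names `note` and `code` are made up for the demonstration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative text containing all five special characters.
raw = "Tom & Jerry <b> say \"hi\" & 'bye'"

# escape() handles &, < and > by default; the quote characters are passed in
# explicitly so that all five predefined entities are used.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser turns the entities back into the original characters.
note = ET.fromstring("<note>" + escaped + "</note>")
print(note.text == raw)

# Inside a CDATA section the special characters need no escaping at all.
code = ET.fromstring("<code><![CDATA[if (a < b && c > d) {}]]></code>")
print(code.text)
```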
  8. Some XML Rules - 1
     • All elements must have closing tags.
         <address>invalid syntax            (incorrect)
         <address>valid syntax</address>    (correct)
     • Element names are case sensitive.
         <Name>incorrect</name>
         <Name>correct</Name>
     • Elements must be correctly nested.
         <address><name>incorrect</address></name>
         <address><name>correct</name></address>
     • Attribute values must be quoted.
         <Book Version=1.0><name></name></Book>      (incorrect)
         <Book Version="1.0"><name></name></Book>    (correct)
  9. Some XML Rules - 2
     • An XML document must have a root element, and only one root element (it can have any name, though).
         <root> <child>correct</child> </root>
     • Special characters in data values must be written as entities.
         > as &gt;   < as &lt;   & as &amp;   ' as &apos;   " as &quot;
     • Comments have this syntax:
         <!-- This is a comment -->
       A comment cannot contain -- within its text.
  10. Some XML Rules - 3
      • Whitespace is preserved, unlike in HTML. For example, "Hello     World" written with several spaces is displayed by HTML as "Hello World" with a single space. XML retains the exact spaces specified.
      • Empty elements have an optional short form: <Name /> in place of <Name></Name>
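The rules from the three slides above can be checked mechanically, because a parser rejects any document that breaks them. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the two-root sample `<a/><b/>` are illustrative additions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Samples taken from the rule slides, mapped to the verdict the rules predict.
samples = {
    "<address>valid syntax</address>": True,
    "<address>invalid syntax": False,                    # missing closing tag
    "<Name>incorrect</name>": False,                     # case mismatch
    "<address><name>incorrect</address></name>": False,  # wrong nesting
    "<Book Version=1.0><name></name></Book>": False,     # unquoted attribute
    '<Book Version="1.0"><name></name></Book>': True,
    "<a/><b/>": False,                                   # two root elements
    "<Name />": True,                                    # empty-element form
}

results = {text: is_well_formed(text) for text in samples}
for text, ok in results.items():
    print("OK " if ok else "BAD", text)

# Whitespace inside element content is handed to the program unchanged.
spaced = ET.fromstring("<msg>Hello     World</msg>").text
```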
  11. XML Practice: Element vs. Attributes - 1
      • It is generally possible to define all data as ELEMENT tags in a tree format.
          <Library>
            <Book>
              <ID>201</ID>
              <ISBN>8175257660</ISBN>
              <Author>Name1</Author>
              <Title>Book Title</Title>
            </Book>
          </Library>
      • A neat alternative to the above is to use ATTRIBUTES as follows:
          <Library>
            <Book ID="201" ISBN="8175257660">
              <Author>Name1</Author>
              <Title>Book Title</Title>
            </Book>
          </Library>
  12. XML Practice: Element vs. Attributes - 2
      • Which method to use is a decision that needs some thought.
      • Information that is surely singular (will not be repeated) and is not domain specific is best stored as an ATTRIBUTE.
      • Information that you cannot classify, or that can be repeated (for example, the Author tag in the previous example), should be stored as an ELEMENT.
      • An even better format for the previous example would be:
          <Library>
            <Book ID="201">
              <ISBN>8175257660</ISBN>
              <Author>Name1</Author>
              <Title>Book Title</Title>
            </Book>
          </Library>
        This is because ISBN is a book-related property, while ID may relate to a storage place.
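The guideline above can be seen in practice when a program writes the document: an element can be added any number of times, but an attribute name can appear only once per element. A minimal Python sketch follows; the second author (Name2) and the duplicate-ID test string are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Build the recommended layout: ID as an attribute, the rest as elements.
library = ET.Element("Library")
book = ET.SubElement(library, "Book", {"ID": "201"})
ET.SubElement(book, "ISBN").text = "8175257660"
for author in ["Name1", "Name2"]:          # elements can repeat freely
    ET.SubElement(book, "Author").text = author
ET.SubElement(book, "Title").text = "Book Title"

xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)

# An attribute cannot repeat: the parser rejects a duplicated name.
try:
    ET.fromstring('<Book ID="201" ID="202"></Book>')
    duplicate_attribute_accepted = True
except ET.ParseError:
    duplicate_attribute_accepted = False
print(duplicate_attribute_accepted)
```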
  13. XML Node Naming: Begins with
      • Node (element or attribute) names must begin with a letter or _ (underscore).
          <1STLINE></1STLINE>             invalid element naming
          <LINE1></LINE1>                 valid element naming
          <BOOK 1Ver="1.00"></BOOK>       invalid attribute naming
          <BOOK _Ver="1.00"></BOOK>       valid attribute naming
  14. XML Node Naming: Consists of
      • A name can consist of:
          Any English character, or any other language's character, as allowed by the encoding given in the declaration.
            <Name>Sun</Name>
            <नाम>सूरज</नाम>
          A dot (.), hyphen (-) or _ (underscore).
            <Address.Cityname>Delhi</Address.Cityname>
            <Address-Cityname>Delhi</Address-Cityname>
            <Address_Cityname>Delhi</Address_Cityname>
      • Tabs and spaces are not allowed in XML node names.
  15. XML Node Naming: Based on Namespace
      • A name can belong to a namespace.
      • "table" may refer to an HTML table or to a piece of furniture. Namespace prefixes resolve this clash as follows (in a complete document each prefix must also be bound to a namespace URI with an xmlns declaration):
          <h:table>
            <h:tr>
              <h:td>Apples</h:td>
              <h:td>Bananas</h:td>
            </h:tr>
          </h:table>
          <f:table>
            <f:name>Dining Table</f:name>
            <f:width>120</f:width>
            <f:length>230</f:length>
          </f:table>
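A program separates the two kinds of table by namespace, not by prefix. The Python sketch below wraps the slide's fragments in a `root` element and binds the prefixes to two URIs; the wrapper element and both `http://example.com/...` URIs are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# The root wrapper and the two namespace URIs are hypothetical placeholders.
DOC = """
<root xmlns:h="http://example.com/html" xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>Dining Table</f:name>
    <f:width>120</f:width>
    <f:length>230</f:length>
  </f:table>
</root>
"""

# Prefix-to-URI map used only for the searches below.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(DOC)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)
html_table_tag = root.find("h:table", NS).tag  # stored as {uri}localname

print(cells)
print(furniture_name)
print(html_table_tag)
```

Internally the parser stores each name together with its URI, so the two `table` elements never collide even though their local names are identical.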
  16. HTML vs. XML - 1
      • Similarities:
          Both use markup tags (elements and attributes), e.g. <H1>Heading1</H1> or <font face="Verdana"></font>.
          Both use entities, e.g. &lt; &gt; etc.
          Both are derived from SGML.
  17. HTML vs. XML - 2
      • Differences:
          HTML has predefined tags. XML tags are user defined.
          HTML is meant for humans, and errors are ignored. XML is meant for computers, as a data storehouse or for definitions, so errors cannot be ignored.
          HTML is usually not updated by programs, while XML is meant to be written by programs.
          HTML has a large number of entities. XML has just five predefined ones.
  18. XSL (Extensible Stylesheet Language)
      • Unlike HTML styling with CSS (Cascading Style Sheets), XSL works with tags that are user defined.
      • It has three parts:
          XSLT (XSL Transformations): for transforming XML data, for example into XHTML shown on a web page.
          XPath: a way to reach a particular data item in an XML file. This is very often useful when reading XML-based configuration files.
          XSL-FO (XSL Formatting Objects): provides a display/print formatting mechanism for XML data.
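As a rough illustration of XPath, Python's standard `xml.etree.ElementTree` understands a limited subset of XPath expressions in `find`, `findall` and `findtext`. The sketch below runs a few such paths over a shortened copy of the country list from the syntax slide; the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the country list used earlier in the slides.
COUNTRY_LIST = """
<COUNTRYLIST>
  <COUNTRY group="G20">
    <NAME>India</NAME><CODE>IN</CODE>
    <CAPITAL largestcity="No">New Delhi</CAPITAL>
  </COUNTRY>
  <COUNTRY group="G5">
    <NAME>Japan</NAME><CODE>JP</CODE>
    <CAPITAL largestcity="Yes">Tokyo</CAPITAL>
  </COUNTRY>
</COUNTRYLIST>
"""

root = ET.fromstring(COUNTRY_LIST)

# A plain path: every NAME directly under a COUNTRY.
all_names = [name.text for name in root.findall("./COUNTRY/NAME")]

# Filter on an attribute value.
g20_names = [c.findtext("NAME") for c in root.findall("./COUNTRY[@group='G20']")]

# Filter on a child element's text, then step to a sibling element.
japan_capital = root.findtext("./COUNTRY[CODE='JP']/CAPITAL")

# Search at any depth for capitals that are also the largest city.
largest_capitals = [c.text for c in root.findall(".//CAPITAL[@largestcity='Yes']")]

print(all_names, g20_names, japan_capital, largest_capitals)
```

Full XPath offers more (functions, axes, arithmetic) than this subset, so a dedicated XPath library would be needed for anything beyond simple lookups like these.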
  19. DTD (Document Type Definition)
      • A DTD is referred to within a DOCTYPE declaration in an XML file, such as:
          <!DOCTYPE note SYSTEM "Note.dtd">
      • The same definitions can also be written inline within the DOCTYPE declaration, as follows. An external file such as Note.dtd holds only the <!ELEMENT ...> lines.
          <!DOCTYPE note [
            <!ELEMENT note (to,from,heading,body)>
            <!ELEMENT to (#PCDATA)>
            <!ELEMENT from (#PCDATA)>
            <!ELEMENT heading (#PCDATA)>
            <!ELEMENT body (#PCDATA)>
          ]>
      • Here the XML file has a root node named note with four sub-elements (to, from, heading, body). The sub-elements hold PCDATA.
  20. Parsing XML
      • The process of reading an XML file and extracting valid data out of it is called "PARSING".
      • Parsers are of two types:
          Non-validating parser: does not check the document against a DTD.
          Validating parser: checks the document against its DTD.
  21. Some Advanced Concepts Related to XML
      • XML Schema: defines validation rules in XSD (XML Schema Definition) files, which are themselves written in XML.
      • XQuery: a way to search within an XML file and get the nodes that match given criteria.
  22. Where to View/Edit
      • Browsers: Most browsers are good at viewing XML. Internet Explorer is particularly good at it.
      • Editors: Special editors are available that offer good XML viewing and editing facilities. Microsoft's XML Editor and Peter's XML Editor are good at it.
      • Office tools: Tools like MS-Word and FrontPage provide good XML editing. Even MS-Excel supports opening XML files.
      • Visual Studio/Web Developer: They provide an excellent environment for XML editing and viewing, along with validation support.
  23. Let's Quickly Revise
      • 2 types of nodes: elements and attributes. Elements are repeatable. Attributes can always be rewritten as elements; the reverse may not be true.
      • Special syntax for non-parsable data: CDATA.
      • 5 entities for special symbols (<, >, ', ", &).
      • HTML-style comments are allowed: <!-- comments -->
      • Names are case sensitive. Closing tags are required.
      • One can apply Processing Instructions (PI), which are enclosed within <? ?>. The first line is usually the XML version declaration, which uses the same syntax.
      • Always have a single root node.
  24. Future of XML
      • All websites may one day be written in XML. HTML has already been re-standardised as XHTML, which provides better syntax checking and browser compatibility.
      • XML promises to be the most open system for storing information on all IT gadgets, from desktops to mobile phones, iPods, iPads, DVD players, microwave ovens and so on. It is already in use and is expected to be used in more and more devices.
      • All office documents and e-books, offline and online, may ultimately be in XML, as it is a simple, non-proprietary format that meets these needs well.
  25. • Ask and guide me at
      • Share this information with as many people as possible.
      • Keep visiting for programme updates.