Introduction to XML. Lecture notes for TYBSc (Computer Science) and TYBSC (IT)

Published in: Education, Technology

XML
Prof. Mukesh N. Tekwani

1 What are the disadvantages of HTML?
• HTML lacks syntax checking.
• HTML lacks structure.
• HTML is not suitable for data interchange.
• HTML is not context aware: it does not allow us to describe the information content or the semantics of the document.
• HTML is not object-oriented.
• HTML is not re-usable.
• HTML is not extensible.
• HTML is suitable only for displaying content, not for describing what the content is about.
• HTML has only a few tags that describe the meaning of the text, such as <ADDRESS>.
• HTML is not flexible enough to mark up a wide variety of documents. Structurally, HTML describes only a <HEAD> and a <BODY>; it cannot describe abstracts, chapters, parts, sections, etc.

2 What is XML?
• XML stands for Extensible Markup Language.
  o Extensible: capable of being extended. We can make our own elements/tags.
  o Markup: a way of adding information to the text that indicates the logical components of a document.
• How is it different from HTML?
  o HTML was designed to display data.
  o XML was designed to store, describe and transport data.
  o XML separates data from HTML.
• XML is a markup language, like HTML.
• XML tags are not predefined; we must design our own tags.
• XML is portable: it is easy to produce files that capture the rules of your markup and enable other programs to properly read or process your XML documents.
• XML does not do anything by itself. It was created to structure, store, and transport information.
• XML is not a replacement for HTML; they do different things.

3 State the differences between HTML and XML.
No. | HTML | XML
1 | Designed to display data | Designed to store and transport data between applications and databases. Transport here means that data can be exchanged between incompatible systems over the Internet.
2 | Focus is on how data looks | Focus is on what data is
3 | It has pre-defined tags such as <B>, <LI>, etc | No predefined tags; all tags must be defined by the user. E.g., we can create tags such as <TO>, <FROM>, <BOOKNAME>, etc
4 | HTML is used to display information | XML is used to describe information
5 | Not every tag needs a closing tag | Every tag must have a closing tag
6 | HTML is not case sensitive | XML is case sensitive
7 | HTML is for humans | XML is for computers

4 What are the advantages of XML? OR What are the features of XML?
• XML simplifies data sharing: since XML data is stored in plain text format, data can be easily shared among different hardware and software platforms.
• XML separates data from HTML: to display dynamic data in HTML, the code must be rewritten each time the data changes. With XML, data can be stored in separate files, so that whenever the data changes it is automatically displayed correctly. We have to design the HTML for layout only once.
• XML simplifies data transport: data can be easily exchanged between different platforms.
• XML makes data more available:
  o Since XML is independent of hardware, software and application, it can make your data more available and useful.
  o Different applications can access your data, not only in HTML pages.
• XML provides a means to package almost any type of information (binary, text, voice, video) for delivery to a receiving end.
• Internationality: HTML relies heavily on ASCII, which makes using foreign characters very difficult. XML uses Unicode, so many European and Asian languages are also handled easily.

5 What are the types of XML markup?
There are 5 types of XML markup: elements, entities, comments, processing instructions and ignored sections.

Elements:
1. XML elements describe the meaning of the text they contain.
2. Elements occur in pairs, with a start tag and an end tag that enclose the text they mark up.
3. Inside the start tag, a keyword indicates the meaning of the markup. The end tag contains the same keyword preceded by a forward slash (/). Both tags start with a less-than sign and end with a greater-than sign: <LETTER>……….</LETTER>
4. Some elements do not occur in pairs. These elements are said to be empty. The tag for an empty element ends with />, e.g., <BR/>
5. Some elements take attributes that modify or expand on the meaning they impart to the content they contain. Attribute values are enclosed in quotation marks.

Entities:
1. In HTML we use entities such as &gt; &lt; &nbsp; etc. Entities in XML are very similar to entities in HTML.
2. Some characters have a special meaning in XML. E.g., if you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element:
    <message>if salary < 1000 then </message>
3. To avoid this error, replace the "<" character with an entity reference:
    <message>if salary &lt; 1000 then</message>
4. XML also lets us use any Unicode character, so documents can be produced in languages other than English.
5. XML entities can be defined in your XML file, or defined externally and incorporated into your XML file.
6. The predefined entities in XML are:
    Entity | Symbol | Description
    &lt; | < | Less than
    &gt; | > | Greater than
    &amp; | & | Ampersand
    &apos; | ' | Apostrophe
    &quot; | " | Quotation mark
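
The effect of the entity reference can be checked with a short program. The sketch below is only an illustration: it assumes Python and its standard xml.etree.ElementTree and xml.sax.saxutils modules, neither of which is part of these notes. It shows that the unescaped "<" is rejected by the parser, while the escaped form is accepted and is handed to the application as a plain "<".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"

# A bare "<" inside element content is not well formed, so the parser rejects it.
try:
    ET.fromstring("<message>" + raw + "</message>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# escape() replaces &, < and > with the entity references &amp;, &lt; and &gt;.
safe = escape(raw)
message = ET.fromstring("<message>" + safe + "</message>")

print(parsed_raw)     # did the unescaped version parse?
print(safe)           # the text as it must appear inside the XML file
print(message.text)   # the text as the application sees it after parsing
```

The application never sees the entity reference itself; the parser turns &lt; back into "<" before passing the text on.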
Comments: Comments are the same as in HTML: <!-- -->

Processing instructions: Processing instructions (PIs) enable us to embed information to be passed to an application right in the XML document. The syntax is <?name data?>. The name, or PI target, should be something that the processing application will recognize. Target names beginning with "xml" are reserved for standardization purposes. The data component of a PI can be anything that the processing application understands.

Ignored sections: In a mathematical expression it may be necessary to use characters that are reserved in XML. If you put them into an ignored section (a CDATA section) like this: <![CDATA[4 <3 is false.]]> the expression with the less-than sign passes to the application unchanged. All ignored sections start with <![CDATA[ and end with ]]>

6 Simple example of XML document:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <class_list>
      <student>
        <name>Anamika</name>
        <grade>A+</grade>
      </student>
      <student>
        <name>Veena</name>
        <grade>B+</grade>
      </student>
    </class_list>
• The first line is the XML declaration.
  o It defines the XML version (1.0).
  o It gives the encoding used (ISO-8859-1 = Latin-1/West European character set).
  o The XML declaration has the same form as a processing instruction (PI): it is delimited by <? at its start and ?> at its end.
• The next line describes the root element of the document (like saying: "this document is a class_list"). Every XML document must have exactly one root element. The root element is like the parent element. All other elements must be completely enclosed within that element. In our example, the root element is <class_list>.
• In XML a non-empty element must consist of three things: a start tag, content (either text or other elements) and an end tag. The name that you use in the element start tag must exactly match (including case) the name you use in the end tag.
• The following lines describe the child elements: each <student> is a child of the root, and <name> and <grade> are children of <student>.
• Finally, the last line closes the root element: </class_list>.
• XML documents can contain empty XML elements. For example:
    <banner source="topbanner.gif"/>
    <rule/>
    <footer source="foot.gif"/>
  An empty element ends with the close delimiter />, or you can use a closing tag as follows: <empty_element></empty_element>

Attributes: XML elements can have attributes. An attribute provides additional information about an element. Attributes provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but can be important to the software that wants to manipulate the element:
    <file type="gif">computer.gif</file>

7 Describe the logical structure / tree structure of XML documents.
There is a big difference between XML and HTML markup. With a few exceptions, most HTML tags perform functions related to how the content is displayed. XML markup, on the other hand, is meant to convey what the content means.

Each XML document must have exactly one root element, and all other elements must be perfectly nested inside that element. Perfectly nested means that if an element contains other elements, those elements must be completely enclosed within it. If we sketch the structure of the elements in an XML document, we obtain a tree structure. In our example, the root element <class_list> is at the top of the tree, and all elements inside it are neatly contained within each other. No element can be either partially or completely outside the root element.

An element is the parent of the elements that it directly contains. The elements inside an element are called its children. Elements that share the same parent are called siblings. In our example, <class_list> is the parent of the <student> elements, <student> is the parent of <name> and <grade>, <name> is a child of <student>, and <name> and <grade> are siblings. Each child element must be fully contained within its parent element. Sibling elements may not overlap. The arrangement of elements in XML is called its logical structure.

Tree Structure:
• XML documents form a tree structure.
• XML documents must contain a root element. This element is "the parent" of all other elements.
• The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.
• All elements can have sub elements (child elements):
    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>
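
The parent, child and sibling relationships described above can be made visible by walking the tree in a program. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name outline is made up for this illustration. It prints each element of the class_list example indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

doc = """<class_list>
  <student><name>Anamika</name><grade>A+</grade></student>
  <student><name>Veena</name><grade>B+</grade></student>
</class_list>"""

def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:          # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)          # the root element, <class_list>
tree_lines = outline(root)
print("\n".join(tree_lines))

# <name> and <grade> share the same parent <student>, so they are siblings.
first_student = root[0]
siblings = [child.tag for child in first_student]
print(siblings)
```

Each level of indentation in the printed outline corresponds to one step down from parent to child.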
Example of tree structure: The XML document below describes a bookstore. Each <book> element forms one branch of the tree:
    <bookstore>
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
      <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
      </book>
    </bookstore>
The <book> element has 4 children: <title>, <author>, <year>, and <price>.
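
To see how a program reads this structure, here is a small sketch that assumes Python's standard xml.etree.ElementTree module (an assumption of this illustration, not something the notes prescribe). It uses a shortened copy of the bookstore document with two of the three books, lists the children of the first <book>, and collects each book's category attribute, title and price.

```python
import xml.etree.ElementTree as ET

bookstore_xml = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(bookstore_xml)
books = bookstore.findall("book")              # direct <book> children of the root

# The tags of the four children of the first <book>.
child_tags = [child.tag for child in books[0]]

# get() reads an attribute; findtext() reads the text of a named child element.
catalog = [
    (book.get("category"), book.findtext("title"), float(book.findtext("price")))
    for book in books
]

print(child_tags)
for category, title, price in catalog:
    print(category, title, price)
```

Note that the category is read from an attribute while the title and price are read from child elements, which mirrors the way the document itself divides the information.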
8 State the XML syntax rules.

All XML Elements Must Have a Closing Tag
In HTML, elements do not have to have a closing tag:
    <p>This is a paragraph
    <p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
    <p>This is a paragraph</p>
    <p>This is another paragraph</p>

XML Tags are Case Sensitive
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>. Opening and closing tags must be written with the same case:
    <Message>This is incorrect</message>
    <message>This is correct</message>
"Opening and closing tags" are also called "start and end tags".

XML Elements Must be Properly Nested
In HTML, you might see improperly nested elements:
    <b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
    <b><i>This text is bold and italic</i></b>
In the example above, "properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

XML Documents Must Have a Root Element
XML documents must contain one element that is the parent of all other elements. This element is called the root element.
    <root>
      <child>
        <subchild>.....</subchild>
      </child>
    </root>
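
The rules above can be tried out against a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed and the labels in the dictionary are names invented for this illustration. Each snippet from this section is handed to the parser, which either accepts it or raises a ParseError.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

examples = {
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "two root elements": "<p>This is a paragraph</p><p>This is another paragraph</p>",
    "properly nested": "<b><i>This text is bold and italic</i></b>",
}

results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(label, "->", "well formed" if ok else "error")
```

The fourth case is worth noticing: two correct <p> elements side by side are still rejected, because a document needs a single root element to enclose them.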
XML Attribute Values Must be Quoted
XML elements can have attributes in name/value pairs, just like in HTML. In XML, the attribute values must always be quoted. Of the two XML documents below, the first is incorrect and the second is correct:
    <note date=12/11/2007>
      <to>Raja</to>
      <from>Jani</from>
    </note>

    <note date="12/11/2007">
      <to>Raja</to>
      <from>Jani</from>
    </note>
The error in the first document is that the date attribute in the note element is not quoted.

Entity References
Some characters have a special meaning in XML. If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element. This will generate an XML error:
    <message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
    <message>if salary &lt; 1000 then</message>
There are 5 predefined entity references in XML:
    &lt; | < | less than
    &gt; | > | greater than
    &amp; | & | ampersand
    &apos; | ' | apostrophe
    &quot; | " | quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater-than character is legal, but it is a good habit to replace it.

Comments in XML
The syntax for writing comments in XML is similar to that of HTML:
    <!-- This is a comment -->

White-space is Preserved in XML
HTML collapses multiple white-space characters into one single white-space:
    HTML:   Hello           Tove
    Output: Hello Tove
With XML, the white-space in a document is not truncated.

XML Stores New Line as LF
In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). In Unix applications, a new line is normally stored as an LF character. XML stores a new line as LF.

9 State the XML naming rules.
XML elements must follow these naming rules:
• Names can contain letters, numbers, and other characters.
• Names cannot start with a number or punctuation character.
• Names cannot start with the letters xml (or XML, or Xml, etc).
• Names cannot contain spaces.
• Any name can be used; no words are reserved.

Best Naming Practices
Make names descriptive. Names with an underscore separator are nice: <first_name>, <last_name>.
Names should be short and simple, like this: <book_title>, not like this: <the_title_of_the_book>.
Avoid "-" characters. If you name something "first-name", some software may think you want to subtract name from first.
Avoid "." characters. If you name something "first.name", some software may think that "name" is a property of the object "first".
Avoid ":" characters. Colons are reserved to be used for something called namespaces (more later).
XML documents often have a corresponding database. A good practice is to use the naming rules of your database for the elements in the XML documents.
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.

10 XML elements are extensible. Explain this statement.
XML's flexibility comes from its capability to enable you to make up your own XML elements. This means that you can introduce new tags into an XML document. XML elements can be extended to carry more information. Look at the following XML example:
    <note>
      <to>Raja</to>
      <from>Jani</from>
      <body>Don't forget me this weekend!</body>
    </note>
Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:
    MESSAGE
    To: Raja
    From: Jani
    Don't forget me this weekend!
Imagine that the author of the XML document then added some extra information to it:
    <note>
      <date>2008-01-10</date>
      <to>Raja</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
The application should not break because of these changes. It should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output. This is the concept of extensibility.

11 Write a note on XML attributes.
XML elements can have attributes. An attribute provides additional information about an element. Attributes provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but can be important to the software that wants to manipulate the element:
    <file type="gif">computer.gif</file>

XML Attributes Must be Quoted
Attribute values must always be quoted. Either single or double quotes can be used. For a person's gender, the person element can be written like this:
    <person gender="female">
or
    <person gender='female'>
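
The quoting rule can be confirmed with a few lines of code. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the variable names are made up for the illustration, and the person elements are written as empty elements so that each snippet is a complete document. It shows that single and double quotes give the same attribute value, that an unquoted value is rejected, and that the type attribute of <file> is kept separate from the element's data.

```python
import xml.etree.ElementTree as ET

# Either kind of quote is accepted and yields the same attribute value.
double_quoted = ET.fromstring('<person gender="female"/>')
single_quoted = ET.fromstring("<person gender='female'/>")
same_value = double_quoted.get("gender") == single_quoted.get("gender")

# An unquoted attribute value is a syntax error.
try:
    ET.fromstring("<person gender=female/>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

# The attribute describes the element; the text content is the data itself.
file_element = ET.fromstring('<file type="gif">computer.gif</file>')

print(same_value, unquoted_accepted)
print(file_element.attrib, file_element.text)
```

The parser reports the attribute and the text content through separate properties, which matches the idea that the file type is information about the data rather than part of it.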
XML attributes are best avoided for data, for the following reasons:
• attributes cannot contain multiple values (elements can)
• attributes cannot contain tree structures (elements can)
• attributes are not easily expandable (for future changes)
• attributes are difficult to read and maintain.
Use elements for data. Use attributes for information that is not relevant to the data.

12 What is the difference between XML elements and attributes?
XML has no rules about when to use elements and when to use attributes. Consider the following examples:
    <person gender="female">
      <firstname>Anita</firstname>
      <lastname>Shah</lastname>
    </person>

    <person>
      <gender>female</gender>
      <firstname>Anita</firstname>
      <lastname>Shah</lastname>
    </person>
In the first example gender is an attribute. In the second example, gender is an element. Both examples provide the same information. Generally, we avoid using attributes in XML and prefer to use elements.

Another example: consider the following XML documents.
Using a date attribute:
    <note date="10/01/2008">
      <to>Raja</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
Using a date element:
    <note>
      <date>10/01/2008</date>
      <to>Raja</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
We now expand the date element, which is possible only because the date is an element and not an attribute:
    <note>
      <date>
        <day>10</day>
        <month>01</month>
        <year>2008</year>
      </date>
      <to>Raja</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

13 What are the requirements for a document to be a valid and well-formed XML document?
An XML document with correct syntax is called a "well formed" XML document. A well-formed document that has also been validated against a DTD is a "valid" XML document.

Well formed document: A "Well Formed" XML document has correct XML syntax. The syntax rules are:
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted

Valid XML document: A "Valid" XML document is a "Well Formed" XML document which also conforms to the rules of a Document Type Definition (DTD):
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE note SYSTEM "Note.dtd">
    <note>
      <to>Raja</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
The DOCTYPE declaration in the example above is a reference to an external DTD file. The purpose of a DTD is to define the structure of an XML document. It defines the structure with a list of legal elements. The declarations are shown below, written as an internal DTD inside a DOCTYPE declaration; the external file Note.dtd itself would contain only the <!ELEMENT ...> lines:
    <!DOCTYPE note [
      <!ELEMENT note (to, from, heading, body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
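
The difference between "well formed" and "valid" can be illustrated in code. The sketch below is not a DTD validator: Python's standard xml.etree.ElementTree parser checks only well-formedness, so the single rule <!ELEMENT note (to, from, heading, body)> is re-implemented by hand here purely as a stand-in. The function name check_note and its three result strings are assumptions of this illustration.

```python
import xml.etree.ElementTree as ET

# Order of children required by <!ELEMENT note (to, from, heading, body)>.
NOTE_CHILDREN = ["to", "from", "heading", "body"]

def check_note(text):
    """Classify text as 'not well formed', 'well formed only' or 'matches note model'."""
    try:
        note = ET.fromstring(text)            # fails if the syntax rules are broken
    except ET.ParseError:
        return "not well formed"
    tags = [child.tag for child in note]
    # (#PCDATA) means each child holds text only, with no nested elements.
    text_only = all(len(child) == 0 for child in note)
    if note.tag == "note" and tags == NOTE_CHILDREN and text_only:
        return "matches note model"
    return "well formed only"

complete = ("<note><to>Raja</to><from>Jani</from>"
            "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")
no_heading = "<note><to>Raja</to><from>Jani</from><body>Don't forget me this weekend!</body></note>"
broken = "<note><to>Raja</note>"

for sample in (complete, no_heading, broken):
    print(check_note(sample))
```

The second sample obeys every syntax rule, so it is well formed, but it leaves out <heading> and therefore would not be valid against the DTD above. A real validating parser would read the DTD itself instead of relying on a hand-written list.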