Introduction to XML Course I: Basics about XML (2003/3/28)
Outline
• A first look at XML
• XML Syntax
What is XML?
• Some important facts about XML
  – XML stands for the eXtensible Markup Language
  – It was developed by W3C
    • World Wide Web Consortium
    • www.w3.org
  – XML 1.0 (2nd Edition)
    • W3C Recommendation
    • http://www.w3.org/TR/REC-xml
  – XML 1.1
    • Candidate Recommendation
Evolution of the WWW
• The Web was once a publishing tool for scientific documents only.
• Now it is a full-fledged medium, like TV or print.
  – Furthermore, the Web is an interactive medium
  – Over 800 million Web pages are written in HTML
Problems of HTML (I)
• Over the years, HTML has been extended
  – HTML has close to 100 tags
  – Vendors have introduced supporting technologies
  – Still more tags are needed!
• Examples
  – E-commerce applications need tags for prices and product references
  – Streaming media would need tags to control the flow of media
  – HTML is already on the verge of collapsing under its own weight!
Problems of HTML (II)
• Some applications would benefit greatly from a reduction in the tag count!
  – More and more people are accessing the Web from PDAs and smart phones
    • Mobile devices are not as powerful as PCs
    • They cannot process the complex Web language
    • On many pages, the tags outweigh the Web content itself
Basic Principles of XML
• Increasingly specialized applications need more tags, while other applications want a simpler language
  – W3C resolved this dilemma by making two changes relative to HTML
    • No predefined tags
    • Stricter syntax
No Predefined Tags (I)
• XML has no predefined tags.
  – The author creates all the tags he or she needs
    • If you need a certain tag, just make it
• HTML:
    <table>
      <tr>
        <td>Price USD 499</td>
        <td><a href="/newsletter"><b>Pineapplesoft Link</b></a></td>
      </tr>
    </table>
• XML:
    <price currency="usd">499.00</price>
    <toc xlink:href="/newsletter">Pineapplesoft Link</toc>
No Predefined Tags (II)
• How does the browser know what an author-defined tag should look like?
  – Through a style sheet
• Can we compare different prices?
• What about current and previous browsers?
• Can we simplify Web site maintenance?
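On the question of comparing prices: because XML marks a price up as a price rather than as table formatting, a program can locate and compare prices directly. The sketch below uses Python's standard `xml.etree.ElementTree`. The `<catalog>`, `<product>` and `<name>` tags, the second product, and its price are hypothetical. Only the `<price currency="usd">` markup comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the <price currency="..."> markup follows the slide.
CATALOG = """<catalog>
  <product><name>Pineapplesoft Link</name><price currency="usd">499.00</price></product>
  <product><name>Other Product</name><price currency="usd">350.00</price></product>
</catalog>"""

def cheapest(xml_text):
    """Return (name, price) of the lowest-priced product priced in USD."""
    root = ET.fromstring(xml_text)
    offers = []
    for product in root.iter("product"):
        price = product.find("price")
        # Only prices in the same currency can be compared directly.
        if price is not None and price.get("currency") == "usd":
            offers.append((product.findtext("name"), float(price.text)))
    return min(offers, key=lambda offer: offer[1])

print(cheapest(CATALOG))  # ('Other Product', 350.0)
```

With the HTML version, the same program would have to guess which table cell holds a price and parse "USD 499" out of free text.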
Stricter Syntax
• More than 50% of the code in a browser is devoted to handling errors or sloppiness on the author's part.
  – Due to the increasing use of HTML editors
  – Browsers are growing in size and becoming slower
• XML adopts a strict syntax, allowing smaller and faster browsers
  – Sloppy HTML:
      <p>Welcome to our site! <img src=logo.jpg>
  – Well-formed XML:
      <p>Welcome to our site! <img src="logo.jpg"/></p>
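A strict parser rejects sloppy markup instead of guessing what the author meant. The minimal sketch below uses Python's built-in `xml.etree.ElementTree` as one example of such a parser. The helper name `is_well_formed` is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

html_style = '<p>Welcome to our site! <img src=logo.jpg>'
xml_style = '<p>Welcome to our site! <img src="logo.jpg"/></p>'

print(is_well_formed(html_style))  # False: unquoted attribute, unclosed tags
print(is_well_formed(xml_style))   # True
```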
Document Structures (I)
• An example
    INTERNAL MEMO                                           (title)
    From: Bh Huang                                          (header)
    To: Conrad Ho
    Regarding: Using User Attention Model in Watermarking
    Have you finished the job? Can I adopt the program      (body)
    directly? I think it will be of great benefit to use
    the user attention model.
    Bh
Document Structures (II)
    <?xml version="1.0"?>
    <memo>
      <header>
        <from>Bh Huang</from>
        <to>Conrad Ho</to>
        <subject>Using User Attention Model in Watermarking</subject>
      </header>
      <body>
        <para>Have you finished the job? Can I adopt the program directly?
        I think it will be of great benefit to use the user attention model.</para>
        <signature>Bh</signature>
      </body>
    </memo>
Applications of XML
• The most popular applications of XML fall into two groups
  – Document applications manipulate information primarily intended for human consumption
  – Data applications manipulate information primarily intended for communication between software
Document Publishing (I)
• XML concentrates on the structure of the document, making it independent of the delivery medium
  – Diagram: a single XML document is published as HTML, PDF, and WML
Document Publishing (II)
• It is possible to edit and maintain documents in XML and automatically publish them on different media
  – More and more publications are available both online and in print
  – The Web is changing rapidly
  – New markup languages are introduced for specific devices
Data Applications
• If the structure of a document can be expressed in XML, so can the structure of a database.
• An XML Web site can be regarded as a large database that applications can tap.
Near-term Applications of XML
• Large Web site maintenance
• Exchanging information between organizations
• Making content available to different Web sites
• E-commerce applications where different organizations collaborate to serve a customer
• Scientific applications with new markup languages for formulas or specifications
• E-books that need to express rights and ownership
Syntax of XML
An Example
• XML document:
    <?xml version="1.0"?>
    <!-- Download from www.marchal.com or www.mcp.com -->
    <address-book>
      <entry>
        <name>John Doe</name>
        <address>
          <street>34 Fountain Square Plaza</street>
          <region>OH</region>
          <postal-code>45202</postal-code>
          <locality>Cincinnati</locality>
          <country>US</country>
        </address>
        <tel preferred="true">513-744-8889</tel>
        <tel>513-744-7098</tel>
        <email href="mailto:john@emailaholic.com"/>
      </entry>
      <entry>
        <name>Jack Smith</name>
        <tel>513-744-3465</tel>
        <email href="mailto:jack@emailaholic.com"/>
        <comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
      </entry>
    </address-book>
• Plain text file:
    John Doe
    34 Fountain Square Plaza
    Cincinnati, OH 45202
    US
    513-744-8889 (preferred)
    513-744-7098
    jdoe@emailaholic.com
    Jack Smith
    513-744-3465
    jsmith@emailaholic.com
    Never leave messages on his answering machine. Email instead.
• Which one is easier to read?
• Which one is easier for software to interpret?
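For the second question, consider which phone number is preferred. In the plain text file a program would have to spot "(preferred)" in free text. In the XML version it is an attribute. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the address book (address and e-mail omitted). The function name `preferred_numbers` and the fallback to the first number are our own choices.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the address book above (address and e-mail omitted).
ADDRESS_BOOK = """<?xml version="1.0"?>
<address-book>
  <entry>
    <name>John Doe</name>
    <tel preferred="true">513-744-8889</tel>
    <tel>513-744-7098</tel>
  </entry>
  <entry>
    <name>Jack Smith</name>
    <tel>513-744-3465</tel>
  </entry>
</address-book>"""

def preferred_numbers(xml_text):
    """Map each name to its preferred phone number, or else its first one."""
    numbers = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        tels = entry.findall("tel")
        if not tels:
            continue  # entries without any phone number are skipped
        preferred = [t for t in tels if t.get("preferred") == "true"]
        numbers[entry.findtext("name")] = (preferred or tels)[0].text
    return numbers

print(preferred_numbers(ADDRESS_BOOK))
# {'John Doe': '513-744-8889', 'Jack Smith': '513-744-3465'}
```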
Elements
• The fundamental units of XML
  – E.g. <tel>513-744-7098</tel>
  – Each element is surrounded by a start tag and an end tag, much as in HTML
    • The start tag is the element name enclosed in "<" and ">"
    • The end tag adds a "/" before the element name: </tel>
  – Both a start tag and an end tag are required for every element (empty elements may combine them into a single tag)
Naming an Element
• The names of elements must follow specific rules.
  – The element name must start with a letter or an underscore (_)
  – The rest of the name can consist of letters, digits, underscores (_), hyphens (-), or periods (.)
  – Spaces are not allowed in an element name
  – Element names are case-sensitive
• Legal: <copyright-information>, <p>, <base64>, <decompte.client>, <firstname>
• Illegal: <123> (starts with a digit), <first name> (contains a space), <Tom&jerry> (contains "&")
• Case sensitivity: <address>, <ADDRESS>, and <Address> are three different elements
• Suggested writing for multi-word names: address-book or AddressBook
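The naming rules above can be expressed as a regular expression. This is a simplified, ASCII-only sketch: the full XML specification also permits ":" (reserved for namespaces) and a large range of non-ASCII letters, which the pattern below deliberately ignores. `NAME_RE` and `is_valid_name` are illustrative names, not part of any library.

```python
import re

# Simplified ASCII-only version of the slide's rules:
# first character: a letter or "_"; then letters, digits, "_", "-" or ".".
# (Real XML names may also contain ":" and many non-ASCII letters.)
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_valid_name(name):
    """Return True if name satisfies the simplified naming rules."""
    return NAME_RE.fullmatch(name) is not None

for name in ["copyright-information", "p", "base64", "decompte.client",
             "firstname", "123", "first name", "Tom&jerry"]:
    print(f"{name!r}: {'legal' if is_valid_name(name) else 'illegal'}")
```

Case sensitivity is not a validity question: `address`, `ADDRESS` and `Address` all pass the check, but a parser treats them as three different names.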
Attributes
• Attributes carry additional information about elements
  – <tel preferred="true">513-744-8889</tel>
• An attribute consists of a name and a value
• Attribute names follow the same rules as element names
• The start tag of an element can contain any number of attributes, including none
• Quotes around the value are required! (either single ' or double " quotes)
  – <confidentiality level="I don't know">This document is not confidential</confidentiality>
• Attributes are not part of the element name
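The attribute rules can be checked with any XML parser. The following sketch uses Python's `xml.etree.ElementTree`. The wrapping `<doc>` element and the second `<tel>` with `preferred='false'` are illustrative additions. It shows that both quote styles are accepted, that the element name (`el.tag`) does not include the attributes, and that an unquoted value is rejected.

```python
import xml.etree.ElementTree as ET

# Double quotes let the value contain an apostrophe; single quotes also work.
DOC = """<doc>
  <tel preferred="true">513-744-8889</tel>
  <confidentiality level="I don't know">This document is not confidential</confidentiality>
  <tel preferred='false'>513-744-7098</tel>
</doc>"""

root = ET.fromstring(DOC)
for el in root:
    # el.tag is just the name; el.attrib maps attribute names to values.
    print(el.tag, el.attrib)

def accepts(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(accepts("<tel preferred=true>513-744-8889</tel>"))  # False: value not quoted
```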
Special Attributes
• xml:space
  – Specifies how white space in the element should be handled
    • preserve: the application should preserve all white space
    • default: the application's default white-space handling is acceptable (often collapsing repeated spaces)
• xml:lang
  – Specifies the language in which the element's content is written
    • <p xml:lang="en-GB">What colour is it?</p>
    • <p xml:lang="en-US">What color is it?</p>
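The `xml` prefix is predefined and bound to the namespace `http://www.w3.org/XML/1998/namespace`, so a namespace-aware parser reports these attributes under their expanded names. The sketch below uses Python's `xml.etree.ElementTree`. The `<doc>` and `<pre>` elements are hypothetical. Note that the parser only reports `xml:space`. Acting on it is up to the application, and ElementTree keeps the spaces either way.

```python
import xml.etree.ElementTree as ET

# Expanded-name prefix that ElementTree uses for xml:* attributes.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

DOC = """<doc>
  <p xml:lang="en-GB">What colour is it?</p>
  <p xml:lang="en-US">What color is it?</p>
  <pre xml:space="preserve">  keep   these   spaces  </pre>
</doc>"""

root = ET.fromstring(DOC)
for p in root.findall("p"):
    print(p.get(XML_NS + "lang"), p.text)

pre = root.find("pre")
# The attribute is reported; honouring it is the application's job.
print(pre.get(XML_NS + "space"), repr(pre.text))
```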
Empty Elements
• Elements with no content are called empty elements; the following two forms are equivalent
  – <email href="bhhuang@ms23.hinet.net" />
  – <email href="bhhuang@ms23.hinet.net"></email>
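As a quick check that the two forms are equivalent, this sketch parses both with Python's `xml.etree.ElementTree` and compares their tag, attributes, text, and number of children. The `summary` helper is our own. It treats missing text (`None`) as an empty string.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<email href="bhhuang@ms23.hinet.net" />')
long_form = ET.fromstring('<email href="bhhuang@ms23.hinet.net"></email>')

def summary(el):
    """Tag, attributes, text (None treated as "") and number of children."""
    return el.tag, el.attrib, el.text or "", len(el)

print(summary(short_form) == summary(long_form))  # True
```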
Hierarchical Structure of Elements
• Elements can contain
  – text, e.g. <name>
  – other elements, e.g. <address> and <entry>
  – a mixture of both, e.g. <comments>
    <?xml version="1.0"?>
    <!-- Download from www.marchal.com or www.mcp.com -->
    <address-book>
      <entry>
        <name>John Doe</name>
        <address>
          <street>34 Fountain Square Plaza</street>
          <region>OH</region>
          <postal-code>45202</postal-code>
          <locality>Cincinnati</locality>
          <country>US</country>
        </address>
        <tel preferred="true">513-744-8889</tel>
        <tel>513-744-7098</tel>
        <email href="mailto:john@emailaholic.com"/>
      </entry>
      <entry>
        <name>Jack Smith</name>
        <tel>513-744-3465</tel>
        <email href="mailto:jack@emailaholic.com"/>
        <comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
      </entry>
    </address-book>
Hierarchical Structure of Elements (cont.)
• Elements containing other elements are called parents
• Elements contained in other elements are called children
• Children must be fully contained within their parents
• Correct:
    <entry>
      <name>Jack Smith</name>
      <tel>513-744-3465</tel>
      <email href="mailto:jack@emailaholic.com"/>
      <comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
    </entry>
• Wrong (<comments> and <b> are not closed inside their parents):
    <entry>
      <name>Jack Smith</name>
      <tel>513-744-3465</tel>
      <email href="mailto:jack@emailaholic.com"/>
      <comments>Never leave messages on his answering machine. <b>Email instead.
    </entry>
    </comments></b>
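Because children are always fully nested, a parsed document forms a tree that can be walked from parent to child. The sketch below uses Python's `xml.etree.ElementTree` to print that tree for the correct `<entry>` and then confirms that the parser rejects the overlapping version (shortened here). The `outline` function is our own illustration.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <name>Jack Smith</name>
  <tel>513-744-3465</tel>
  <email href="mailto:jack@emailaholic.com"/>
  <comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
</entry>"""

def outline(element, depth=0):
    """Return the element names, each child indented under its parent."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(ENTRY))))

# Overlapping tags: a child that is not fully inside its parent.
OVERLAP = "<entry><comments>Email me. <b>Email instead.</entry></comments></b>"
try:
    ET.fromstring(OVERLAP)
    overlap_rejected = False
except ET.ParseError:
    overlap_rejected = True
print("overlap rejected:", overlap_rejected)  # True
```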
The Root Element
• Each document must have exactly one root element
  – All other elements must be descendants of the root element
• Wrong:
    <?xml version="1.0"?>
    <entry>
      <name>John Doe</name>
      <email href="mailto:john@emailaholic.com"/>
    </entry>
    <entry>
      <name>Jack Smith</name>
      <email href="mailto:jack@emailaholic.com"/>
    </entry>
• Correct:
    <?xml version="1.0"?>
    <address-book>
      <entry>
        <name>John Doe</name>
        <email href="mailto:john@emailaholic.com"/>
      </entry>
      <entry>
        <name>Jack Smith</name>
        <email href="mailto:jack@emailaholic.com"/>
      </entry>
    </address-book>
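A parser enforces the single-root rule. This sketch feeds shortened versions of both documents to Python's `xml.etree.ElementTree`. The helper `root_tag` is hypothetical and returns `None` when the parser rejects the text.

```python
import xml.etree.ElementTree as ET

TWO_ROOTS = """<?xml version="1.0"?>
<entry><name>John Doe</name></entry>
<entry><name>Jack Smith</name></entry>"""

ONE_ROOT = """<?xml version="1.0"?>
<address-book>
  <entry><name>John Doe</name></entry>
  <entry><name>Jack Smith</name></entry>
</address-book>"""

def root_tag(text):
    """Return the root element's tag, or None if the parser rejects the text."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None

print(root_tag(TWO_ROOTS))  # None: a second top-level element is an error
print(root_tag(ONE_ROOT))   # address-book
```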
The XML Declaration
• The first line in an XML document is called the XML declaration
  – <?xml version="1.0"?>
• The XML declaration identifies the document as an XML document
• The XML version is stated in the XML declaration
• The XML declaration is optional, but including it is recommended; when present, it must appear at the very beginning of the document
  – At the time of writing, the current version of XML is 1.0
  – The second edition is simply the first edition with its errors corrected
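The placement rule is easy to check with a real parser. This sketch uses Python's `xml.etree.ElementTree`. The `<memo/>` documents and the `parses` helper are illustrative. It shows that the declaration may be omitted, but that nothing, not even a space or a comment, may come before it.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses('<?xml version="1.0"?><memo/>'))               # True: declaration first
print(parses('<memo/>'))                                    # True: declaration omitted
print(parses(' <?xml version="1.0"?><memo/>'))              # False: a space precedes it
print(parses('<!-- note --><?xml version="1.0"?><memo/>'))  # False: a comment precedes it
```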
Comments
• Comments are enclosed in "<!--" and "-->"
  – E.g. <!-- Download from www.marchal.com or www.mcp.com -->
• Comments are intended for human readers; XML parsers are not required to pass them on to applications, and applications usually ignore them
• Comments cannot be placed inside a tag
  – E.g. <name <!-- an invalid comment -->>Jack</name>
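Here is a small sketch of both points using Python's `xml.etree.ElementTree`. The `<entry>` document and the `accepts` helper are illustrative. A comment between elements is legal, and the default ElementTree parser leaves it out of the tree. A comment inside a tag makes the document malformed.

```python
import xml.etree.ElementTree as ET

# Legal: a comment between elements; the default parser drops it.
entry = ET.fromstring("<entry><!-- contact details --><name>Jack</name></entry>")
print([child.tag for child in entry])  # ['name']

def accepts(text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illegal: a comment inside the start tag.
print(accepts("<name <!-- an invalid comment -->>Jack</name>"))  # False
```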
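Unicode
• Unicode supports the characters of virtually every language in use today, plus mathematical and other symbols
• XML documents are made of Unicode characters
  – Stored with 16 bits per character (UTF-16), an English-language XML file is twice the size of the same text stored as plain ASCII
  – Solution: declare a more compact encoding in the XML declaration; UTF-8, for example, stores each ASCII character in a single byte
  – All XML parsers must support both UTF-8 and UTF-16
  – E.g. <?xml version="1.0" encoding="ISO-8859-1"?>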
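The size difference and the encoding declaration can both be demonstrated in a few lines of Python. UTF-16 is encoded here as `utf-16-le` so that no byte-order mark is added to the count. The second part stores a document as ISO-8859-1 bytes and lets `xml.etree.ElementTree` decode it according to its declaration. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

text = "<name>John Doe</name>"
# ASCII characters: 1 byte each in UTF-8, 2 bytes each in UTF-16.
print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))

# A document that declares ISO-8859-1 and is stored in that encoding.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
              '<name>Benoît Marchal</name>').encode("iso-8859-1")
print(ET.fromstring(latin1_doc).text)  # Benoît Marchal
```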
Entity
• Complicated XML documents are usually split across several files
• The organizing unit of XML documents is the entity
• E.g. if we define an entity "us" with the value "United States", these two elements are equivalent:
  – <country>&us;</country>
  – <country>United States</country>
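The slide does not show where the entity is defined. One standard place is an `<!ENTITY>` declaration in the document's internal DTD subset, which this sketch assumes. Python's `xml.etree.ElementTree` (built on expat) replaces `&us;` with its declared value while parsing.

```python
import xml.etree.ElementTree as ET

# Assumed setup: "us" is declared in the internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE country [
  <!ENTITY us "United States">
]>
<country>&us;</country>"""

print(ET.fromstring(DOC).text)  # United States
```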
Predefined Entities
• &lt;    <
• &amp;   &
• &gt;    > (required when the sequence ]]> appears in text)
• &apos;  '
• &quot;  "
• Entity reference:
  – Wrong: <company>Marks & Spencer</company>
  – Correct: <company>Marks &amp; Spencer</company>
• Character reference:
  – <name>Beno&#238;t Marchal</name> (&#238; is the character î)
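When XML is generated by a program, the predefined entities are usually produced by an escaping helper rather than typed by hand. This sketch uses `xml.sax.saxutils.escape` from Python's standard library, which replaces `&`, `<` and `>`, together with `xml.etree.ElementTree` to show that a raw `&` is rejected and that references are turned back into characters on parsing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
print(escape("Marks & Spencer"))  # Marks &amp; Spencer

try:
    ET.fromstring("<company>Marks & Spencer</company>")
    raw_ampersand_rejected = False
except ET.ParseError:
    raw_ampersand_rejected = True
print(raw_ampersand_rejected)  # True

# Parsing turns references back into the characters they stand for.
print(ET.fromstring("<company>Marks &amp; Spencer</company>").text)  # Marks & Spencer
print(ET.fromstring("<name>Beno&#238;t Marchal</name>").text)       # Benoît Marchal
```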
Processing Instruction
• The mechanism for inserting non-XML statements into an XML document
  – A compromise of XML's purely structural approach
  – Enclosed in "<?" and "?>"
  – The first word is called the target; it names the application or device the instruction is directed to
    • <?xml-stylesheet href="simple-ie5.xsl" type="text/xsl" ?>
  – The XML declaration uses the same syntax, although the XML specification does not count it as a processing instruction
    • <?xml version="1.0" encoding="ISO-8859-1" ?>
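A DOM parser exposes each processing instruction as a node with a `target` and `data`. This sketch uses Python's standard `xml.dom.minidom`. The `<address-book/>` root is illustrative. It also shows that the XML declaration does not appear as a node, while the stylesheet instruction does.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="simple-ie5.xsl" type="text/xsl"?>'
    '<address-book/>'
)

# Keep only processing-instruction nodes among the document's children.
pis = [node for node in doc.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
for pi in pis:
    print("target:", pi.target)  # xml-stylesheet
    print("data:  ", pi.data)    # href="simple-ie5.xsl" type="text/xsl"
```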
CDATA Sections
• Enclosed in "<![CDATA[" and "]]>"
• The XML parser does not interpret markup or entity references inside a CDATA section; its content is treated as plain text
• Used when entity references would otherwise be needed too frequently, or when another XML document is included
    <?xml version="1.0"?>
    <example>
    <![CDATA[<?xml version="1.0"?>
    <entry>
      <name>John Doe</name>
    </entry>]]>
    </example>
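To see what the parser does with a CDATA section, this sketch parses a compact version of the example with Python's `xml.etree.ElementTree`. The embedded document does not become child elements. It arrives as ordinary text of `<example>`, tags included.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<example><![CDATA[<?xml version="1.0"?>
<entry>
  <name>John Doe</name>
</entry>]]></example>"""

root = ET.fromstring(DOC)
print(len(root))  # 0: no child elements were created
print(root.text)  # the embedded document, as plain text
```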
Common Errors
• A missing end tag
• Mismatched case between start and end tags (XML is case sensitive)
• Spaces in element names
• Missing quotes around attribute values
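Each of these errors makes a document malformed, so a parser stops with an error instead of guessing. This sketch feeds one small hypothetical example of each error to Python's `xml.etree.ElementTree` and prints the parser's message. The exact wording of the messages depends on the underlying expat library.

```python
import xml.etree.ElementTree as ET

# One small example of each common error.
BROKEN = {
    "missing end tag": "<entry><name>Jack</entry>",
    "case mismatch": "<Name>Jack</name>",
    "space in element name": "<first name>Jack</first name>",
    "unquoted attribute value": "<tel preferred=true>513-744-8889</tel>",
}

def first_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

for label, text in BROKEN.items():
    print(f"{label}: {first_error(text)}")
```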
