Introduction to XML

Shannon Davis | ssdavis@wustl.edu

What Is XML?
• Self-describing document
“Hi. I am a book.”
<book> </book>

Why Use XML?
• Simplicity
• Open standard
• Extensibility
• Interoperability
• Separates content from presentation

History of Markup Languages
SGML is the parent of both HTML and XML; XHTML is HTML rewritten to follow the rules of XML.
HTML:
<H1><I>I am Born</I></H1><BR>
XML:
<head type="chapter" n="01">I am Born</head>
XHTML:
<h1><i>I am Born</i></h1><br />

Basic Rules of XML
every XML document should declare itself
as an XML document (the declaration is optional
in XML 1.0, but recommended)
<?xml version="1.0"?>
or, with an encoding:
<?xml version="1.0" encoding="utf-8"?>
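
An XML declaration does not have to be typed by hand. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library xml.etree.ElementTree module; the <book> element and its text are only illustrative.

```python
import xml.etree.ElementTree as ET

# Build a one-element document in memory (the element name is illustrative).
root = ET.Element("book")
root.text = "Hi. I am a book."

# Serialize to bytes and ask the library to prepend the XML declaration.
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8"))
```

The first line printed is the declaration, followed by the <book> element.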

Basic Rules of XML
every XML document must have a root
element that wraps the entire
document
<TEI></TEI>
or:
<modsCollection></modsCollection>

Basic Rules of XML
every XML tag that opens must close
<div1></div1>
<head></head>
<name></name>
• The only exception is a self-closing (empty-element) tag:
<pb/>
<milestone/>
<link/>

Basic Rules of XML
tags are case-sensitive, and tag-pairs
must match
<title></title>
not:
<title></TITLE>
or:
<Title></TITLE>

Basic Rules of XML
all tags must nest correctly
<title><persName>Dr. Strangelove</persName>,
<subtitle> or, How I learned to stop worrying and
love the bomb.</subtitle></title>
not:
<title><persName>Dr. Strangelove
</persName>,<subtitle> or, How I
learned to stop worrying and love the
bomb.</title></subtitle>

Basic Rules of XML
Well-formed XML
The following is NOT a well-formed document. Why?
<BOOK>
<TITLE>The Adventures of Huckleberry Finn
<AUTHOR>Mark Twain</TITLE></AUTHOR>
<BINDING>mass market paperback</BINDING>
<PAGES>298</PAGES>
<PRICE>$5.49</price>
</BOOK>

Review: Basic Rules of XML
• an XML document should begin with an XML declaration
• every XML document must have a root element that wraps the
entire document
• every XML tag that opens must close; the only exception is a
self-closing tag
• tags are case-sensitive, and tag pairs must match
• all tags must nest correctly
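
The rules above can be checked mechanically. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative, not part of the workshop files. The parser does not insist on an XML declaration, so that rule is not tested here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


samples = {
    "correct nesting": "<title><persName>Dr. Strangelove</persName>, "
                       "<subtitle>or, How I learned to stop worrying and "
                       "love the bomb.</subtitle></title>",
    "bad nesting": "<title><subtitle>or, How I learned</title></subtitle>",
    "case mismatch": "<title>Dr. Strangelove</TITLE>",
    "unclosed tag": "<div1><head>Chapter 1</div1>",
    "two root elements": "<TEI></TEI><modsCollection></modsCollection>",
}

for label, sample in samples.items():
    ok, message = is_well_formed(sample)
    print(label, "->", "well-formed" if ok else "NOT well-formed: " + message)
```

Only the first sample parses; each of the others breaks one of the rules, and the parser message gives the line and column where it gave up.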

Exercise 1
Using what you’ve learned about well-formed XML,
create an XML file describing a text.
1. Open WordPad or Notepad
2. Open springtime.txt from student_files
3. Use any tags you like to mark up the text to
create a well-formed XML document.

Key Concepts of XML
XML applications
Dublin Core: broad metadata standard that supports various
purposes and business models
MathML: Mathematical Markup Language
GedML: Genealogical Markup Language
ParlML: Parliamentary Markup Language
RETS: Real Estate Transaction Standard
TEI: Text Encoding Initiative
For more examples, see:
List of XML Markup Languages.

Key Concepts of XML
Valid XML
an XML application’s tag set is enforced through
an XML schema
OR
a DTD (document type definition)
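
To make the difference between well-formed and valid concrete, here is a rough sketch in Python. It is not schema or DTD validation: it only checks element names against a hand-written set, which is a small fraction of what a real schema or DTD enforces. The tag set ALLOWED_TAGS and the helper unknown_tags are hypothetical names invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set for a tiny "book" application (an assumption for this sketch).
ALLOWED_TAGS = {"book", "title", "author"}


def unknown_tags(xml_text, allowed=ALLOWED_TAGS):
    """Return the sorted element names in xml_text that are not in the allowed set."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return sorted({element.tag for element in root.iter()} - set(allowed))


doc = ("<book><title>The Adventures of Huckleberry Finn</title>"
       "<writer>Mark Twain</writer></book>")
print(unknown_tags(doc))
```

The sample document is well-formed, yet it uses a <writer> element that the hypothetical tag set does not define, so it would not count as valid for that application.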

Structure of an XML document
• the prolog
• The XML declaration
• other declarations (e.g., DTD, entities)
<!DOCTYPE COLL SYSTEM "red.textclass.dtd">
<!ENTITY TEI "Text Encoding Initiative">
• the document element
• defined by root element
<TEI></TEI>
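
The two parts can be seen together in a small sketch, assuming Python's standard-library xml.etree.ElementTree parser. Unlike the DOCTYPE line above, which points to an external DTD file, this example declares the entity in an internal subset so that it is self-contained; the <p> element and its sentence are illustrative.

```python
import xml.etree.ElementTree as ET

# Prolog: XML declaration plus a DOCTYPE that declares one entity internally.
# Document element: everything inside the <TEI> root element.
DOC = """<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!ENTITY TEI "Text Encoding Initiative">
]>
<TEI><p>Encoded following the &TEI; guidelines.</p></TEI>
"""

root = ET.fromstring(DOC)
print(root.tag)             # name of the root element
print(root.find("p").text)  # text with the entity already expanded
```

The parser reads the entity declaration in the prolog and substitutes its value wherever &TEI; appears in the document element.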

Building Blocks of XML
• elements and attributes
• general entities
• XML data

elements and attributes
<front>
CONTENTS
PAGE
<chapter>SPRINGTIME</chapter> <pageNo>1</pageNo>
SOME NAMES OF CHARACTERS IN FICTION 15
THOMAS HEARNE, 1678–1735 29
RECOLLECTIONS 51
</front>

elements and attributes
<text type="essay">
Governesses used to tell us that the seasons of the
year each consist of three months, and of these
<month type="third">March</month>, April, and May
make the springtime.</text>
<element attribute="value">content</element>
Attribute values must always be
in single or double quotes

Review: Basic Rules of XML
• an XML document should begin with an XML declaration
• every XML document must have a root element that wraps
the entire document
• every XML tag that opens must close; the only exception
is a self-closing tag
• attribute values must always be in single or double
quotation marks

Exercise 1, cont.
Using the text you marked up earlier, add attributes and
values to the elements.
Ex: BY <author type="knight">SIR FRANCIS
DARWIN</author>

general entities
• used as placeholders for non-ASCII data, such as
special characters, non-Roman alphabets, and
non-text media
• to be used in the document element, entities must
be declared in the prolog
(except for the five predefined XML entities and
numeric character references)

general entities
• within the document element (anywhere after the
prolog) an entity reference starts with & and
ends with ;
• ampersands (&) and angle brackets (<>) are
reserved characters in XML and must be encoded
as entities (&amp; &lt; &gt;)
<measure type="weight">&gt; 50lbs</measure>
not:
<measure type="weight">> 50lbs</measure>
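
Encoding reserved characters by hand is error-prone, so most tools provide a helper. This is a small sketch, assuming Python's standard-library xml.sax.saxutils.escape function and the xml.etree.ElementTree parser; the sample text is illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "> 50lbs & rising"

# escape() replaces &, < and > with their entity references.
encoded = escape(raw)
print(encoded)

# The encoded text can be placed inside an element and parsed back.
parsed = ET.fromstring('<measure type="weight">' + encoded + "</measure>")
print(parsed.text)

# A bare ampersand, by contrast, makes the document not well-formed.
try:
    ET.fromstring("<measure>50lbs & rising</measure>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
```

After parsing, the element text is the original string again: the entities exist only in the serialized XML.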

Review: Basic Rules of XML
• an XML document should begin with an XML declaration
• every XML document must have a root element that wraps
the entire document
• every XML tag that opens must close; the only exception
is a self-closing tag
• attribute values must always be in single or double
quotation marks
• ampersands (&) and angle brackets (<>) are
reserved characters in XML and must be
encoded as entities

data
CDATA (character data)
• text data that the XML parser does not parse for markup
PCDATA (parsed character data)
• text data that the XML parser parses for markup and entities
NDATA (notation data)
• all other media types referenced in the
XML document
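
The difference between CDATA and PCDATA shows up when text contains reserved characters. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser; the <example> element and its text are illustrative.

```python
import xml.etree.ElementTree as ET

# CDATA section: the parser passes the text through without reading it as markup.
with_cdata = "<example><![CDATA[<title>not a tag & not an entity</title>]]></example>"

# PCDATA: the same text must have its reserved characters encoded as entities.
with_pcdata = ("<example>&lt;title&gt;not a tag &amp; not an entity"
               "&lt;/title&gt;</example>")

cdata_root = ET.fromstring(with_cdata)
pcdata_root = ET.fromstring(with_pcdata)

print(cdata_root.text)
print(pcdata_root.text)
print(len(cdata_root))  # number of child elements
```

Both documents yield the same text, and in neither case is <title> treated as a child element.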

Review: Key Concepts of XML
• Well-formed XML
• follows the basic rules; no content model is required
• Valid XML
• well-formed, and also conforms to the tag set defined by
• an XML schema, or
• a DTD (document type definition)

Review: Structure of XML document
• the prolog
• The XML declaration
• other declarations (e.g., DTD, entities)
• the document element
• defined by the root element (e.g., <TEI>)

Review: Building Blocks of XML
• elements and attributes
• general entities
• XML data

Software: oXygen XML Editor
WU site-wide license @ http://sl.wustl.edu/catalog/index.php
• Easy to use, with robust functionality for editing,
project management, and validation of structured mark-up
sources.
• Supports output to multiple target formats, including PDF,
TXT, HTML, and XML

oXygen Features
• Multiplatform availability: Windows, Mac
• Multilanguage support: English, German, French, Italian,
and Japanese
• Unicode support
• Spell checking for English, German, and French
• Easy error tracking
• Content completion
• Built-in templates

oXygen Features, cont.
• Preview transformation results as XHTML or XML, or in
your browser
• Import data from a database, Excel, HTML, or text file
• XML project manager
• Manual and automatic validation of XML documents
against XML Schema and DTDs
• Batch validation of selected files in a project
