Presentation given by Fergus Fahey, Training Officer for Archives and Records Association (Ireland), on March 9th, 2016, in the Royal Irish Academy, Dublin. This presentation was part of a training workshop co-hosted by Digital Repository of Ireland and ARA(I), titled 'Introduction to EAD'. It introduces the concept and practice of XML in advance of a practical session at the workshop.
3. What is XML
• XML stands for eXtensible Markup Language.
• XML was designed to store and transport data.
• XML was designed to be both human- and machine-readable.
• XML is a software- and hardware-independent tool for storing and transporting data.
• “XML does not DO anything”
• Very widely used to store and share data:
• By libraries to share bibliographic data
• By software applications e.g. podcast metadata,
• By banks e.g. to process Single Euro Payments Area (SEPA) payments
8. MARC record processed
000 02617cam 22004931a 450
001 1197435
005 20030227130037.0
008 940923s1840 enkabcf 00 0 eng u
035 __ |a (UPRA)CTYXRL7078-B
035 __ |9 CAF1680YL
040 __ |c UPRA |d CtY-BR
043 __ |a n-us---
090 __ |a Za W679 |b +840s
100 1_ |a Willis, Nathaniel Parker, |d 1806-1867.
245 10 |a American scenery, or, Land, lake, and river
illustrations of transatlantic nature : |b 246
246 30 |a Land, lake and river illustrations of transatlantic
nature
260 __ |a London : |b George Virtue, |c 1840
Author: Willis, Nathaniel Parker, 1806-1867.
Title: American scenery, or, Land, lake, and river illustrations
of transatlantic nature : uniform with Dr. Beattie's
Switzerland, Scotland, & Waldenses / from drawings by
W.H. Bartlett, engraved in the first style of the art,
by R. Wallis, J. Cousen, Willmore, Brandard, Adlard,
Richardson, &c ; the literary department by N.P. Willis.
American scenery
Land, lake and river illustrations of transatlantic nature
Published: London : George Virtue, 1840
Description: 30 parts : ills., map, port. ; 29 cm.
Location: BEINECKE (Non-Circulating)
Call Number: 2003 +56
Library has: pt.1-pt.30
9. HTML: HyperText Markup Language
• HTML was designed to display data, with a focus on how data looks (unlike the MARC example)
• HTML – Has predefined tags:
• <b> for bold
• <p> for paragraph
• HTML tags relate to layout and appearance of text/data and images
• HTML is permissive i.e. HTML will still render if it includes invalid tags.
12. The Difference Between XML and HTML
• The XML language has no predefined tags
• The tags in the Luas ticket example above (like <to> and <price>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
• HTML works with predefined tags like <p>, <b>, <img>, etc.
• With XML, the author must define both the tags and the document structure.
• XML Separates Data from Presentation
13. XML Tree
• Root element <eu>
  • Element <memberState>
    • Element <name>, text: Belgium
    • Element <area>, text: 30,528
    • Element <population>, text: 11,190,845
    • Element <headOfstate>, attribute "type"
      • Element <firstName>, text: Philippe
      • Element <lastName>, text: Saxe-Coburg-Gotha
    • Element <capital>
      • Element <name>, text: Brussels
14. XML Syntax
• XML documents must contain one root element that is the parent of all other elements:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
16. XML Elements
• An XML element is everything from (including) the element's start tag to
(including) the element's end tag.
<population>11,190,845</population>
• An element can contain:
• text
• attributes
• other elements
• or a mix of the above
<capital>
<name>Brussels</name>
<population AdministrativeDivision="Capital Region">1,138,854</population>
</capital>
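The parts of an element listed above can be inspected with a few lines of code. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module (neither is part of the original slides); the XML is the <capital> fragment shown above.

```python
import xml.etree.ElementTree as ET

# The <capital> fragment from the slide.
xml_text = """
<capital>
  <name>Brussels</name>
  <population AdministrativeDivision="Capital Region">1,138,854</population>
</capital>
"""

capital = ET.fromstring(xml_text)

# An element can contain other elements...
child_tags = [child.tag for child in capital]

# ...as well as text and attributes.
population = capital.find("population")
population_text = population.text
population_attributes = population.attrib

print(child_tags)
print(population_text)
print(population_attributes)
```

Here <capital> holds two child elements, while <population> holds both text and an attribute.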
17. XML Attributes
• Attributes are designed to contain data related to a specific element.
<headOfstate type="Constitutional Monarch">
<lastName>Saxe-Coburg-Gotha</lastName>
<firstName>Philippe</firstName>
</headOfstate>
• The same information can also be stored in a child element instead of an attribute:
<headOfstate>
<type>Constitutional Monarch</type>
<lastName>Saxe-Coburg-Gotha</lastName>
<firstName>Philippe</firstName>
</headOfstate>
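The two <headOfstate> versions above carry the same information but are read differently by software. A minimal sketch, again assuming Python's standard-library xml.etree.ElementTree (an illustration, not part of the original slides):

```python
import xml.etree.ElementTree as ET

# Version 1: "type" stored as an attribute.
as_attribute = ET.fromstring(
    '<headOfstate type="Constitutional Monarch">'
    "<lastName>Saxe-Coburg-Gotha</lastName>"
    "<firstName>Philippe</firstName>"
    "</headOfstate>"
)

# Version 2: "type" stored as a child element.
as_element = ET.fromstring(
    "<headOfstate>"
    "<type>Constitutional Monarch</type>"
    "<lastName>Saxe-Coburg-Gotha</lastName>"
    "<firstName>Philippe</firstName>"
    "</headOfstate>"
)

# Attributes are read with .get(); child elements with .findtext().
type_from_attribute = as_attribute.get("type")
type_from_element = as_element.findtext("type")

print(type_from_attribute)
print(type_from_element)
```

Both lookups return the same value; the choice between attribute and element is a design decision for whoever defines the document structure.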
18. XML Tree (the tree from slide 13, shown again)
19. XML Namespaces
• In XML, element names are defined by the developer. This often results in a
conflict when trying to mix XML documents from different XML applications.
• This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
• This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
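One common way to resolve this kind of name conflict is to tie each vocabulary to its own namespace and use a prefix. The sketch below assumes Python's standard-library xml.etree.ElementTree; the prefixes h and f, the wrapping <root> element, and the furniture URI http://example.com/furniture are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Both <table> vocabularies in one document, kept apart by namespace prefixes.
# The furniture namespace URI is a made-up example value.
xml_text = """
<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
"""

root = ET.fromstring(xml_text)

# Map the prefixes used in our queries to the namespace URIs.
ns = {
    "h": "http://www.w3.org/1999/xhtml",
    "f": "http://example.com/furniture",
}

html_cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
furniture_name = root.findtext("f:table/f:name", namespaces=ns)

print(html_cells)
print(furniture_name)
```

With the namespaces declared, a query for the HTML table never picks up the furniture table, and vice versa.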
21. Validating XML
• XML documents must have a root element
<eu>
….
</eu>
• XML elements must have a closing tag
• XML tags are case sensitive
<lastName>Mattarella</Lastname>
<lastName>Mattarella</lastName>
• XML elements must be properly nested
<eu>
<headOfstate type="Non executive President">
<eu>
<country>
<headOfstate type="Non executive President">
• XML attribute values must be quoted
<population AdministrativeDivision=Capital Region>
<population AdministrativeDivision="Capital Region">
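A parser can check these syntax rules automatically. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample labels are hypothetical, while the snippets are adapted from the slide.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<lastName>Mattarella</lastName>",
    # Tags are case sensitive, so </Lastname> does not close <lastName>.
    "case mismatch": "<lastName>Mattarella</Lastname>",
    "quoted attribute": '<population AdministrativeDivision="Capital Region">1,138,854</population>',
    "unquoted attribute": "<population AdministrativeDivision=Capital Region>1,138,854</population>",
    # A document may only have one root element.
    "two roots": "<eu></eu><eu></eu>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(label, "->", "well formed" if ok else "NOT well formed")
```

Only the "matching tags" and "quoted attribute" samples are accepted; the other three break one of the rules listed above.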
22. Validating XML with a DTD
• An XML document with correct syntax is called "Well Formed".
• An XML document validated against a DTD is both "Well Formed" and "Valid".
• An XML parser only knows what is valid if you tell it, e.g. it doesn’t know that a country has a head of state but a capital does not.
• Rules are created using a DTD file.
<!DOCTYPE eu [
<!ELEMENT eu (memberState*)>
<!ELEMENT memberState (name,area,population,headOfstate,capital)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT area (#PCDATA)>
<!ELEMENT headOfstate (firstName,lastName)>
<!ELEMENT capital (name,population)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT population (#PCDATA)>
<!ATTLIST headOfstate type CDATA "0">
<!ATTLIST population AdministrativeDivision CDATA "0">
]>
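To illustrate the difference between "Well Formed" and "Valid", the sketch below hand-codes three of the DTD's rules as a Python dictionary and checks a document against them. This is a stand-in, not a DTD validator: Python's standard-library xml.etree.ElementTree does not validate against DTDs, and the names CONTENT_MODEL and find_violations are hypothetical. Element names follow the XML examples on the earlier slides.

```python
import xml.etree.ElementTree as ET

# Required children, in order, for three elements (mirrors the DTD rules above).
CONTENT_MODEL = {
    "memberState": ["name", "area", "population", "headOfstate", "capital"],
    "headOfstate": ["firstName", "lastName"],
    "capital": ["name", "population"],
}

def find_violations(xml_text):
    """Parse the text (it must be well formed) and list content-model violations."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        expected = CONTENT_MODEL.get(element.tag)
        if expected is None:
            continue  # no rule recorded for this element in the sketch
        actual = [child.tag for child in element]
        if actual != expected:
            problems.append(f"<{element.tag}>: expected {expected}, found {actual}")
    return problems

valid_doc = (
    "<eu><memberState><name>Belgium</name><area>30,528</area>"
    "<population>11,190,845</population>"
    '<headOfstate type="Constitutional Monarch">'
    "<firstName>Philippe</firstName><lastName>Saxe-Coburg-Gotha</lastName>"
    "</headOfstate>"
    "<capital><name>Brussels</name><population>1,138,854</population></capital>"
    "</memberState></eu>"
)

# Well formed, but <lastName> comes before <firstName>, which the rule forbids.
invalid_doc = valid_doc.replace(
    "<firstName>Philippe</firstName><lastName>Saxe-Coburg-Gotha</lastName>",
    "<lastName>Saxe-Coburg-Gotha</lastName><firstName>Philippe</firstName>",
)

valid_problems = find_violations(valid_doc)
invalid_problems = find_violations(invalid_doc)

print(valid_problems)
print(invalid_problems)
```

Both documents parse without error, so both are well formed; only the second breaks a rule, so only the second is not valid.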
23. Three types of error
• Not well formed – missing closing tag, tags not matching, tags not nested correctly
• Not valid – doesn’t comply with DTD rules
• Information is wrong – XML will not spot this in most circumstances, but may spot it if the information doesn’t comply with a rule.
• Won’t spot:
<lastName>O’Higgins</lastName>
<firstName>Michael D.</firstName>
• Might spot (if expecting alphabetic characters only):
<lastName>O’Higgins</lastName>
<firstName>Michael D.</firstName>
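The "might spot" case depends on a rule about the characters allowed in a field. A DTD's #PCDATA cannot express such a rule, so the sketch below writes it directly in Python as an illustration; the function name non_alphabetic_fields, the wrapping <headOfstate> element, and the A-Z-only pattern are all assumptions, not part of the original slides.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: a field may contain unaccented letters only.
ALPHABETIC_ONLY = re.compile(r"[A-Za-z]+")

def non_alphabetic_fields(xml_text):
    """Return the tags of elements whose text breaks the alphabetic-only rule."""
    root = ET.fromstring(xml_text)
    flagged = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text and not ALPHABETIC_ONLY.fullmatch(text):
            flagged.append(element.tag)
    return flagged

record = (
    "<headOfstate>"
    "<lastName>O’Higgins</lastName>"
    "<firstName>Michael D.</firstName>"
    "</headOfstate>"
)

flagged = non_alphabetic_fields(record)
print(flagged)
```

The rule flags the apostrophe in the last name and the space and full stop in the first name. It can only catch values that break the pattern: a wrong name made up entirely of letters would still pass.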
24. XML and XSLT
• XSLT is one of a number of technologies used to process XML.
• In our example we will use XSLT to pick out individual XML elements and use HTML to display them in a web browser.
• In my experience writing XSLT is not easy; it is more difficult than any other programming language I’ve used.
• The good news: you don’t necessarily have to use XSLT to use XML or EAD.