Lisa Jeskins and Bethan Ruddock
Archives Hub
Mimas
By the end of today’s session we will have
given you an introduction to:
• what interoperability means
• what XML is, what it does and why it is important
• EAD structure and syntax
• EAD and hierarchies
• UK Archives Discovery Network (UKAD)
 the ability of two or more systems or
components to exchange information and to
use the information that has been exchanged
(IEEE Standard Computer Dictionary)
 the ability to exchange/share data
 integration of information resources presented in
different formats
 within a domain or across domains
 advantages of cross-searching
 XML facilitates interoperability
 Data exchange standards such as:
◦ Z39.50
◦ SRU
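As a rough illustration of what an SRU request looks like, the sketch below builds a searchRetrieve URL in Python. The base URL and the index name in the query are hypothetical placeholders, not a real service; only the parameter names (operation, version, query, maximumRecords) come from the SRU standard.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: replace with the SRU base URL of a real service.
SRU_BASE_URL = "https://example.org/sru"


def build_sru_search_url(cql_query, max_records=10, base_url=SRU_BASE_URL):
    """Return an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode takes care of escaping spaces and quotation marks in the query.
    return base_url + "?" + urlencode(params)


# Assumed CQL index name (dc.title), used only for illustration.
url = build_sru_search_url('dc.title = "papers"', max_records=5)
print(url)
```

Because every SRU server accepts the same parameters, the same request can in principle be sent to many systems, which is what makes cross-searching practical.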
 user can easily search across and retrieve
resources from a wealth of systems
 moving beyond individual websites for
individual resources (silo approach)
 http://www.ukoln.ac.uk/interop-focus/
◦ to explore, publicise and mobilise the benefits and
practice of effective interoperability across diverse
information sectors
 Extensible Markup Language
 XML is a grammatical system for creating languages:
◦ a meta-language
 Use XML to design your own markup language,
consisting of meaningful tags that describe the data
they contain
 Create a language for describing…anything
 XML does not do anything itself. It is pure
information wrapped in XML tags
 You must use other means to send, receive or
display the data
[Diagram: XML is used by XML technologies to create a detailed description to view in a browser, a summary entry to view in a browser, and a PDF for print]
 XML is not about content, though there might be
certain restrictions on content
 XML is essentially about structure
 Creating a consistent structure via XML tagging enables
content to be easily identified (by machines) and used
flexibly
<title> Alice in Wonderland </title>
*XML allows you to define your tags*
<book>Alice in Wonderland</book>
<filmtitle>Alice in Wonderland</filmtitle>
<tag> content </tag>
 Attributes are simple name/value pairs
associated with an element
<tag attribute_name="attribute_value">content</tag>
<language>English</language>
<language langcode="eng">English</language>
<date normal="2004">20 Sept 2004</date>
<tag attribute_name="attribute_value">content</tag>
<tree>hornbeam</tree>
<tree type="deciduous">hornbeam</tree>
<date normal="2004">20 May 2004</date>
<date>20 May 2004</date>
This is an XML element
<trees>
  <tree type="deciduous">
    <species>oak</species>
    <fruit>acorn</fruit>
  </tree>
  <tree type="coniferous">
    <species>pine</species>
    <fruit>pine cone</fruit>
  </tree>
</trees>
<catalog>
  <cd>
    <title>OK Computer</title>
    <artist type="band">Radiohead</artist>
    <genre>pop</genre>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist type="solo">Paul Weller</artist>
    <genre>pop</genre>
    <year>1995</year>
  </cd>
</catalog>
<title>Stanley Road</title>
<artist>Paul Weller</artist>
<type>solo</type>
<genre>pop</genre>
<year>1995</year>
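To show why consistent tags and attributes matter to a machine, here is a minimal Python sketch that reads the catalog example and picks out CDs by the type attribute on <artist>. The function name is an illustrative assumption; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The catalog example from the slide, with straight quotation marks.
CATALOG_XML = """
<catalog>
  <cd>
    <title>OK Computer</title>
    <artist type="band">Radiohead</artist>
    <genre>pop</genre>
    <year>1997</year>
  </cd>
  <cd>
    <title>Stanley Road</title>
    <artist type="solo">Paul Weller</artist>
    <genre>pop</genre>
    <year>1995</year>
  </cd>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)


def cds_by_artist_type(root, artist_type):
    """Return the titles of all CDs whose <artist> has the given type attribute."""
    titles = []
    for cd in root.findall("cd"):
        artist = cd.find("artist")
        if artist is not None and artist.get("type") == artist_type:
            titles.append(cd.findtext("title"))
    return titles


solo_titles = cds_by_artist_type(catalog, "solo")
print(solo_titles)
```

If the type were recorded as a separate <type> element instead of an attribute, the same information would still be available; only the lookup would change.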
Alice in Wonderland
Lewis Carroll
1 volume
hardback
Title Alice in Wonderland
Author Lewis Carroll
Extent 1 volume
Format hardback
<books>
<title>Alice in Wonderland</title>
<author>Lewis Carroll</author>
<extent>1 volume</extent>
<format>hardback</format>
</books>
 a root element is required
<catalog>
…..all your tags and content…
</catalog>
 closing tags are required
 case matters
 elements must be properly nested
Correct:
<physdesc>
<extent>10 boxes</extent>
</physdesc>
Incorrect (closing tags in the wrong order):
<physdesc>
<extent>10 boxes</physdesc>
</extent>
 attribute values must be enclosed in quotation marks,
e.g. langcode="fre"
 element names must obey some basic rules
◦ e.g. cannot start with numbers or punctuation characters,
cannot contain spaces
◦ e.g. <cd name> or <?name> would be incorrect
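The rules above can be tested by handing a snippet to any XML parser, which will refuse anything that is not well-formed. The following is a minimal sketch using Python's standard parser; the sample labels and the helper name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One sample per rule; only the first follows all of the rules.
samples = {
    "properly nested": "<physdesc><extent>10 boxes</extent></physdesc>",
    "badly nested": "<physdesc><extent>10 boxes</physdesc></extent>",
    "missing closing tag": "<physdesc><extent>10 boxes</physdesc>",
    "case mismatch": "<Extent>10 boxes</extent>",
    "unquoted attribute": "<language langcode=fre>French</language>",
    "space in element name": "<cd name>OK Computer</cd name>",
    "no single root": "<cd>OK Computer</cd><cd>Stanley Road</cd>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "NOT well-formed")
```

Note that this only checks well-formedness. Whether the right elements are used in the right places is a question of validity, which needs a DTD or Schema.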
Look at the following recipe for
Chocolate Brownies – How
would you use XML to mark this up?
(I’m reliably informed the recipe
works!)
 375g butter
 375g dark chocolate
 1 tablespoon vanilla extract
 6 eggs
 500g sugar
 225g plain flour
 Preheat the oven to 180°C, 350°F or gas mark 4. Grease a swiss roll tin or
oblong baking dish. Melt the chocolate and butter in a bowl over a
saucepan of hot water. Add the vanilla and set the mixture aside until it is
lukewarm.
 Whisk the eggs and sugar into the mixture. Sift in the flour and baking
powder and fold gently until the mixture is just combined. Pour into the
greased tin and bake for 20 to 30 minutes until the brownie is cooked
around the edges, but still soft in the middle.
 Cool and cut into squares.
 Makes 48 brownies
Chocolate Brownies
<recipe>
  <title>Chocolate Brownies</title>
  <ingredients>
    <item>375g butter</item>
    <item>375g dark chocolate</item>
    <item>1 tablespoon vanilla extract</item>
    <item>6 eggs</item>
    <item>500g sugar</item>
    <item>225g plain flour</item>
  </ingredients>
  <method>
    <p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>. Grease a swiss roll tin or oblong
    baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla
    and set the mixture aside until it is lukewarm. Whisk the eggs and sugar into the mixture.</p>
    <p>Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into
    the greased tin and bake for <bakingtime>20 to 30 minutes</bakingtime> until the brownie is
    cooked around the edges, but still soft in the middle.</p>
    <p>Cool and cut into squares.</p>
  </method>
  <serving>Makes 48 brownies</serving>
</recipe>
Possible XML
markup for recipe
<ingredient>375 g butter</ingredient>
Or
<ingredient>
<item>375 g butter</item>
</ingredient>
Or
<ingredient>
<type>butter</type>
<quantity>375 g</quantity>
</ingredient>
http://www.archiveshub.ac.uk/temp/recipe.xml
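The choice between these markup options affects what a machine can do with the data. As a sketch, the third option (separate <type> and <quantity> elements) lets a program build a shopping list without having to guess where the quantity ends and the ingredient begins. The <ingredients> wrapper and the function name below are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Third markup option from the slide, wrapped in an assumed <ingredients> root.
INGREDIENTS_XML = """
<ingredients>
  <ingredient><type>butter</type><quantity>375 g</quantity></ingredient>
  <ingredient><type>dark chocolate</type><quantity>375 g</quantity></ingredient>
  <ingredient><type>sugar</type><quantity>500 g</quantity></ingredient>
</ingredients>
"""


def shopping_list(xml_text):
    """Map each ingredient type to its quantity."""
    root = ET.fromstring(xml_text)
    return {
        ingredient.findtext("type"): ingredient.findtext("quantity")
        for ingredient in root.findall("ingredient")
    }


shopping = shopping_list(INGREDIENTS_XML)
for name, quantity in shopping.items():
    print(quantity, name)
```

With the first option, <ingredient>375 g butter</ingredient>, the same program would have to split the text itself, which is less reliable.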
 Valid XML: rules specify elements and attributes
used and how used
 Valid XML provides consistency and facilitates the
exchange of data
 Valid XML is important for displaying, processing and
exchanging XML in a wider environment
 A Document Type Definition or Schema defines the
building blocks of an XML document
 It specifies elements and attributes and defines how
they can be used
 People can agree to use a common DTD/Schema for
interchanging data
<?xml version="1.0" encoding="UTF-16"?>
<!ELEMENT recipe (title, intro?, ingredients+, method, serving*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT intro (#PCDATA)>
<!ELEMENT ingredients (item+)>
<!ELEMENT item (#PCDATA)>
<!ELEMENT method (p+)>
<!ELEMENT p (#PCDATA | temp | bakingtime)*>
<!ELEMENT temp (#PCDATA)>
<!ELEMENT bakingtime (#PCDATA)>
<!ELEMENT serving (#PCDATA)>
 Schemas perform the same task as DTDs
 Schemas use XML syntax
 Schemas support complex data types
 Easier to describe allowable content
 One XML document can point to more than one
schema
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<to>Rachel</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the concert!</body>
</note>
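The xsi:schemaLocation attribute holds pairs of values: a namespace followed by the location of the schema for that namespace. That is how one document can point to more than one schema. The sketch below reads those pairs with Python's standard library; the function name is an illustrative assumption, and the code only reads the pointer, it does not validate the document.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

NOTE_XML = """<?xml version="1.0"?>
<note
  xmlns="http://www.w3schools.com"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Rachel</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget the concert!</body>
</note>
"""


def schema_locations(xml_text):
    """Return a dict of namespace -> schema location read from xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced attribute names as {namespace}localname.
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # The attribute value is a whitespace-separated list of namespace/location pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(NOTE_XML)
print(locations)
```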
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com"
xmlns="http://www.w3schools.com" elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
[Diagram: XML file + DTD or Schema = Valid XML, illustrated with a "Blue Elephant Papers" record and a browse list]
 Use XML technologies – for displaying, retrieving,
transforming, manipulating
 XSLT – Extensible Stylesheet Language for
Transformations
 Many technologies available to manipulate XML
documents
 transformation involves the reading in of an XML file
and an XSLT file to a processor, which can then
generate some output – typically HTML
[Diagram: XML and XSLT are read by a processor, which generates HTML output]
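The idea of a transformation can be sketched without XSLT. The Python below is not XSLT and is not what an XSLT processor does internally; it simply shows the same principle of reading descriptive XML and writing display-oriented HTML. The <record> wrapper and the choice of HTML tags are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record: the <record> wrapper is assumed for illustration.
RECORD_XML = (
    "<record>"
    "<title>Papers of Peter Rowe</title>"
    "<date>21 May 2004</date>"
    "</record>"
)


def record_to_html(xml_text):
    """Turn the descriptive XML tags into presentational HTML tags."""
    record = ET.fromstring(xml_text)
    title = escape(record.findtext("title", default=""))
    date = escape(record.findtext("date", default=""))
    # The meaning (title, date) is replaced by presentation (heading, bold).
    return "<h1>{}</h1>\n<p><b>{}</b></p>".format(title, date)


html_output = record_to_html(RECORD_XML)
print(html_output)
```

The same XML could be passed to a different function (or a different stylesheet) to produce a summary entry or a print version, without changing the source data.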
 HTML is ONLY for display, typically in a Web browser
 HTML tags do not describe the content
 HTML cannot easily be extracted by machines for
different purposes
 XML tags can be specified by anyone; HTML tags are
prescribed
HTML: <h1> Papers of Peter Rowe </h1>
XML: <title> Papers of Peter Rowe </title>
HTML: <b> 21 May 2004 </b>
XML: <date> 21 May 2004 </date>
 International standard, supported by the W3C
 It is open, licence free and platform neutral
 It is human and machine readable
 XML documents are text documents
 XML does not determine the presentation of
the data
◦ use stylesheets to present XML data
◦ with proprietary systems content is inextricably bound up
with format
 Hierarchical structure – good for archive
descriptions!
 XML is the main basis for defining data
exchange languages
 Meaningful tags facilitate extraction – data
can be manipulated as required
 All publicly funded bodies should use XML for
data exchange (e-GIF)
 XML has been widely adopted commercially
as well as in the public sector
 XML is:
◦ simple
◦ flexible
◦ great for data exchange
 XML must be:
◦ well-formed
◦ valid
 DTDs and Schemas:
◦ to create valid XML
◦ provide tags, attributes and rules
 XML requires other XML technologies
◦ e.g. stylesheets can transform XML for display
 EAD = Encoded Archival Description
 EAD is XML for finding aids
 A data structure standard – not a content standard
 A structure that allows finding aids to be indexed,
searched, retrieved and navigated
 Compatible with ISAD(G)
EAD is:
 Flexible enough to deal with all types of finding aids:
single or multi-level, long or short, lists or calendars
etc.
 Used to create new finding aids as well as converting
old ones to standardised form
 Used to share data between systems
 EAD is maintained and developed by an
international working group
 Develops and publishes documentation and
tools: tag library, guidelines, EAD Cookbook,
websites
<ead>
<eadheader>
</eadheader>
<archdesc>
<did></did>
</archdesc>
</ead>
<ead> EAD root element
<eadheader> EAD file information wrapper
</eadheader>
<archdesc> Finding aid wrapper
<did></did> Core collection information wrapper
</archdesc>
</ead>
[Diagram: EAD structure showing <archdesc>, <eadheader>, <did> and sub-fonds descriptions]
<eadheader>       EAD file information
<eadid>           Identifier
<filedesc>
<titlestmt>
<titleproper>     Title
<profiledesc>     Creation
<revisiondesc>    Revision
Within <archdesc> there are elements for:
 Description
 Presentation
 Hierarchy
<archdesc>        Archival description
<did>             Descriptive information
<scopecontent>    Scope and Content
<bioghist>        Biographical/Admin. History
<arrangement>     Arrangement
<controlaccess>   Access points
<did>             Descriptive information
<unitid>          Reference
<unittitle>       Title
<unitdate>        Covering dates
<origination>     Creator(s)
<repository>      Repository
<physdesc>        Physical description
<extent>          Extent
<genreform>       Form
<physfacet>       Physical Facet
<physloc>         Location
<container>       Container type
<abstract>        Brief description
</did>
<archdesc level="fonds">
<did>
<unitid>GB 0001 Foster</unitid>
<unittitle>Papers of Dr Foster</unittitle>
<unitdate normal = "1820-1833">1820-1833</unitdate>
<repository>University of Gloucestershire</repository>
<physdesc>
<extent>1 box</extent>
<physfacet>Four folders of letters, 230 folios</physfacet>
</physdesc>
<langmaterial><language langcode="eng">English</language>
</langmaterial>
<origination>Dr Foster</origination>
</did>
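Once the core description is tagged like this, a program can pull out each piece of information by name. The sketch below parses a copy of the example (with the closing </archdesc> tag added so that it is a complete document) and builds a simple summary; the dictionary keys and the function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The <did> example from the slide, closed off so that it parses on its own.
ARCHDESC_XML = """
<archdesc level="fonds">
  <did>
    <unitid>GB 0001 Foster</unitid>
    <unittitle>Papers of Dr Foster</unittitle>
    <unitdate normal="1820-1833">1820-1833</unitdate>
    <repository>University of Gloucestershire</repository>
    <physdesc>
      <extent>1 box</extent>
      <physfacet>Four folders of letters, 230 folios</physfacet>
    </physdesc>
    <langmaterial><language langcode="eng">English</language></langmaterial>
    <origination>Dr Foster</origination>
  </did>
</archdesc>
"""


def summarise_did(xml_text):
    """Collect the main descriptive fields from <archdesc>/<did>."""
    archdesc = ET.fromstring(xml_text)
    did = archdesc.find("did")
    return {
        "level": archdesc.get("level"),
        "reference": did.findtext("unitid"),
        "title": did.findtext("unittitle"),
        # The normal attribute holds the machine-readable form of the date.
        "dates": did.find("unitdate").get("normal"),
        "extent": did.findtext("physdesc/extent"),
        "language": did.find("langmaterial/language").get("langcode"),
        "creator": did.findtext("origination"),
    }


summary = summarise_did(ARCHDESC_XML)
for field, value in summary.items():
    print(field + ":", value)
```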
<acqinfo>          Acquisition information
<custodhist>       Custodial history
<appraisal>        Appraisal and selection
<processinfo>      Process Information
<accruals>         Accruals information
<altformavail>     Copies
<accessrestrict>   Access restrictions
<userestrict>      User restrictions
<prefercite>       Citation information
<bibliography>        Publication note
<fileplan>            Classification scheme
<otherfindaid>        Other finding aids
<relatedmaterial>     Related material
<separatedmaterial>   Separated material
<index>               Keywords
<controlaccess>   Controlled access headings
<name>            Names (general)
<corpname>        Corporate body name
<persname>        Personal name
<famname>         Family name
<geogname>        Place name
<occupation>      Occupations
<function>        Functions (administrative)
<genreform>       Genre and Form
<subject>         Subject
<head>                                  Headings
<p>; <lb>                               Layout
<emph>; <blockquote>                    Italics and quotes
<list><item>; <chronlist><chronitem>    Lists
<ref>; <ptr>; <dao>                     References, pointers and links to digital objects
NB: EAD is NOT about the presentation
of your finding aids, but about their
syntax. Separate software will take care
of the display of the information.
ISAD(G) (v.2)                                  EAD 2002
3.1.1 Reference code(s)                        <unitid> countrycode and repositorycode attributes
3.1.2 Title                                    <unittitle>
3.1.3 Dates of creation                        <unitdate>
3.1.4 Level of description                     <archdesc> and <c> level attribute
3.1.5 Extent of the unit                       <physdesc>, <extent>
3.2.1 Name of creator                          <origination>
3.2.2 Administrative/Biographical history      <bioghist>
3.2.3 Custodial history                        <custodhist>
3.2.4 Immediate source of acquisition          <acqinfo>
3.3.1 Scope and content                        <scopecontent>
3.3.2 Appraisal, destruction and scheduling    <appraisal>
3.3.3 Accruals                                 <accruals>
3.3.4 System of arrangement                    <arrangement>
3.4.1 Access conditions                        <accessrestrict>
3.4.2 Copyright/Reproduction                   <userestrict>
3.4.3 Language of material                     <langmaterial>
3.4.4 Physical characteristics                 <phystech>
3.4.5 Finding aids                             <otherfindaid>
3.5.1 Location of originals                    <originalsloc>
3.5.2 Existence of copies                      <altformavail>
3.5.3 Related units of description             <relatedmaterial> and <separatedmaterial>
3.5.4 Publication note                         <bibliography>
3.6.1 Note                                     <odd>
 EAD version 1 DTD
 EAD 2002 DTD
 EAD 2002 Schema
 Available from http://www.loc.gov/ead/
 Human-readable version: EAD Tag Library (Society of
American Archivists)
 Library of Congress Official EAD site:
http://www.loc.gov/ead/
 Tag Library: http://www.loc.gov/ead/tglib/index.html
 EAD Roundtable Help Pages:
http://www.archivists.org/saagroups/ead/
ISAD(G) states that to be a conformant archival
description a finding aid must:
 Be hierarchical
◦ Description from the general to the specific
◦ Information relevant to the level of description
◦ Linking of descriptions (logical sequence)
◦ Non-repetition of information
 Contain a minimum set of data elements
 Recommended elements for lower level
descriptions:
◦ reference code
◦ title
◦ date(s)
◦ extent of the unit of description
◦ level of description
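A checklist like this is easy to automate once descriptions are in EAD. The following is a minimal sketch, assuming the ISAD(G) elements map to the EAD tags shown in the path table below; the function name and the sample components are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the recommended ISAD(G) elements to EAD paths.
RECOMMENDED_PATHS = {
    "reference code": "did/unitid",
    "title": "did/unittitle",
    "date(s)": "did/unitdate",
    "extent": "did/physdesc/extent",
}


def missing_elements(component):
    """List the recommended elements that a component description lacks."""
    missing = [
        label
        for label, path in RECOMMENDED_PATHS.items()
        if not (component.findtext(path) or "").strip()
    ]
    # Level of description is carried by the level attribute, not an element.
    if not component.get("level"):
        missing.append("level of description")
    return missing


complete = ET.fromstring(
    '<c01 level="subfonds"><did>'
    "<unitid>GB 0324 MS 54</unitid>"
    "<unittitle>Correspondence files</unittitle>"
    "<unitdate>1920-1945</unitdate>"
    "<physdesc><extent>4 files</extent></physdesc>"
    "</did></c01>"
)
# Hypothetical incomplete component: title only, no level attribute.
incomplete = ET.fromstring("<c02><did><unittitle>Letters</unittitle></did></c02>")

print(missing_elements(complete))
print(missing_elements(incomplete))
```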
ISAD(G) levels:
 Fonds
 Sub-fonds
 Series
 Sub-series
 File
 Item
EAD levels:
<archdesc>
<dsc><c01>
<c02>
<c03>
<c04>
<c05>
<ead>…
  <archdesc>
    [collection level description here]
    <dsc>
      <c01>[series] description 1
        <c02>[file] description 1</c02>
        <c02>[file] description 2
          <c03>[item] 1</c03>
          <c03>[item] 2</c03>
        </c02>
      </c01>
      <c01>[series] description 2....
    </dsc>
  </archdesc>
</ead>
[Diagram: component tree with nodes c01, c02 and c03]
<c01 level = "subfonds">
<did>
<unitid>GB 0324 MS 54</unitid>
<unittitle>Correspondence files</unittitle>
<unitdate>1920-1945</unitdate>
<physdesc><extent>4 files</extent></physdesc>
</did>
<scopecontent>…</scopecontent>
<c02 level = "series">
<did>…</did>
<scopecontent>…</scopecontent>
</c02>
</c01>
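Because the hierarchy is explicit in the markup, software can rebuild the outline of a finding aid automatically. Here is a minimal sketch that walks numbered components (EAD 2002 allows <c01> to <c12>) and prints an indented outline. The title of the <c02> is a hypothetical stand-in, since the slide elides it, and the <dsc> wrapper is added so the fragment parses.

```python
import re
import xml.etree.ElementTree as ET

DSC_XML = """
<dsc>
  <c01 level="subfonds">
    <did>
      <unitid>GB 0324 MS 54</unitid>
      <unittitle>Correspondence files</unittitle>
    </did>
    <c02 level="series">
      <did><unittitle>Letters received</unittitle></did>
    </c02>
  </c01>
</dsc>
"""

# Matches the numbered component tags c01 to c12.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])$")


def outline(element, lines=None):
    """Return one indented line per component, in document order."""
    if lines is None:
        lines = []
    for child in element:
        if COMPONENT_TAG.match(child.tag):
            depth = int(child.tag[1:])  # the number in the tag gives the depth
            title = child.findtext("did/unittitle", default="")
            indent = "  " * (depth - 1)
            lines.append("{}{} [{}] {}".format(indent, child.tag, child.get("level"), title))
            outline(child, lines)
    return lines


outline_lines = outline(ET.fromstring(DSC_XML))
print("\n".join(outline_lines))
```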
 EAD supports two ways of representing levels
 <c> is used in A2A, <c0*> on the Hub
 Slightly easier to use <c0*>, as the numbers give you
more of an idea of the level you are working at
<dsc type="combined">
  <c level="series">
    <did>
      <unitid>Series 1</unitid>
      <unittitle>Correspondence</unittitle>
    </did>
    <scopecontent>[...]</scopecontent>
    <c level="subseries">
      <did>
        <unitid>Subseries 1.1</unitid>
        <unittitle>Outgoing Correspondence</unittitle>
      </did>
      <c level="file">
        <did> <unittitle>AbbingerAldrich</unittitle> </did>
      </c>
    </c>
  </c>
</dsc>
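With unnumbered <c> elements the depth is not visible in the tag name, but it can still be worked out from the nesting. The sketch below does this for the example above; the function name is an illustrative assumption and the [...] scope and content is left out.

```python
import xml.etree.ElementTree as ET

DSC_XML = """
<dsc type="combined">
  <c level="series">
    <did><unitid>Series 1</unitid><unittitle>Correspondence</unittitle></did>
    <c level="subseries">
      <did><unitid>Subseries 1.1</unitid><unittitle>Outgoing Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>AbbingerAldrich</unittitle></did>
      </c>
    </c>
  </c>
</dsc>
"""


def component_depths(element, depth=1, found=None):
    """Return (depth, level, title) for every <c>, counting depth from the nesting."""
    if found is None:
        found = []
    for child in element.findall("c"):  # direct <c> children only
        found.append((depth, child.get("level"), child.findtext("did/unittitle")))
        component_depths(child, depth + 1, found)
    return found


components = component_depths(ET.fromstring(DSC_XML))
for depth, level, title in components:
    print("c{:02d}".format(depth), level, title)
```

The printed c01, c02, c03 labels show that the two ways of representing levels carry the same information, so records can be converted from one form to the other.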
 XML is a meta-language for creating mark-up
languages
 XML files require other technologies for display,
processing, etc.
 For archive finding aids EAD is the DTD/Schema to
use
 It is XML, which is an international standard
 It is a simple and effective way of structuring content
and providing meaning
 Machines can manipulate the content in all sorts of
ways
 It is a great format to store finding-aids
 Effective cross-searching requires:
◦ Interoperability
 which requires
◦ Common standards
 UKAD: http://www.ukad.org/
 To promote the opening up of data and to offer capacity for such
a cross-searching capability across the UK archive networks and
online repository catalogues
 To lead and support resource discovery through the promotion of
relevant national and international standards
 To support the development and use of name authorities
 To advocate for the reduction of cataloguing
backlogs and the retro-conversion of hard-copy
catalogues
 To promote access to digitized and digital archives
via cross-searching resource discovery systems.
 To work with other domains and potential funders to
promote archive discovery
 Fairly loose structure
 Meetings about twice a year
 Forum for discussion, sharing, connecting and collaborating
 Creating a framework for activities (matrix)
◦ International/national/regional
◦ Meeting UKAD objectives, e.g. open up data; standards-based resource
discovery; retro-conversion
 Not many UK archives currently using EAD as a storage format
 EAD will increasingly be used as an export format from
proprietary database systems like CALM, for use in XML-based
gateways such as Aim25 and the Archives Hub
 New software becoming available all the time, which makes it
easier to create, search and display XML – much of this is
open source and often free
 Differences in how EAD is used
 Encourages interoperability but still requires work to
ensure seamless cross-searching
 EAD is flexible and includes a large number of tags
which has advantages and disadvantages
 XML is an international standard for sharing
information
 EAD is the XML language for archival finding aids
 EAD is not a content standard
 Use ISAD(G) for content guidelines and thesauri or
authority files for index terms
 You have used the Archives Hub’s EAD editor to
create EAD records
 XML Editors, such as XMetal or XMLspy can provide
help with validating and with selecting tags and
attributes
 EAD will become increasingly important
Archives hub ead 2010_extended

Archives hub ead 2010_extended

  • 1.
    Lisa Jeskins andBethan Ruddock Archives Hub Mimas
  • 2.
    By the endof today’s session we will have given you an introduction to: • what interoperability means • what XML is, what it does and why it is important • EAD structure and syntax • EAD and hierarchies • UK Archives Discovery Network (UKAD)
  • 4.
     the abilityof two or more systems or components to exchange information and to use the information that has been exchanged (IEEE Standard Computer Dictionary )
  • 5.
     the abilityto exchange/share data  integration of information resources presented in different formats  within a domain or across domains  advantages of cross-searching  XML facilitates interoperability
  • 6.
     Data exchangestandards such as: ◦ Z39.50 ◦ SRU
  • 7.
     user caneasily search across and retrieve resources from a wealth of systems  moving beyond individual websites for individual resources (silo approach)
  • 8.
     http://www.ukoln.ac.uk/interop-focus/ ◦ toexplore, publicise and mobilise the benefits and practice of effective interoperability across diverse information sectors
  • 10.
     Extensible MarkupLanguage  XML is a grammatical system for creating languages: ◦ a meta-language  Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain  Create a language for describing…anything
  • 11.
     XML doesnot do anything itself. It is pure information wrapped in XML tags  You must use other means to send, receive or display the data XML XML technologies is used by to create Detailed description to view in a browser Summary entry to view in a browser PDF for print
  • 12.
     XML isnot about content, though there might be certain restrictions on content  XML is essentially about structure  Creating a consistent structure via XML tagging enables content to be easily identified (by machines) and used flexibly
  • 13.
    <title> Alice inWonderland </title> *XML allows you to define your tags* <book>Alice in Wonderland</book> <filmtitle>Alice in Wonderland</filmtitle> <tag> content </tag>
  • 14.
     Attributes aresimple name/value pairs associated with an element <tag attribute_name=“attribute_value”>content</tag> <language>English</language> <language langcode=“eng”>English</language> <date normal=“2004”>20 Sept 2004</date>
  • 15.
    <tag attribute_name=”attribute_value”>content</tag> <tree>hornbeam</tree> <tree type=”deciduous”>hornbeam</tree> <datenormal=”2004”>20 May 2004</date> <date>20 May 2004</date> This is an XML element
  • 16.
  • 17.
    <catalog> <cd> <title>OK Computer</title> <artist type=“band”>Radiohead</artist> <genre>pop</genre> <year>1997</year> </cd> <cd> <title>StanleyRoad</title> <artist type=“solo”>Paul Weller</artist> <genre>pop</genre> <year>1995</year> </cd> </catalog> <title>Stanley Road</title> <artist>Paul Weller</artist> <type>solo</type> <genre>pop</genre> <year>1995</year>
  • 18.
    Alice in Wonderland LewisCarroll 1 volume hardback
  • 19.
    Title Alice inWonderland Author Lewis Carroll Extent 1 volume Format hardback
  • 20.
    <books> <title>Alice in Wonderland</title> <author>LewisCarroll</author> <extent>1 volume</extent> <format>hardback</location> </books>
  • 21.
     a rootelement is required <catalog> …..all your tags and content… </catalog>  closing tags are required  case matters
  • 22.
     elements mustbe properly nested <physdesc> <extent>10 boxes</extent> </physdesc> <physdesc> <extent>10 boxes</physdesc> </extent>
  • 23.
     attribute valuesmust be enclosed in quotation marks, e.g. langcode=“fre”  element names must obey some basic rules ◦ e.g. cannot start with numbers or punctuation characters, cannot contain spaces ◦ e.g. <cd name> or <?name> would be incorrect
  • 24.
    Look at thefollowing recipe for Chocolate Brownies – How would use XML to mark this up? (I’m reliably informed the recipe works!)
  • 25.
     375g butter 375g dark chocolate  1 tablespoon vanilla extract  6 eggs  500g sugar  225g plain flour  Preheat the oven to 180°C, 350°F or gas mark 4. Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm.  Whisk the eggs and sugar into the mixture. Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for 20 to 30 minutes until the brownie is cooked around the edges, but still soft in the middle.  Cool and cut into squares.  Makes 48 brownies Chocolate Brownies
  • 26.
    <recipe> <title>Chocolate Brownies</title> <ingredients> <item>375g butter</item> <item>375gdark chocolate</item> <item>1 tablespoon vanilla extract</item> <item>6 eggs</item> <item>500g sugar</item> <item>225g plain flour</item> </ingredients> <method> <p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>.Grease a swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm. Whisk the eggs and sugar into the mixture.</p> <p>Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for <bakingtime>20 to 30 minutes</bakingtime> until the brownie is cooked around the edges, but still soft in the middle.</p> <p>Cool and cut into squares.</p> </method> <serving>Makes 48 brownies</serving> </recipe> Possible XML markup for recipe
  • 27.
    <ingredient>375 g butter</ingredient> Or <ingredient> <item>375g butter</item> </ingredient> Or <ingredient> <type>butter</type> <quantity>375 g</quantity> </ingredient>
  • 28.
  • 29.
     Valid XML:rules specify elements and attributes used and how used  Valid XML provides consistency and facilitates the exchange of data  Valid XML is important for displaying, processing and exchanging XML in a wider environment
  • 30.
     A DocumentType Definition or Schema defines the building blocks of an XML document  It specifies elements and attributes and defines how they can be used  People can agree to use a common DTD/Schema for interchanging data
  • 31.
    <?xml version="1.0" encoding="UTF-16"?> <!ELEMENTrecipe (title, intro?, ingredients+, method, serving*)> <!ELEMENT title (#PCDATA)> <!ELEMENT intro (#PCDATA)> <!ELEMENT ingredients (item+)> <!ELEMENT item (#PCDATA)> <!ELEMENT method (p+)> <!ELEMENT p (#PCDATA | temp | bakingtime)*> <!ELEMENT temp (#PCDATA)> <!ELEMENT bakingtime (#PCDATA)> <!ELEMENT serving (#PCDATA)>
  • 32.
     Schemas performthe same task as DTDs  Schemas use XML syntax  Schemas support complex data types  Easier to describe allowable content  One XML document can point to more than one schema
  • 33.
  • 34.
    <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com"elementFormDefault="qualified"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
  • 35.
    XML file DTDor Schema Valid XML Blue Elephant Papers …………………… ………… Blue Elephant Papers Browse List
  • 38.
     Use XMLtechnologies – for displaying, retrieving, transforming, manipulating  XSLT – Extensible Stylesheet Language for Transformations  Many technologies available to manipulate XML documents
  • 39.
     transformation involvesthe reading in of an XML file and an XSLT file to a processor, which can then generate some output – typically HTML XSLT XML processor HTML output
  • 40.
     HTML isONLY for display, typically in a Web browser  HTML tags do not describe the content  HTML cannot easily be extracted by machines for different purposes  XML tags can be specified by anyone; HTML tags are prescribed
  • 41.
    HTML: <h1> Papersof Peter Rowe </h1> XML: <title> Papers of Peter Rowe </title> HTML: <b> 21 May 2004 </b> XML: <date> 21 May 2004 </date>
  • 42.
     International standard,supported by the W3C  It is open, licence free and platform neutral  It is human and machine readable  XML documents are text documents
  • 43.
     XML doesnot determine the presentation of the data ◦ use stylesheets to present XML data ◦ with proprietary systems content is inextricably bound up with format  Hierarchical structure – good for archive descriptions!
  • 44.
     XML isthe main basis for defining data exchange languages  Meaningful tags facilitate extraction – data can be manipulated as required
  • 45.
     All publiclyfunded bodies should use XML for data exchange (e-GIF)  XML has been widely adopted commercially as well as in the public sector
  • 46.
     XML is: ◦simple ◦ flexible ◦ great for data exchange  XML must be: ◦ well-formed ◦ valid  DTDs and Schemas: ◦ to create valid XML ◦ provide tags, attributes and rules  XML requires other XML technologies ◦ e.g. stylesheets can transform XML for display
  • 48.
     EAD =Encoded Archival Description  EAD is XML for finding aids  A data structure standard – not a content standard  A structure that allows finding aids to be indexed, searched, retrieved and navigated  Compatible with ISAD(G)
  • 49.
    EAD is:  Flexibleenough to deal with all types of finding aids: single or multi-level, long or short, lists or calendars etc.  Used to create new finding aids as well as converting old ones to standardised form  Used to share data between systems
  • 50.
     EAD ismaintained and developed by an international working group  Develops and publishes documentation and tools: tag library, guidelines, EAD Cookbook, websites
  • 52.
  • 53.
    <ead> EAD rootelement <eadheader> EAD file information wrapper </eadheader> <archdesc> Finding aid wrapper <did></did> Core collection information wrapper </archdesc> </ead>
  • 54.
  • 55.
  • 56.
    Within <archdesc> thereare elements for:  Description  Presentation  Hierarchy
  • 57.
  • 58.
  • 59.
    <archdesc level="fonds"> <did> <unitid>GB 0001Foster</unitid> <unittitle>Papers of Dr Foster</unittitle> <unitdate normal = "1820-1833">1820-1833</unitdate> <repository>University of Gloucestershire</repository> <physdesc> <extent>1 box</extent> <physfacet>Four folders of letters, 230 folios</physfacet> </physdesc> <langmaterial><language langcode=“eng”>English<language> </langmaterial> <origination>Dr Foster</origination> </did>
  • 60.
    <acqinfo> <custodhist> <appraisal> <processinfo> <accruals> <altformavail> <accessresrict> <userestrict> <prefercite> Acquisition information Custodial history Appraisaland selection Process Information Accruals information Copies Access restrictions User restrictions Citation information
  • 61.
  • 62.
    <controlaccess> <name> <corpname> <persname> <famname> <geogname> <occupation> <function> <genreform> <subject> Controlled access headings Names(general) Corporate body name Personal name Family name Place name Occupations Functions (administrative) Genre and Form Subject
  • 63.
    <head> <p>; <lb> <emph>; <blockquote> <list><item>; <chronlist><chronitem>; <ref>;<ptr>; <dao> Headings Layout Italics and quotes Lists References, pointers and links to digital objects
  • 64.
    <head> <p>; <lb> <emph>; <blockquote> <list><item>; <chronlist><chronitem>; <ref>;<ptr>; <dao> Headings Layout Italics and quotes Lists References, pointers and links to digital objects NB: EAD is NOT about the presentation of your finding aids, but about their syntax. Separate software will take care of the display of the information.
  • 65.
    ISAD(G) (v.2) 3.1.1 Referencecode(s) 3.1.2 Title 3.1.3 Dates of creation 3.1.4 Level of description 3.1.5 Extent of the unit 3.2.1 Name of creator 3.2.2 Administrative/Biographical history 3.2.3 Custodial history 3.2.4 Immediate source of acquisition 3.3.1 Scope and content 3.3.2 Appraisal, destruction and scheduling EAD 2002 <unitid> countrycode and repositorycode attributes <unittitle> <unitdate> <archdesc> and <c> level attribute <physdesc>, <extent> <origination> <bioghist> <custodhist> <acqinfo> <scopecontent> <appraisal>
  • 66.
    3.3.3 Accruals 3.3.4 Systemof arrangement 3.4.1 Access conditions 3.4.2 Copyright/Reproduction 3.4.3 Language of material 3.4.4 Physical characteristics 3.4.5 Finding aids 3.5.1 Location of originals 3.5.2 Existence of copies 3.5.3 Related units of description 3.5.4 Publication note 3.6.1 Note <accruals> <arrangement> <accessrestrict> <userestrict> <langmaterial> <phystech> <otherfindaid> <originalsloc> <altformavail> <relatedmaterial> and <separatedmaterial> <bibliography> <odd>
  • 67.
     EAD version1 DTD  EAD 2002 DTD  EAD 2002 Schema  Available from http://www.loc.gov/ead/  Human-readable version: EAD Tag Library (Society of American Archivists)
  • 68.
     Library ofCongress Official EAD site: http://www.loc.gov/ead/  Tag Library: http://www.loc.gov/ead/tglib/index.html  EAD Roundtable Help Pages: http://www.archivists.org/saagroups/ead/
  • 70.
    ISAD(G) states thatto be a conformant archival description a finding aid must:  Be hierarchical ◦ Description from the general to the specific ◦ Information relevant to the level of description ◦ Linking of descriptions (logical sequence) ◦ Non-repetition of information  Contain a minimum set of data elements
  • 71.
     Recommended elementsfor lower level descriptions: ◦ reference code ◦ title ◦ date(s) ◦ extent of the unit of description ◦ level of description
  • 72.
    ISAD(G) levels:  Fonds Sub-fonds  Series  Sub-series  File  Item EAD levels: <archdesc> <dsc><c01> <c02> <c03> <c04> <c05>
  • 73.
    <ead>…
      <archdesc> [collection level description here]
        <dsc>
          <c01>[series] description 1
            <c02>[file] description 1</c02>
            <c02>[file] description 2
              <c03>[item] 1</c03>
              <c03>[item] 2</c03>
            </c02>
          </c01>
          <c01>[series] description 2....
        </dsc>
      </archdesc>
    </ead>
  • 74.
    <c01 level="subfonds">
      <did>
        <unitid>GB 0324 MS 54</unitid>
        <unittitle>Correspondence files</unittitle>
        <unitdate>1920-1945</unitdate>
        <physdesc><extent>4 files</extent></physdesc>
      </did>
      <scopecontent>…</scopecontent>
      <c02 level="series">
        <did>…</did>
        <scopecontent>…</scopecontent>
      </c02>
    </c01>
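Because the hierarchy is expressed purely through nesting, a program can recover it by walking the tree. The sketch below is illustrative: `outline` and `COMPONENT` are hypothetical names, the sample data fills in a title ("Letters received") that is not in the slides, and the input is assumed to be un-namespaced EAD 2002.

```python
import re
import xml.etree.ElementTree as ET

# Matches the unnumbered <c> and the numbered <c01>..<c12> component tags.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def outline(element, depth=0):
    """Return (depth, tag, level, title) for every component nested under element."""
    rows = []
    for child in element:
        if COMPONENT.match(child.tag):
            title = child.findtext("did/unittitle", default="")
            rows.append((depth, child.tag, child.get("level"), title))
            rows.extend(outline(child, depth + 1))
    return rows


# Hypothetical sample; the second title is invented for illustration.
SAMPLE = ET.fromstring("""<dsc>
  <c01 level="subfonds">
    <did><unittitle>Correspondence files</unittitle></did>
    <c02 level="series">
      <did><unittitle>Letters received</unittitle></did>
    </c02>
  </c01>
</dsc>""")

rows = outline(SAMPLE)
for depth, tag, level, title in rows:
    print("  " * depth + f"<{tag}> [{level}] {title}")
```

The indentation in the printed outline comes from the nesting depth alone, which is why well-nested markup matters.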
  • 75.
     EAD supports two ways of representing levels
     <c> is used in A2A, <c0*> on the Hub
     Slightly easier to use <c0*>, as the numbers give you more of an idea of the level you are working at
  • 76.
    <dsc type="combined">
      <c level="series">
        <did>
          <unitid>Series 1</unitid>
          <unittitle>Correspondence</unittitle>
        </did>
        <scopecontent>[...]</scopecontent>
        <c level="subseries">
          <did>
            <unitid>Subseries 1.1</unitid>
            <unittitle>Outgoing Correspondence</unittitle>
          </did>
          <c level="file">
            <did>
              <unittitle>AbbingerAldrich</unittitle>
            </did>
          </c>
        </c>
      </c>
    </dsc>
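The two ways of representing levels carry the same information, so unnumbered `<c>` components can be converted to numbered ones by counting nesting depth. This is a rough sketch, not a tool used by A2A or the Hub: `number_components` is a hypothetical helper, the sample is a trimmed version of the example above, and the input is assumed to be un-namespaced EAD 2002.

```python
import xml.etree.ElementTree as ET


def number_components(element, depth=1):
    """Rename unnumbered <c> children to <c01>, <c02>, ... in place, by nesting depth."""
    for child in element:
        if child.tag == "c":
            if depth > 12:
                raise ValueError("EAD 2002 numbers components only as far as <c12>")
            child.tag = f"c{depth:02d}"
            number_components(child, depth + 1)


# Trimmed version of the unnumbered example from the slide.
dsc = ET.fromstring("""<dsc type="combined">
  <c level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c level="subseries">
      <did><unittitle>Outgoing Correspondence</unittitle></did>
      <c level="file">
        <did><unittitle>AbbingerAldrich</unittitle></did>
      </c>
    </c>
  </c>
</dsc>""")

number_components(dsc)
numbered = [el.tag for el in dsc.iter() if el.tag.startswith("c0")]
print(numbered)
print(ET.tostring(dsc, encoding="unicode"))
```

The `level` attributes are left untouched, since the number records only nesting depth and not the archival level.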
  • 77.
     XML is a meta-language for creating mark-up languages
     XML files require other technologies for display, processing, etc.
     For archive finding aids EAD is the DTD/Schema to use
  • 78.
     It is XML, which is an international standard
     It is a simple and effective way of structuring content and providing meaning
     Machines can manipulate the content in all sorts of ways
     It is a great format for storing finding aids
  • 80.
     Effective cross-searching requires:
    ◦ Interoperability
     which requires:
    ◦ Common standards
  • 82.
     UKAD: http://www.ukad.org/
     To promote the opening up of data and to offer the capacity for cross-searching across the UK archive networks and online repository catalogues
     To lead and support resource discovery through the promotion of relevant national and international standards
     To support the development and use of name authorities
  • 83.
     To advocate for the reduction of cataloguing backlogs and the retro-conversion of hard-copy catalogues
     To promote access to digitized and digital archives via cross-searching resource discovery systems
     To work with other domains and potential funders to promote archive discovery
  • 84.
     Fairly loose structure
     Meetings about twice a year
     Forum for discussion, sharing, connecting and collaborating
     Creating a framework for activities (matrix)
    ◦ International/national/regional
    ◦ Meeting UKAD objectives, e.g. open up data; standards-based resource discovery; retro-conversion
  • 85.
     Not many UK archives currently use EAD as a storage format
     EAD will increasingly be used as an export format from proprietary database systems like CALM, for use in XML-based gateways such as AIM25 and the Archives Hub
     New software is becoming available all the time, which makes it easier to create, search and display XML; much of this is open source and often free
  • 86.
     Differences in how EAD is used
     Encourages interoperability but still requires work to ensure seamless cross-searching
     EAD is flexible and includes a large number of tags, which has both advantages and disadvantages
  • 87.
     XML is an international standard for sharing information
     EAD is the XML language for archival finding aids
     EAD is not a content standard
     Use ISAD(G) for content guidelines and thesauri or authority files for index terms
  • 88.
     You have used the Archives Hub’s EAD editor to create EAD records
     XML editors, such as XMetaL or XMLSpy, can help with validation and with selecting tags and attributes
     EAD will become increasingly important

Editor's Notes

  • #27 This is just one way that the recipe could be marked up. This would be valid XML. Notice the pairing of the tags and that this is well nested.
  • #82 Key UKAD partners: Access 2 Archives, Archives Hub, AIM25, Archives Wales, Genesis, Janus, National Register of Archives, Scottish Archives Network, A Vision of Britain