6. XML standard for encoding finding aids
I. Basics - What is EAD?
XML (eXtensible Markup Language):
a set of rules for structuring data via markup
7. Tag (the markup in angle brackets):
<unitdate era="ce"> and </unitdate>
Attribute (a name/value pair inside the start tag):
era="ce"
Element (start tag, content, and end tag together):
<unitdate era="ce">2011</unitdate>
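A minimal sketch of the same three pieces, using Python's standard-library xml.etree.ElementTree (a tool chosen here for illustration, not something EAD requires): parsing the sample element exposes its tag name, its attributes, and its content separately.

```python
import xml.etree.ElementTree as ET

# The sample element from the slide, with straight (not curly) quotes around "ce".
SAMPLE = '<unitdate era="ce">2011</unitdate>'


def describe(xml_text):
    """Return the tag name, the attribute dictionary, and the text content of one element."""
    element = ET.fromstring(xml_text)
    return element.tag, dict(element.attrib), element.text


tag, attributes, content = describe(SAMPLE)
print("tag:", tag)                # tag: unitdate
print("attributes:", attributes)  # attributes: {'era': 'ce'}
print("content:", content)        # content: 2011
```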
8. Elements and attributes defined by a
Document Type Definition (DTD) or a
Schema
<bioghist> <bionote>
10. XML standard for encoding finding aids
Defined set of containers for descriptive data
EAD : DACS = MARC : AACR2
11. XML standard for encoding finding aids
A description of records that gives the
repository physical and intellectual control over
the materials and that assists users to gain
access to and understand the materials (SAA)
Describing Archives: A Content Standard (DACS)
12. What is EAD?
XML standard for encoding finding aids
13. What is EAD?
EAD encoding is not a substitute for
sound archival description!
15. EAD Finding Aid Structure
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ead SYSTEM "ead.dtd">
<?xml-stylesheet type="text/xsl" href="lbi2010.xsl"?>
II. Finding Aid
16. EAD Finding Aid Structure
<ead>
<eadheader>Information about repository and
finding aid</eadheader>
<archdesc>Description of archival
materials</archdesc>
</ead>
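The two-part split can be checked programmatically. This sketch, again assuming Python's xml.etree.ElementTree, parses a bare skeleton (the XML declaration, DOCTYPE, and stylesheet lines are left out for brevity) and lists the children of <ead>.

```python
import xml.etree.ElementTree as ET

# Bare-bones skeleton mirroring the slide; real finding aids contain far more markup.
SKELETON = """<ead>
  <eadheader>Information about repository and finding aid</eadheader>
  <archdesc>Description of archival materials</archdesc>
</ead>"""


def top_level_parts(xml_text):
    """List the tag names of the direct children of the root <ead> element."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


print(top_level_parts(SKELETON))  # ['eadheader', 'archdesc']
```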
17. Common Tags
• Structural and content tags
<eadheader>Many other tags</eadheader>
<date>July 4, 1776</date>
18. Common Tags <eadheader>
• Finding aid author
<filedesc><titlestmt>
<author>Processed by Stanislav Pejša.</author>
</titlestmt></filedesc>
19. Common Tags <archdesc>
• Biographical information
<bioghist><p>Joseph Roth was one of the most prominent
Austrian writers of the first half of the 20th
century.</p></bioghist>
• Controlled vocabulary
<controlaccess>
<geogname encodinganalog="651$a" source="lcsh"
authfilenumber="n 79040121">Austria</geogname>
</controlaccess>
20. Common Tags <archdesc>
• Description of Subordinate Components
<dsc>
<c01 level="series">
<c02>Folder 1
<c03>Item 1</c03>
<c03>Item 2</c03>
</c02>
<c02>Folder 2</c02>
</c01>
</dsc>
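Because components nest, a container list forms a tree. A minimal sketch, assuming Python's xml.etree.ElementTree, walks the sample <dsc> recursively and prints an indented outline. It keeps the slide's schematic text directly inside <c02> and <c03>; in valid EAD that text would sit in <did><unittitle>, and the labeling line would need to read it from there.

```python
import re
import xml.etree.ElementTree as ET

# The slide's schematic <dsc>, closed so that it parses.
DSC = """<dsc>
  <c01 level="series">
    <c02>Folder 1
      <c03>Item 1</c03>
      <c03>Item 2</c03>
    </c02>
    <c02>Folder 2</c02>
  </c01>
</dsc>"""

# Unnumbered <c> or numbered <c01> through <c12>.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def outline(element, depth=0):
    """Yield (depth, tag, label) for each component, descending through nested components."""
    for child in element:
        if COMPONENT.fullmatch(child.tag):
            # Use the leading text if there is any, otherwise fall back to the level attribute.
            label = (child.text or "").strip() or child.get("level", "")
            yield depth, child.tag, label
            yield from outline(child, depth + 1)


for depth, tag, label in outline(ET.fromstring(DSC)):
    print("  " * depth + f"{tag}: {label}")
```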
21. Common Tags <archdesc>
• Description of Subordinate Components
A Component <c> provides information about the content,
context, and extent of a subordinate body of materials.
Each <c> element identifies an intellectually logical section
of the described materials. The physical filing
separations between components do not always
coincide with the intellectual separations.
From EAD Tag library <http://www.loc.gov/ead/tglib/elements/c.html>
22. Common Tags <archdesc>
• Description of Subordinate Components
<dsc>
<c01 level="series">
<did>
<unittitle id="serII">Series II: Addenda</unittitle>
<unitdate normal="1985/1996">1985-1996</unitdate>
</did>
<c02>Subordinate elements, such as folders</c02>
</c01>
</dsc>
23. Common Tags <archdesc>
• Description of Subordinate Components
<c02>
<did>
<container type="box">2</container>
<container type="folder">1</container>
<unittitle>Articles</unittitle>
<unitdate>1985-1994</unitdate>
</did>
</c02>
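The <did> groups the data a reader needs to request the folder. This sketch (Python's xml.etree.ElementTree, chosen for illustration) pulls the container numbers, title, and date out of the sample component.

```python
import xml.etree.ElementTree as ET

# The folder-level component from the slide.
FOLDER = """<c02>
  <did>
    <container type="box">2</container>
    <container type="folder">1</container>
    <unittitle>Articles</unittitle>
    <unitdate>1985-1994</unitdate>
  </did>
</c02>"""


def folder_summary(xml_text):
    """Collect container numbers (keyed by container type), title, and date from a component's <did>."""
    did = ET.fromstring(xml_text).find("did")
    containers = {c.get("type"): c.text for c in did.findall("container")}
    return {
        "containers": containers,
        "title": did.findtext("unittitle"),
        "date": did.findtext("unitdate"),
    }


print(folder_summary(FOLDER))
```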
24. Common Tags <archdesc>
• Digital Archival Object (<dao>)
<c02>
<did> […]
<unittitle>Articles</unittitle>
</did>
<dao
href="http://www.archive.org/stream/josephroth_07_reel07#page/n218/mode/1up" actuate="onrequest"
linktype="simple" show="new"/>
</c02>
25. Common Tags – Human Readable?
<dimensions>
26. Common Tags – Human Readable?
<dimensions>
A subelement of <physdesc> for information
about the size of the materials being
described; usually includes numerical data.
27. Common Tags – Human Readable?
<famname>
28. Common Tags – Human Readable?
<famname>
The proper noun designation for a group of
persons closely related by blood or persons
who form a household. Includes single
families and family groups, e.g., Patience
Parker Family and Parker Family.
29. Common Tags – Human Readable?
<revisiondesc>
30. Common Tags – Human Readable?
<revisiondesc>
An optional subelement of the <eadheader>
for information about changes or
alterations that have been made to the
encoded finding aid.
50. Other Uses
• Integration with other standards (e.g. EAC-CPF)
• Open Archives Initiative – Protocol for Metadata
Harvesting (OAI-PMH)
• EAD consortia
• Metadata for digitized collections
III. Implementation: Using EAD
52. The Future of EAD
(pre) Alpha release of EAD revision, August 2012
• Reduce semantic overload
• Simplify links
• Reduce mixed content
• Add, deprecate, and delete elements
53. The Future of EAD
• Revision is schema-based: goodbye, DTD
• LC stylesheet: dtd2schema.xsl
• “Attribute validation errors indicate that the
attribute value does not conform to the ruling
ISO standard”
54. The Future of EAD
• Beta release of schema, documentation, and
migration tools planned for January 15, 2013
• New version of EAD, with tag library and
migration tools, planned for release July 1, 2013
slideshare.net/mikerush/ead-revision-progress-report-20120808
63. Exercise How To
IV. Exercises
1. Make the change in the XML
2. Hit the red arrow to transform the XML to
HTML
3. Examine the HTML in the browser
64. Exercise How To - Tips
1. Be very careful with quotation marks and
angle brackets
<unitdate era="ce">2011</unitdate>
2. Copy and paste carefully - know where the
cursor is
3. The letters O/o are not the same as the digit 0
4. Look up while typing
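Most slips of this kind make the file not well-formed, which any XML parser will catch. A minimal sketch using Python's xml.etree.ElementTree checks well-formedness only; it does not validate against the EAD DTD or schema, which is a separate step a validating editor performs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser reports an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<unitdate era="ce">2011</unitdate>'))  # True
print(is_well_formed('<unitdate era=“ce”>2011</unitdate>'))  # False: curly quotes
print(is_well_formed('<unitdate era="ce">2011</unitdate'))   # False: missing ">"
```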
66. Processing the
Joseph Roth Addendum
You are a processing
archivist at the Leo Baeck
Institute. You have been
asked to process an
addendum to the Joseph
Roth Collection, and to
update the EAD finding
aid accordingly.
Austrian writer Joseph Roth (1894-1939)
69. The head archivist tells you that there is an error in
the biographical information. Roth’s mother’s
first name is Maria, not Mario.
Fix this typo.
Exercise 2:
Biographical Information
71. Looking at the existing controlled access points,
you realize that the subject term for Roth’s
birthplace, “Brody, Galicia,” is incorrect. The
proper LC term is “Brody (Ukraine)”.
Correct the term.
Exercise 3a:
Geographic Information
73. Add the LC authority file number for “Brody
(Ukraine)”.
Exercise 3b:
Geographic Information
74. Go to LC authorities: http://id.loc.gov
Search for Brody (Ukraine)
<ead><archdesc><controlaccess>
<geogname encodinganalog="651bb0$a"
role="subject" source="lcsh"
authfilenumber="n88212572">Brody
(Ukraine)</geogname>
Exercise 3b:
Geographic Information
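Exercises 3a and 3b come down to changing an element's text and adding an attribute. A sketch in Python's xml.etree.ElementTree; the "before" markup is hypothetical (the real finding aid's attributes may differ), while the new term and the authority file number come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical "before" markup, for illustration only.
CONTROLACCESS = """<controlaccess>
  <geogname encodinganalog="651$a" source="lcsh">Brody, Galicia</geogname>
</controlaccess>"""


def correct_geogname(xml_text, old_term, new_term, authfilenumber):
    """Replace matching <geogname> headings and record the LC authority file number on each."""
    root = ET.fromstring(xml_text)
    for geog in root.iter("geogname"):
        if geog.text == old_term:
            geog.text = new_term
            geog.set("authfilenumber", authfilenumber)
    return root


fixed = correct_geogname(CONTROLACCESS, "Brody, Galicia", "Brody (Ukraine)", "n88212572")
geog = fixed.find("geogname")
print(geog.text, geog.get("authfilenumber"))  # Brody (Ukraine) n88212572
```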
76. The addendum you are given is one folder,
consisting of material in Polish from a 2002
conference about Roth.
Add this folder to Series II: Addenda, and update
the rest of the finding aid accordingly.
Exercise 4:
Adding a New Folder
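One way to picture Exercise 4 in code: append a folder-level <c02> to Series II and widen the series date range to cover 2002. This is a sketch in Python's xml.etree.ElementTree; the box and folder numbers and the folder title are hypothetical placeholders, and widening the series dates is our reading of "update the rest of the finding aid," not a step the slides spell out.

```python
import xml.etree.ElementTree as ET

# Hypothetical Series II excerpt modeled on the earlier slide; the real series holds more.
SERIES_II = """<c01 level="series">
  <did>
    <unittitle id="serII">Series II: Addenda</unittitle>
    <unitdate normal="1985/1996">1985-1996</unitdate>
  </did>
</c01>"""


def add_folder(series, box, folder, title, year):
    """Append a folder-level <c02> with its own <did> at the end of the series."""
    c02 = ET.SubElement(series, "c02")
    did = ET.SubElement(c02, "did")
    ET.SubElement(did, "container", type="box").text = box
    ET.SubElement(did, "container", type="folder").text = folder
    ET.SubElement(did, "unittitle").text = title
    ET.SubElement(did, "unitdate", normal=year).text = year
    return c02


def extend_series_dates(series, year):
    """Widen the series <unitdate> (normal form "YYYY/YYYY") so that it covers the given year."""
    unitdate = series.find("did/unitdate")
    start, end = unitdate.get("normal").split("/")
    # Four-digit years compare correctly as strings.
    start, end = min(start, year), max(end, year)
    unitdate.set("normal", f"{start}/{end}")
    unitdate.text = f"{start}-{end}"


series = ET.fromstring(SERIES_II)
# Box "3", folder "1", and the title are placeholders, not values from the exercise.
add_folder(series, box="3", folder="1", title="Conference materials", year="2002")
extend_series_dates(series, "2002")
print(ET.tostring(series, encoding="unicode"))
```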
78. What needs to be added?
Where in the finding aid?
Exercise 4a:
Adding the Folder
83. Find the existing language information, and see if
you can understand the format. Add Polish to
the list of languages, at both the series and the
collection levels.
Exercise 4c:
Updating the Language
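EAD 2002 records languages in <langmaterial> using <language> elements whose langcode attribute holds an ISO 639-2b code (pol for Polish). The sketch below, in Python's xml.etree.ElementTree, assumes a simple comma-separated layout that is hypothetical; the Roth finding aid may word its language note differently. It adds Polish to every <langmaterial>, covering both the collection and series levels, and skips any note that already lists it.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection- and series-level language notes, for illustration only.
DOC = """<archdesc level="collection">
  <did><langmaterial><language langcode="ger">German</language>, <language langcode="eng">English</language></langmaterial></did>
  <dsc>
    <c01 level="series">
      <did><langmaterial><language langcode="ger">German</language></langmaterial></did>
    </c01>
  </dsc>
</archdesc>"""


def add_language(langmaterial, code, name):
    """Append a <language> entry unless one with the same langcode is already listed."""
    existing = langmaterial.findall("language")
    if any(lang.get("langcode") == code for lang in existing):
        return False
    new = ET.SubElement(langmaterial, "language", langcode=code)
    new.text = name
    if existing:
        # Assumes the <language> entries come last in the note: move any trailing
        # text (such as a final period) after the new entry and add a comma before it.
        new.tail = existing[-1].tail
        existing[-1].tail = ", "
    return True


root = ET.fromstring(DOC)
# iter() finds <langmaterial> at every level, collection and series alike.
print([add_language(lm, "pol", "Polish") for lm in root.iter("langmaterial")])  # [True, True]
```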
86. Add one sentence to the Series II scope note
reflecting the additional folder.
Exercise 4d:
Updating the Series II Scope Note
87. <ead><archdesc><dsc><c01
level="series"><scopecontent><p>This series
consists of material that was added to the
collection after the inventory was drafted and
the bulk of the collection organized. […] Also
included are materials from a 2002 conference in
Poland.</p></scopecontent>
Exercise 4d:
Updating the Series II Scope Note
88. Link to the digitized version of the material in the
additional folder using this link:
http://bit.ly/x7944b
Exercise 5:
Adding a link to the digital object
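A sketch of the link itself, using Python's xml.etree.ElementTree: it attaches an empty <dao> carrying the same attribute values as the earlier Roth <dao> example, with the href from this exercise. The folder markup it is attached to is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_dao(component, href):
    """Attach an empty <dao> element with the link attributes used in the earlier Roth example."""
    return ET.SubElement(component, "dao", {
        "href": href,
        "actuate": "onrequest",
        "linktype": "simple",
        "show": "new",
    })


# Hypothetical markup for the new folder; only the href comes from the exercise.
folder = ET.fromstring("<c02><did><unittitle>Conference materials</unittitle></did></c02>")
dao = add_dao(folder, "http://bit.ly/x7944b")
print(ET.tostring(folder, encoding="unicode"))
```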
90. The head archivist has asked you to print out
copies of your EAD finding aid for the reading
room. Create a print-friendly HTML file.
Exercise 6:
Creating a Print-Friendly File
91. Find a stylesheet and save it in your EAD folder.
(We’ve done this for you – thanks Syracuse!)
Change the stylesheet declaration:
<?xml-stylesheet type="text/xsl" href="eadprint-su.xsl"?>
Exercise 6:
Creating a Print-Friendly File
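Swapping stylesheets only touches the xml-stylesheet processing instruction. The sketch below does it as a text substitution in Python, because ElementTree's default parser discards processing instructions and would not preserve the prolog. The regular expression is our own and assumes the declaration uses double quotes.

```python
import re

# Matches the href pseudo-attribute of an <?xml-stylesheet ...?> declaration (double quotes assumed).
STYLESHEET_HREF = re.compile(r'(<\?xml-stylesheet\b[^?]*\bhref=")[^"]*(")')


def swap_stylesheet(xml_text, new_href):
    """Point the first xml-stylesheet declaration at a different XSLT file, leaving the rest untouched."""
    return STYLESHEET_HREF.sub(lambda m: m.group(1) + new_href + m.group(2), xml_text, count=1)


PROLOG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ead SYSTEM "ead.dtd">
<?xml-stylesheet type="text/xsl" href="lbi2010.xsl"?>
<ead/>"""

print(swap_stylesheet(PROLOG, "eadprint-su.xsl"))
```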
93. The head librarian has asked you to supply a MARC
record for your archival collection. Generate a
MARCXML record from this EAD.
Exercise 7:
Generating a MARC Record
94. Find an appropriate stylesheet.
(We’ve done this for you)
Set up a new transformation scenario.
Exercise 7:
Generating a MARC Record
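EAD elements can carry an encodinganalog attribute naming the equivalent MARC field, which is part of what makes an EAD-to-MARC crosswalk possible. This Python sketch only lists those pairings; it is not the stylesheet used in the exercise. The geogname matches the earlier exercise, and the unittitle line is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt for illustration.
ARCHDESC = """<archdesc level="collection">
  <did>
    <unittitle encodinganalog="245$a">Joseph Roth Collection</unittitle>
  </did>
  <controlaccess>
    <geogname encodinganalog="651$a" source="lcsh">Brody (Ukraine)</geogname>
  </controlaccess>
</archdesc>"""


def marc_analogs(xml_text):
    """List (MARC field, text) pairs for every element that carries an encodinganalog attribute."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter():
        field = element.get("encodinganalog")
        if field:
            # Join all nested text and collapse runs of whitespace.
            text = " ".join("".join(element.itertext()).split())
            pairs.append((field, text))
    return pairs


print(marc_analogs(ARCHDESC))
# [('245$a', 'Joseph Roth Collection'), ('651$a', 'Brody (Ukraine)')]
```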
101. The first step in migrating existing finding aids to
EAD 3.0 is to convert from DTD-based to schema-
based files. Convert a DTD-based finding aid to a
schema-based finding aid.
Exercise 8:
Converting from DTD to Schema
102. Find an appropriate stylesheet.
(dtd2schema_metro.xsl)
Set up and run a new transformation scenario.
(Follow the directions in exercise 7)
Exercise 8a:
Converting from DTD to Schema
103. Exercise 8b:
Converting from DTD to Schema
There is improperly coded data in the “normal”
attribute of the <date> tag (see the error
message). Fix this.
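The normal attribute expects an ISO 8601 date, with ranges written as start/end. The exact bad value in the exercise file is not shown here; one plausible culprit (an assumption, so check the error message) is a range typed with a hyphen, as in normal="1985-1996". This Python sketch checks a simplified subset of ISO 8601 (YYYY, YYYY-MM, or YYYY-MM-DD, optionally as a range) and is narrower than the rules the schema actually enforces.

```python
import re

# A simplified subset of ISO 8601: YYYY, YYYY-MM, or YYYY-MM-DD, optionally as a start/end range.
ISO_DATE = r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?"
NORMAL_VALUE = re.compile(rf"{ISO_DATE}(/{ISO_DATE})?")


def is_valid_normal(value):
    """Return True if a normal attribute value matches the simplified ISO 8601 pattern."""
    return NORMAL_VALUE.fullmatch(value) is not None


for value in ["1985/1996", "2002", "1985-1996", "1985/96"]:
    print(value, is_valid_normal(value))
```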
http://www.flickr.com/photos/carowallis1/2314716161/sizes/m/in/photostream/
Will be available on SlideShare; many of the links are on images and in the text in the later portion of the presentation.
Familiar with HTML? XML is similar (tags, a.k.a. markup), but it describes data structure, not display.
XML (eXtensible Markup Language): set of rules for structuring data via markup
DTD and schema define the buckets; the list of tags in the tag library (we’ll see later) is defined here.
Move to schema is coming; more flexible; not something you need to know right away
http://www.flickr.com/photos/linneberg/4481309196/sizes/m/in/photostream/
http://www.flickr.com/photos/johnkay/3539126525/sizes/m/in/photostream/
Note that it is hierarchical – nested. Parent elements apply to child elements.
Encoding standards are rules for defining buckets; content standards are rules for the information inside
XML, EAD, and MARC are ways to structure your data; they are not the same as the descriptive data itself, such as the finding aid or the catalog record.
An EAD-encoded finding aid is split into info about institution/FA (metametadata) and info about materials (the finding aid)
id.loc.gov
<p> to structure text
So-called “empty element” – all the data is within the tag
Looking at the real thing
Extremely unlikely you will be asked to type it all out by hand. Templates, programs, guidance.
Software is free (like kittens, not like beer)
Designed by archivists: interface is intuitive
Manages most common archival processes
Designed for metadata standards
Output – html, ead
Built on a database (MySQL)
“ICA-AtoM is web-based archival description software that is based on International Council on Archives ('ICA') standards. 'AtoM' is an acronym for 'Access to Memory'.”
Basic, powerful XML editor. You can safely ignore about 95% of the buttons and drop-downs, but it will do things like suggest valid tags and attributes, close tags, and validate as you go. This is what we use.
http://www.loc.gov/ead/tglib/element_index.html
http://www2.archivists.org/standards
XSLT (Extensible Stylesheet Language Transformations) is a declarative, XML-based language used for the transformation of XML documents.
Here, the EAD tag processinfo is converted into HTML.
Results returned a correct level of hierarchy, linking back to full finding aid.
We’ll be logically consistent, but in the real world there are more things to correct and consider.