This workshop introduces Connecticut Digital Archive (CTDA) participants to XML and to how MODS, the Metadata Object Description Schema, is implemented in the CTDA.
2. XML
XML stands for eXtensible Markup Language. XML was designed to describe data, whereas HTML
was designed to display data.
XML uses “tags”. In metadata land, these are also referred to as “labels”, “elements”, or “fields”.
These tags are not predefined but are meant to be self-descriptive.
You can invent your own tags in XML.
By itself, XML DOES NOT DO ANYTHING. XML needs a script written by someone or a piece of
software to receive, send, transform, or display it.
XML is a software- and hardware-independent tool for carrying information. It is not a
replacement for HTML but can complement it.
3. Extensible
You can create and define your own tags.
<note>
<myAwesomeNote>
<thisIsMyTag>
The power of being extensible is the ability to customize your XML.
4. Markup
It’s all about the <tags>. The angle brackets are the most recognizable feature of XML. These
tags, or elements, are very similar to the ones in HTML.
Tag names are surrounded by angle brackets. As in HTML, each element has an opening tag and a
closing tag.
5. Language
XML is a language, or rather a “meta”-language: XML allows you to create and define other
languages.
Have you ever heard of RSS feeds, XSLT, or XSD?
Languages such as XSLT and XSD are sometimes referred to as members of the XML family.
XSLT stands for eXtensible Stylesheet Language Transformations.
XSD stands for XML Schema Definition.
6. XML Documents
When you create an XML file or document, you are essentially creating a text file with the
extension .xml. Because it is plain text, it can be read by almost any software or hardware. This
is why XML simplifies data sharing and transport. It also helps when you change platforms,
because text can be read by a large number of programs and systems.
XML documents all share the same structure, called a tree: a trunk, limbs, and leaves.
The XML declaration states that this is an XML document. The trunk of the tree is called the
root element. The limbs and leaves are called children. Because the root contains every other
element, it is also the ultimate parent.
7. XML Document
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Homer</to>
  <from>Marcy</from>
  <heading>Reminder</heading>
  <body>Don’t forget about the BBQ this weekend</body>
</note>
<root>
  <child>
    <subchild>…</subchild>
  </child>
</root>
In the note example, the first line is the XML declaration, and <note> is the root, or ultimate
parent, element. <to>, <from>, <heading>, and <body> are children of the parent element, note,
which is also the root:
note
  to
  from
  heading
  body
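To see the same tree from a program’s point of view, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. Python is only our choice for illustration; the workshop does not prescribe a language, and describe_tree is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The note example above, with straight quotes.
NOTE = """<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Homer</to>
  <from>Marcy</from>
  <heading>Reminder</heading>
  <body>Don't forget about the BBQ this weekend</body>
</note>"""


def describe_tree(xml_text):
    """Return the root tag and the tags of the root's direct children."""
    # Parse bytes so the parser honours the encoding named in the declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.tag, [child.tag for child in root]


root_tag, children = describe_tree(NOTE)
print(root_tag)  # note
print(children)  # ['to', 'from', 'heading', 'body']
```

The parser hands back one object for the root, and iterating over it yields its children in document order, which mirrors the parent/child wording used in the workshop.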
8. XML Expanded
<note> is the root. It is also the parent of four children.
<to>, <from>, <heading>, and <body> are children of their parent, <note>, and are siblings of one another.
A parent element does not necessarily have to be the root element in the XML file.
All elements must have a closing tag.
Element names are case sensitive.
All elements must be properly nested.
All XML documents (or files) must have a root element.
All attribute values must be quoted.
Special characters such as &, <, and " must be written with the five predefined entity references: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; (").
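The entity-reference rule above can be sketched with Python’s standard library (our choice of tool, not one the workshop requires). xml.sax.saxutils.escape replaces &, <, and > by default; the extra mapping passed here also covers the double quote.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry <cartoon> "classic"'

# escape() replaces &, < and > by default; the mapping also turns " into &quot;.
escaped = escape(raw, {'"': "&quot;"})
print(escaped)  # Tom &amp; Jerry &lt;cartoon&gt; &quot;classic&quot;

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<title>" + escaped + "</title>").text
print(round_trip == raw)  # True

# ElementTree escapes &, < and > for you when it writes element text.
title = ET.Element("title")
title.text = raw
serialized = ET.tostring(title, encoding="unicode")
print(serialized)  # <title>Tom &amp; Jerry &lt;cartoon&gt; "classic"</title>
```

In practice, software that writes XML usually handles this escaping, which is one reason to let an editor or library produce your files rather than typing entities by hand.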
9. More XML
XML has comments that appear in the following syntax:
<!-- Add your comments here -->
White space is preserved in XML: a line such as “Hello      Homer.” keeps all of its spaces, whereas HTML would display it as “Hello Homer.”
In XML, a new line is just a line feed (LF), whereas in Windows it is a carriage return plus a line feed (CRLF). Use
Notepad++ or Oxygen to edit your XML.
An XML document is well-formed if it conforms to the rules above.
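A parser is the quickest way to test those rules. The sketch below, assuming Python and its standard-library ElementTree parser, treats a document as well-formed if it parses; is_well_formed and the sample strings are hypothetical examples, one per broken rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


samples = {
    "good": "<note><to>Homer</to></note>",
    "unclosed tag": "<note><to>Homer</note>",
    "case mismatch": "<note><To>Homer</to></note>",
    "unquoted attribute": "<book category=children></book>",
    "bare ampersand": "<title>Tom & Jerry</title>",
    "two roots": "<a/><b/>",
}

for label, text in samples.items():
    ok, message = is_well_formed(text)
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed: ' + message}")
```

This only checks well-formedness; whether a record is valid MODS is a separate question, covered at the end of the workshop.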
10. What is an element?
An element is everything from the start tag to the closing tag.
An element can contain:
• Other elements
• Text
• Attributes
• Mix of the above
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="young adult">
    <title>Hunger Games, book 1</title>
    <author>Suzanne Collins</author>
    <year>2008</year>
  </book>
</bookstore>
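Here is a minimal Python sketch (standard-library ElementTree, our choice for illustration) that reads the bookstore example. It shows the two things an element can carry: an attribute, read with get(), and child elements whose text is read with findtext().

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="young adult">
    <title>Hunger Games, book 1</title>
    <author>Suzanne Collins</author>
    <year>2008</year>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)
books = []
for book in root.findall("book"):  # each <book> child of <bookstore>
    books.append({
        "category": book.get("category"),  # attribute value
        "title": book.findtext("title"),   # text of a child element
        "author": book.findtext("author"),
        "year": int(book.findtext("year")),
    })

for b in books:
    print(f'{b["title"]} ({b["year"]}), {b["author"]}: {b["category"]}')
```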
11. Elements
XML Naming rules:
• Element names are case sensitive.
• Element names must start with a letter or an underscore.
• Element names can’t start with the letters xml (XML, xMl, xmL, etc.).
• Element names can contain letters, digits, hyphens, underscores, and periods.
• Element names cannot contain spaces.
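The naming rules above can be checked with a regular expression. This is a simplified, ASCII-only sketch under our own assumptions: the full XML specification also allows non-ASCII letters and the colon used by namespace prefixes, and check_element_name is a hypothetical helper.

```python
import re

# Start with a letter or underscore; then letters, digits, _, ., or -.
# Spaces are therefore rejected automatically.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def check_element_name(name):
    """Apply the simplified naming rules listed above."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    # Names beginning with "xml" in any mix of case are reserved.
    if name.lower().startswith("xml"):
        return False
    return True


for candidate in ["myAwesomeNote", "_note", "first-name.v2",
                  "2ndNote", "xmlNote", "my note"]:
    print(candidate, check_element_name(candidate))
```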
12. Attributes
Attributes provide additional information about elements. Values must be placed in quotes.
<person gender="female">
<book category="young adult">
Notice that attribute values can contain spaces. Attributes cannot hold multiple values or tree
structures, and are therefore not very extensible. The same information can be stored in a child element instead:
<person>
  <gender>female</gender>
</person>
When would you use an attribute and not an element? It depends on what you want, and on whether you are
writing an XML document based on definitions already decided for you, such as a metadata standard.
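The practical difference between the two forms of <person> shows up when software reads them. This Python sketch (standard-library ElementTree, our choice for illustration) reads gender both ways, then shows that child elements can repeat while a repeated attribute makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<person gender="female"/>')
as_element = ET.fromstring("<person><gender>female</gender></person>")

# An attribute is read straight off the element that carries it ...
print(as_attribute.get("gender"))     # female
# ... while a child element is found first and then its text is read.
print(as_element.findtext("gender"))  # female

# Child elements can repeat, so they can hold several values.
languages = ET.fromstring(
    "<person><language>English</language><language>Spanish</language></person>"
)
print([el.text for el in languages.findall("language")])  # ['English', 'Spanish']

# Repeating an attribute is not allowed: the parser rejects the document.
try:
    ET.fromstring('<person language="English" language="Spanish"/>')
except ET.ParseError as err:
    print("not well-formed:", err)
```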
13. Name Conflicts
Because you can create your own elements, there are times when elements have the same
name but refer to very different things.
For example, one document might use <table> to mark up an HTML table, while another uses
<table> to describe a piece of furniture.
If we combine these XML documents, there will be a conflict. How do you know which <table> is
which?
14. Namespaces – The Name Authority of XML
Name conflicts such as this are resolved by adding a prefix. The prefix stands for a namespace,
which must be declared with the xmlns attribute in the start tag of the element that uses it or of
one of its ancestors, such as the root.
xmlns:prefix="URI"
The URI only has to be a unique identifier, so in some cases it is fictional. In many cases,
however, it is not, and it is associated with what is called a schema (or a document type
definition, DTD). A schema, written in XSD, is like a dictionary and grammar for an XML
document. It outlines the syntax and semantics that an XML document needs to follow in order
to conform to that schema.
For example, an XML document that is a MODS record and references the MODS schema must
conform to the syntax and semantics required by MODS, as specified by the MODS schema. In the
same way, if you want to write in German, you need a German dictionary and grammar book.
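A short Python sketch of how prefixes resolve the <table> conflict. Both namespace URIs are hypothetical placeholders, which is allowed because they only need to be unique, and the table contents are invented for illustration. ElementTree replaces each prefix with its full URI, so the two tables get different names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and contents, for illustration only.
DOC = """<root xmlns:h="http://example.org/html"
      xmlns:f="http://example.org/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

root = ET.fromstring(DOC)
# Each prefix is replaced by its URI, so the two tables no longer clash.
tags = [child.tag for child in root]
print(tags)
# ['{http://example.org/html}table', '{http://example.org/furniture}table']

# Searches use a prefix map. The prefixes may differ from the document's;
# only the URIs have to match.
ns = {"html": "http://example.org/html", "furn": "http://example.org/furniture"}
furniture_table = root.find("furn:table", ns)
print(furniture_table.findtext("furn:name", namespaces=ns))  # Coffee table
```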
15. Metadata Object Description Schema
MODS is an XML-based bibliographic description schema developed and maintained by the
Library of Congress. It is a compromise between the simplicity of Dublin Core and the complexity
of MARC. It was developed in 2002; at the time of this workshop, the current version of MODS was 3.6.
The main web site for MODS: http://www.loc.gov/standards/mods/.
This site provides information about the standard, guidelines, tools, schemas (for each version of
MODS), conversions, etc.
The CTDA does not implement the full MODS standard.
16. CTDA Implementation of MODS
CTDA’s implementation guidelines and metadata application profile can be found online on our
web site (http://ctdigitalarchive.org/resources-for-participants).
These guidelines and profile are based partly on the full standard and partly on the capabilities
of the technical infrastructure for managing metadata. Such capabilities include indexing,
mapping/transforming, re-using, sharing, displaying, and extracting metadata.
CTDA implements MODS version 3.5 and references that version in MODS XML records using the
XML namespace declaration, xmlns, and the prefix, mods.
17. Minimum MODS XML
XML Declaration
◦ <?xml version="1.0" encoding="UTF-8"?>
Root
◦ <mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.5"
xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">
Title
◦ <mods:titleInfo><mods:title>
Resource type
◦ <mods:typeOfResource>
Digital Resource
◦ <mods:physicalDescription><mods:digitalOrigin>
18. Minimum MODS XML Continued
Held By
◦ <mods:note type="ownership">
Rights
◦ <mods:accessCondition type="use and reproduction">
Persistent Identifier
◦ <mods:identifier type="hdl">
Language of MODS record
◦ <mods:recordInfo><mods:languageOfCataloging><mods:languageTerm type="code" authority="iso639-2b">
Remember that each opening tag needs a closing tag, and that there is a specific MODS tree to follow,
as defined by the MODS specification and the version 3.5 schema.
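The two lists above amount to a checklist. The sketch below, assuming Python’s standard-library ElementTree, turns each item into a path relative to the mods:mods root and reports which are absent or empty. The labels, the missing_elements name, and the sample record are our own; this checks presence only and is not a substitute for validating against the MODS 3.5 schema.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Minimum CTDA elements from the lists above, as paths relative to mods:mods.
REQUIRED = {
    "Title": "mods:titleInfo/mods:title",
    "Resource type": "mods:typeOfResource",
    "Digital origin": "mods:physicalDescription/mods:digitalOrigin",
    "Held by": "mods:note[@type='ownership']",
    "Rights": "mods:accessCondition[@type='use and reproduction']",
    "Persistent identifier": "mods:identifier[@type='hdl']",
    "Language of record": (
        "mods:recordInfo/mods:languageOfCataloging"
        "/mods:languageTerm[@authority='iso639-2b']"
    ),
}


def missing_elements(xml_bytes):
    """Return the labels of required elements that are absent or empty."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "{http://www.loc.gov/mods/v3}mods":
        return ["Root element mods:mods"]
    missing = []
    for label, path in REQUIRED.items():
        element = root.find(path, NS)
        if element is None or not (element.text or "").strip():
            missing.append(label)
    return missing


# Hypothetical incomplete record: the note lacks type="ownership".
INCOMPLETE = b"""<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Example</mods:title></mods:titleInfo>
  <mods:typeOfResource>still image</mods:typeOfResource>
  <mods:note>Bridgeport History Center</mods:note>
</mods:mods>"""

print(missing_elements(INCOMPLETE))
```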
19. Example of Minimal MODS XML Document
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.5"
    xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">
  <mods:titleInfo>
    <mods:title>This is an example title for an image</mods:title>
  </mods:titleInfo>
  <mods:typeOfResource>still image</mods:typeOfResource>
  <mods:physicalDescription>
    <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
  </mods:physicalDescription>
  <mods:note type="ownership">Bridgeport History Center, Bridgeport Public Library</mods:note>
  <mods:accessCondition type="use and reproduction">Rights statement</mods:accessCondition>
  <mods:identifier type="hdl">http://hdl.handle.net/11134/110002:495858</mods:identifier>
  <mods:recordInfo>
    <mods:languageOfCataloging>
      <mods:languageTerm type="code" authority="iso639-2b">eng</mods:languageTerm>
    </mods:languageOfCataloging>
  </mods:recordInfo>
</mods:mods>
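The same minimal record can also be generated by a program instead of typed by hand. This is a sketch under our own assumptions: Python 3.8 or later (for the xml_declaration argument), the standard-library ElementTree, and a hypothetical build_minimal_record function whose arguments are the example values above. ElementTree only writes namespace declarations that are actually used, so xmlns:xlink, which no element here needs, is left out.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Make ElementTree write the familiar mods: and xsi: prefixes.
ET.register_namespace("mods", MODS)
ET.register_namespace("xsi", XSI)


def m(tag):
    """Qualify a tag name with the MODS namespace URI."""
    return f"{{{MODS}}}{tag}"


def build_minimal_record(title, resource_type, origin, held_by, rights, handle):
    root = ET.Element(m("mods"), {
        "version": "3.5",
        f"{{{XSI}}}schemaLocation":
            f"{MODS} http://www.loc.gov/standards/mods/v3/mods-3-5.xsd",
    })
    title_info = ET.SubElement(root, m("titleInfo"))       # 1st child
    ET.SubElement(title_info, m("title")).text = title      # grandchild
    ET.SubElement(root, m("typeOfResource")).text = resource_type
    physical = ET.SubElement(root, m("physicalDescription"))
    ET.SubElement(physical, m("digitalOrigin")).text = origin
    # Attributes are supplied when the element (its opening tag) is created.
    ET.SubElement(root, m("note"), type="ownership").text = held_by
    ET.SubElement(root, m("accessCondition"), type="use and reproduction").text = rights
    ET.SubElement(root, m("identifier"), type="hdl").text = handle
    record_info = ET.SubElement(root, m("recordInfo"))
    language = ET.SubElement(record_info, m("languageOfCataloging"))
    ET.SubElement(language, m("languageTerm"),
                  type="code", authority="iso639-2b").text = "eng"
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


record = build_minimal_record(
    "This is an example title for an image",
    "still image",
    "reformatted digital",
    "Bridgeport History Center, Bridgeport Public Library",
    "Rights statement",
    "http://hdl.handle.net/11134/110002:495858",
)
print(record.decode("utf-8"))
```

Letting a library write the tags guarantees that every opening tag gets its closing tag and that special characters are escaped.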
20. MODS XML Explained
XML declaration
+ Root (mods:mods)
    mods:titleInfo
        mods:title
    mods:typeOfResource (controlled vocabulary)
    mods:physicalDescription
        mods:digitalOrigin (controlled vocabulary)
    mods:note
    mods:accessCondition
    mods:identifier
    mods:recordInfo
        mods:languageOfCataloging
            mods:languageTerm
XML declaration
Open root
Open 1st child (titleInfo)
Open 1st grandchild (or child of parent titleInfo) (title)
Add content
Close 1st grandchild (title)
Close 1st child (titleInfo)
Open 2nd child (typeOfResource)
Add content
Close 2nd child (typeOfResource)
Open 3rd child (physicalDescription)
Open child of parent physicalDescription (digitalOrigin)
Add content using one of the required terms from schema
Close child of parent (digitalOrigin)
Close 3rd child (physicalDescription)
Open 4th child (note)
Add attribute type with suggested value based on LC recommendations
Add content
Close 4th child (note)
Etc.
Attributes in this record: type on note, accessCondition, identifier, and languageTerm; authority on languageTerm.
Attributes go in the opening tag only.
21. Particulars of MODS
typeOfResource has a required value list: text; cartographic; notated music; sound recording-musical; sound recording-nonmusical; sound recording; still image; moving image; three dimensional object; software; multimedia; mixed material.
digitalOrigin has a required value list: born digital; reformatted digital; digitized microfilm; digitized other analog.
languageTerm requires the attribute type with the value code and the attribute authority with the value iso639-2b.
The attribute qualifier for dateIssued has a required value list: approximate, inferred, questionable.
There is an ORDER to how elements appear. For example, the element scale must appear before coordinates.
We don’t use the MODS element relatedItem.
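Several of these particulars can be checked automatically before a record is submitted. The sketch below, assuming Python’s standard-library ElementTree, copies the value lists above into sets and reports values outside them; vocabulary_problems and the sample record are hypothetical. It does not check element order, which schema validation covers.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

TYPE_OF_RESOURCE = {
    "text", "cartographic", "notated music", "sound recording-musical",
    "sound recording-nonmusical", "sound recording", "still image",
    "moving image", "three dimensional object", "software", "multimedia",
    "mixed material",
}
DIGITAL_ORIGIN = {"born digital", "reformatted digital",
                  "digitized microfilm", "digitized other analog"}
DATE_QUALIFIER = {"approximate", "inferred", "questionable"}


def vocabulary_problems(xml_bytes):
    """List values that fall outside the required value lists above."""
    root = ET.fromstring(xml_bytes)
    problems = []
    for name, allowed in [("typeOfResource", TYPE_OF_RESOURCE),
                          ("digitalOrigin", DIGITAL_ORIGIN)]:
        for el in root.iterfind(f".//mods:{name}", NS):
            value = (el.text or "").strip()
            if value not in allowed:
                problems.append(f"{name}: {value!r}")
    for el in root.iterfind(".//mods:dateIssued[@qualifier]", NS):
        if el.get("qualifier") not in DATE_QUALIFIER:
            problems.append(f"dateIssued qualifier: {el.get('qualifier')!r}")
    for el in root.iterfind(".//mods:languageTerm", NS):
        if el.get("type") != "code" or el.get("authority") != "iso639-2b":
            problems.append("languageTerm needs type='code' and authority='iso639-2b'")
    if root.find(".//mods:relatedItem", NS) is not None:
        problems.append("relatedItem is not used in CTDA records")
    return problems


SAMPLE = b"""<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:typeOfResource>photograph</mods:typeOfResource>
  <mods:physicalDescription>
    <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
  </mods:physicalDescription>
  <mods:originInfo>
    <mods:dateIssued encoding="w3cdtf" keyDate="yes" qualifier="circa">1908</mods:dateIssued>
  </mods:originInfo>
</mods:mods>"""

print(vocabulary_problems(SAMPLE))
```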
22. Particulars of CTDA MODS - Name
When you want to include a name, such as an author or contributor, the role must be specified
and the entire name goes into a single namePart element. The element name requires the attribute
type, whose value must be one of personal, corporate, family, or conference. The child of role,
roleTerm, requires the attributes authority and type, with the required values marcrelator and
text respectively.
<mods:name type="personal">
  <mods:namePart>Smith, John, 1850-1899</mods:namePart>
  <mods:role>
    <mods:roleTerm authority="marcrelator" type="text">Author</mods:roleTerm>
  </mods:role>
</mods:name>
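The name rules above are mechanical enough to wrap in a small helper. This Python sketch (standard-library ElementTree) uses a hypothetical make_name function that refuses name types outside the required list and always writes roleTerm with authority="marcrelator" and type="text".

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)
NAME_TYPES = {"personal", "corporate", "family", "conference"}


def make_name(full_name, role, name_type="personal"):
    """Build a mods:name element following the CTDA rules above."""
    if name_type not in NAME_TYPES:
        raise ValueError(f"name type must be one of {sorted(NAME_TYPES)}")
    name = ET.Element(f"{{{MODS}}}name", type=name_type)
    # The entire name goes into one namePart.
    ET.SubElement(name, f"{{{MODS}}}namePart").text = full_name
    role_el = ET.SubElement(name, f"{{{MODS}}}role")
    ET.SubElement(role_el, f"{{{MODS}}}roleTerm",
                  authority="marcrelator", type="text").text = role
    return name


element = make_name("Smith, John, 1850-1899", "Author")
print(ET.tostring(element, encoding="unicode"))
```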
23. Particulars of CTDA MODS - Date
Dates are not required. If you add a date, CTDA uses the element dateIssued and requires the attributes encoding="w3cdtf" and
keyDate. For date ranges, you must also use the attribute point with the value start or end.
Single Date:
<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes">2010</mods:dateIssued>
</mods:originInfo>
Date Range:
<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes" point="start">1907</mods:dateIssued>
  <mods:dateIssued encoding="w3cdtf" point="end">1917</mods:dateIssued>
</mods:originInfo>
Single Date with Qualifier:
<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes" qualifier="inferred">1908</mods:dateIssued>
</mods:originInfo>
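The three date patterns above can be produced by one function. This is a sketch assuming Python’s standard library; date_issued is a hypothetical helper, and its date check is a simplification that accepts only YYYY, YYYY-MM, and YYYY-MM-DD (W3CDTF also allows times of day).

```python
import re
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)

# Simplified W3CDTF check: YYYY, YYYY-MM or YYYY-MM-DD only.
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")
QUALIFIERS = {"approximate", "inferred", "questionable"}


def date_issued(start, end=None, qualifier=None):
    """Build mods:originInfo holding a single date or a start/end range."""
    for value in (start, end):
        if value is not None and not W3CDTF_DATE.fullmatch(value):
            raise ValueError(f"not a W3CDTF date: {value!r}")
    if qualifier is not None and qualifier not in QUALIFIERS:
        raise ValueError(f"qualifier must be one of {sorted(QUALIFIERS)}")
    origin_info = ET.Element(f"{{{MODS}}}originInfo")
    first = ET.SubElement(origin_info, f"{{{MODS}}}dateIssued",
                          encoding="w3cdtf", keyDate="yes")
    first.text = start
    if qualifier is not None:
        first.set("qualifier", qualifier)
    if end is not None:
        first.set("point", "start")  # keyDate stays on the start date only
        last = ET.SubElement(origin_info, f"{{{MODS}}}dateIssued",
                             encoding="w3cdtf", point="end")
        last.text = end
    return origin_info


print(ET.tostring(date_issued("1907", "1917"), encoding="unicode"))
print(ET.tostring(date_issued("1908", qualifier="inferred"), encoding="unicode"))
```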
24. Particulars of CTDA MODS - Coordinates
In CTDA you can record both a center point and a bounding box. The center point is recorded in the element
<mods:coordinates>. MODS 3.5 does not have a convenient way to record a bounding box, so we use the
<mods:extension> element to record bounding box information using the FGDC Content Standard for Digital Geospatial Metadata (CSDGM).
<mods:cartographics>
  <mods:scale>0.4583333333333333</mods:scale>
  <mods:coordinates>42.023187, -71.852071</mods:coordinates>
</mods:cartographics>
<mods:extension xmlns:fgdc="http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd">
  <fgdc:metadata>
    <fgdc:idinfo>
      <fgdc:spdom>
        <fgdc:bounding>
          <fgdc:westbc>-71.852071</fgdc:westbc>
          <fgdc:eastbc>-71.841559</fgdc:eastbc>
          <fgdc:northbc>42.030805</fgdc:northbc>
          <fgdc:southbc>42.023187</fgdc:southbc>
        </fgdc:bounding>
      </fgdc:spdom>
    </fgdc:idinfo>
  </fgdc:metadata>
</mods:extension>
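Because the bounding box lives in a second namespace, software reading it needs both prefixes in its map. The Python sketch below (standard-library ElementTree) reads the coordinates and bounding box from a sample built from the values above, with cartographics wrapped in mods:subject, its parent in MODS. read_geography is a hypothetical helper, and the sanity check assumes latitude-ordered south and north edges.

```python
import xml.etree.ElementTree as ET

NS = {
    "mods": "http://www.loc.gov/mods/v3",
    "fgdc": "http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd",
}

RECORD = b"""<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:subject>
    <mods:cartographics>
      <mods:scale>0.4583333333333333</mods:scale>
      <mods:coordinates>42.023187, -71.852071</mods:coordinates>
    </mods:cartographics>
  </mods:subject>
  <mods:extension xmlns:fgdc="http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd">
    <fgdc:metadata><fgdc:idinfo><fgdc:spdom><fgdc:bounding>
      <fgdc:westbc>-71.852071</fgdc:westbc>
      <fgdc:eastbc>-71.841559</fgdc:eastbc>
      <fgdc:northbc>42.030805</fgdc:northbc>
      <fgdc:southbc>42.023187</fgdc:southbc>
    </fgdc:bounding></fgdc:spdom></fgdc:idinfo></fgdc:metadata>
  </mods:extension>
</mods:mods>"""


def read_geography(xml_bytes):
    """Return ((lat, lon) or None, bounding-box dict or None)."""
    root = ET.fromstring(xml_bytes)
    point = None
    coords = root.findtext(".//mods:cartographics/mods:coordinates", namespaces=NS)
    if coords:
        lat, lon = (float(part) for part in coords.split(","))
        point = (lat, lon)
    box = None
    bounding = root.find(
        ".//mods:extension/fgdc:metadata/fgdc:idinfo/fgdc:spdom/fgdc:bounding", NS)
    if bounding is not None:
        box = {side: float(bounding.findtext(f"fgdc:{side}bc", namespaces=NS))
               for side in ("west", "east", "north", "south")}
        if box["south"] > box["north"]:
            raise ValueError("southbc must not be greater than northbc")
    return point, box


point, box = read_geography(RECORD)
print(point)
print(box)
```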
25. Particulars of CTDA MODS – Aggregating Content
There is one repository where all content is stored for long-term preservation purposes. Content
can be presented on different “channels” or sites. One way of doing this is using what are called
Aggregation Tags. These tags are 3 uppercase letters. Each tag designates a particular channel.
The index is configured to recognize these tags and then push content to where it needs to go.
CTDA has two tags: CHO and GEO. These tags are values that go in the element
<mods:targetAudience>. This element, targetAudience, CANNOT be used for any other type of
content or for tags that are made up on the fly.
<mods:targetAudience>CHO</mods:targetAudience>
<mods:targetAudience>GEO</mods:targetAudience>
Question: What is the parent element of this element?
Question: What’s the difference between <mods:targetAudience> and <targetAudience>?
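A small Python sketch (standard-library ElementTree) that collects a record’s aggregation tags and rejects anything other than CHO or GEO. aggregation_tags and the sample record are hypothetical; note what happens to the unprefixed <targetAudience>, which is not in the MODS namespace when the record declares only the mods prefix.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
AGGREGATION_TAGS = {"CHO", "GEO"}  # the two CTDA tags listed above


def aggregation_tags(xml_bytes):
    """Return the CTDA aggregation tags in a record, rejecting unknown values."""
    root = ET.fromstring(xml_bytes)
    # Only elements in the MODS namespace match "mods:targetAudience".
    values = [(el.text or "").strip()
              for el in root.iterfind(".//mods:targetAudience", NS)]
    unknown = [v for v in values if v not in AGGREGATION_TAGS]
    if unknown:
        raise ValueError(f"not a CTDA aggregation tag: {unknown}")
    return values


RECORD = b"""<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:targetAudience>CHO</mods:targetAudience>
  <mods:targetAudience>GEO</mods:targetAudience>
  <targetAudience>XYZ</targetAudience>
</mods:mods>"""

print(aggregation_tags(RECORD))  # ['CHO', 'GEO']
```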
26. How To Recognize Parent/Child Relationships?
If you go to the MODS 3.5 outline on the main MODS web site
(http://www.loc.gov/standards/mods/mods-outline-3-5.html), you will see a list of the TOP
LEVEL elements. Top level elements are all children of the root. Each top level element is
then described in terms of its children, required or recommended attributes, and other
requirements.
27. Requirements of CTDA MODS
Well-Formed XML
The MODS XML document conforms to the requirements of the XML standard.
Do you remember the requirements?
There are online tools to check this:
http://www.w3schools.com/xml/xml_validator.asp
http://xmlgrid.net/validator.html
Oxygen XML editing software
Valid Document
The MODS XML document conforms to the requirements of MODS version 3.5, as defined by the schema:
http://www.loc.gov/standards/mods/v3/mods-3-5.xsd
What does this mean?
There are online tools to check this:
http://www.xmlvalidation.com/
http://www.utilities-online.info/xsdvalidation/#.VVS9x_lVhBc (requires you to input both your XML and the MODS 3.5 XSD)
Oxygen XML editing software