2. MODS and XML
CTDA uses MODS version 3.5 expressed as XML (eXtensible Markup
Language). XML was designed to describe data in a structured way.
XML uses “tags”, also referred to as “elements”, “labels”, or “fields”. For generic
XML files, elements are not predefined and are meant to be self-descriptive.
For XML files that follow what is called a SCHEMA, a grammar and
dictionary, elements are predefined and you cannot invent your own elements.
Elements follow the requirements of the SCHEMA.
3. Why does CTDA Use MODS XML?
Fedora Commons is in part based on XML. One of its native formats is
FOXML (Fedora Object XML). FOXML is XML that follows
the Fedora Commons or Fedora Object SCHEMA.
Because Fedora Commons is in part an XML environment, we needed a descriptive
standard that could work well in XML. We also needed a descriptive standard that was
flexible, general enough to be used by any participant, and granular enough to
fit specific needs of participants. MODS (Metadata Object Description
Schema) 3.5 was selected. Any MODS XML file should be well formed and
valid according to the MODS Schema version 3.5.
4. Well Formedness
XML files are well formed when:
• They have a single root element
• All elements have a closing tag, use the same case in the opening and closing tag (XML is case sensitive), and are properly nested
• All attribute values are quoted
• Special characters such as & and < are written with the 5 predefined entity references (&amp;, &lt;, &gt;, &quot;, &apos;)
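The rules above are what any XML parser enforces. As a minimal sketch, assuming Python and only its standard library, the hypothetical helper is_well_formed below reports whether a string parses as XML. It checks well-formedness only and says nothing about validity against a SCHEMA.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each sample breaks exactly one of the rules listed above.
samples = {
    "well formed": "<mods><title>Circus poster</title></mods>",
    "missing closing tag": "<mods><title>Circus poster</mods>",
    "case mismatch": "<mods><Title>Circus poster</title></mods>",
    "unquoted attribute": "<note type=ownership>x</note>",
    "bare ampersand": "<title>Posters & broadsides</title>",
    "escaped ampersand": "<title>Posters &amp; broadsides</title>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the "well formed" and "escaped ampersand" samples should pass; the others each break one rule.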
5. Validity
Validity always refers to a particular SCHEMA. For example, to look up a
German word you check a German dictionary, not a French dictionary.
The same is true for metadata standards. You
look up the specific SCHEMA for the version of the metadata standard you are
using. CTDA uses MODS version 3.5.
CTDA descriptive MODS metadata records must follow the requirements of
the MODS version 3.5 SCHEMA,
http://www.loc.gov/standards/mods/v3/mods-3-5.xsd.
6. CTDA MODS Implementation Guidelines
CTDA is a collaborative effort where numerous metadata standards interact
constantly. CTDA has created implementation guidelines for MODS
version 3.5 to help participants understand how to create MODS 3.5 XML
files.
These guidelines are based on international standards. They also help
participants create MODS 3.5 records that can be ingested into CTDA;
Islandora only accepts single MODS records and not a collection of records.
This means that CTDA MODS 3.5 XML files have mods:mods as the root element and
not the collection wrapper, mods:modsCollection.
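The single-record rule can be checked before ingest. This is a minimal sketch in Python (standard library only), not part of Islandora. The function name is_single_mods_record is hypothetical, and the collection wrapper is assumed to be the modsCollection element defined by the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def is_single_mods_record(xml_text: str) -> bool:
    """Return True when the root element is mods in the MODS namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}localname".
    return root.tag == f"{{{MODS_NS}}}mods"


single = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:titleInfo><mods:title>Circus poster</mods:title></mods:titleInfo>"
    "</mods:mods>"
)
# The same record wrapped in a collection element.
wrapped = (
    '<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">'
    + single
    + "</mods:modsCollection>"
)

print(is_single_mods_record(single))
print(is_single_mods_record(wrapped))
```

The check compares the namespace URI and local name, so it does not depend on which prefix the file happens to use.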
7. Prefixes in MODS Records
• CTDA implements the MODS namespace prefix (mods:).
A multitude of metadata standards are in play in CTDA. To help disambiguate
commonly named elements in different metadata standards, the MODS namespace
prefix helps not just us but also programmers, who can recognize
immediately whether a descriptive record is solely MODS 3.5 or MODS 3.5
and FGDC or another combination.
8. Required Elements
• Title: Even if you use another metadata standard, Islandora requires that a title be present.
• Resource Type: This is to help faceted searching as well as meet requirements of DPLA.
• Held By: There are many participants in the CTDA. This is required to ensure users know which
assets belong to which institutions. This is also a requirement of DPLA.
• Rights: Beyond being a requirement for DPLA, it is just good practice to let users know how they
can use and share your content in CTDA.
• Persistent Identifier (Handle): The citable link that relies on the Handle system is created
automatically for you. No matter what URL CTDA uses, the handle will always be the same.
• Language of MODS record: The DLF/Aquifer Implementation guidelines state that this should
be required. If ever we have records in multiple languages, which could happen, having this
information makes it easier for systems to work with the data. As this is good practice for MODS
records that can be shared, this is strongly recommended.
9. Required/Recommended Information According to the MODS 3.5 SCHEMA
The MODS 3.5 SCHEMA has several recommendations and requirements written
into it. If required information is not included, the result is an invalid
MODS 3.5 file, that is, a MODS 3.5 file that doesn’t follow the MODS 3.5 SCHEMA requirements.
For example:
• The MODS name element has an attribute called type whose value must be one of the following:
personal, family, corporate, conference
• The MODS note element has an attribute called type for which LC recommends a number of
values
Where do you find this information? You can consult the CTDA Implementation guidelines or
the Library of Congress’ website for the MODS standard. Be sure to look at MODS 3.5.
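As an illustration of the first example, the sketch below (Python, standard library only) lists any type values on mods:name that fall outside the four values named above. The function name invalid_name_types is hypothetical, and this is only a spot check of one rule, not full validation against the MODS 3.5 SCHEMA.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ALLOWED_NAME_TYPES = {"personal", "family", "corporate", "conference"}


def invalid_name_types(xml_text: str) -> list:
    """Return the type values on mods:name elements that are not allowed."""
    root = ET.fromstring(xml_text)
    bad = []
    for name in root.iter(f"{{{MODS_NS}}}name"):
        value = name.get("type")
        # A missing type attribute is not flagged here; only wrong values are.
        if value is not None and value not in ALLOWED_NAME_TYPES:
            bad.append(value)
    return bad


record = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    '<mods:name type="personal"><mods:namePart>Smith, John Q.</mods:namePart></mods:name>'
    '<mods:name type="person"><mods:namePart>Smith, Jane</mods:namePart></mods:name>'
    "</mods:mods>"
)
print(invalid_name_types(record))
```

A validating XML editor or schema-aware validator remains the way to confirm that a whole record is valid.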
10. What does this look like?
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns="http://www.loc.gov/mods/v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" version="3.5"
    xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">
  <mods:titleInfo>
    <mods:title>Circus poster</mods:title>
  </mods:titleInfo>
  <mods:typeOfResource>still image</mods:typeOfResource>
  <mods:note type="ownership">The Bridgeport History Center</mods:note>
  <mods:accessCondition type="use and reproduction">This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA.</mods:accessCondition>
  <mods:recordInfo>
    <mods:recordContentSource>University of Connecticut Libraries</mods:recordContentSource>
    <mods:languageOfCataloging>
      <mods:languageTerm type="code" authority="iso639-2b">eng</mods:languageTerm>
    </mods:languageOfCataloging>
  </mods:recordInfo>
  <identifier type="hdl">http://hdl.handle.net/11134/20003:67</identifier>
</mods:mods>
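A record like this one can be checked for the required information before ingest. The following is a minimal sketch in Python (standard library only). It assumes the required information maps to the elements used in the example record: Held By to mods:note with type "ownership", Rights to mods:accessCondition, and the record language to mods:languageTerm under mods:recordInfo. The names REQUIRED_PATHS and missing_required are illustrative and not part of CTDA or Islandora. The handle is left out because the system adds it automatically.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Assumed mapping from required information to paths, based on the example record.
REQUIRED_PATHS = {
    "Title": "mods:titleInfo/mods:title",
    "Resource Type": "mods:typeOfResource",
    "Held By": "mods:note[@type='ownership']",
    "Rights": "mods:accessCondition",
    "Language of MODS record": "mods:recordInfo/mods:languageOfCataloging/mods:languageTerm",
}


def missing_required(xml_text: str) -> list:
    """Return the labels of required items that are absent or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in REQUIRED_PATHS.items():
        node = root.find(path, NS)
        if node is None or not (node.text or "").strip():
            missing.append(label)
    return missing


partial_record = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:titleInfo><mods:title>Circus poster</mods:title></mods:titleInfo>"
    "<mods:typeOfResource>still image</mods:typeOfResource>"
    "</mods:mods>"
)
print(missing_required(partial_record))
```

For the partial record, the function reports the three items that have not been supplied yet.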
11. What’s Up with “identifier”?
Notice that the element, identifier, doesn’t have the MODS namespace prefix in front
of it as mods:identifier. This is the HANDLE that is automatically added by
the system. The MODS 3.5 file is still valid because the root element also declares
the MODS namespace as the default namespace (xmlns), so the unprefixed identifier
is still a MODS element.
Is this a problem? Yes, and we built a contingency into our transformation
files so that identifier without the MODS prefix would be correctly
mapped to dc:identifier.
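A namespace-aware parser shows why the unprefixed element is still MODS. In this minimal Python sketch (standard library only), the record is a shortened copy of the example, and both children resolve to the same namespace URI.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Shortened copy of the example: the root declares the MODS namespace twice,
# once with the mods: prefix and once as the default namespace.
record = (
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" '
    'xmlns="http://www.loc.gov/mods/v3">'
    "<mods:typeOfResource>still image</mods:typeOfResource>"
    '<identifier type="hdl">http://hdl.handle.net/11134/20003:67</identifier>'
    "</mods:mods>"
)

root = ET.fromstring(record)
child_tags = [child.tag for child in root]
print(child_tags)

# A lookup by namespace finds the handle even though the file omits the prefix.
handle = root.find("mods:identifier", {"mods": MODS_NS})
print(handle.text)
```

Tools that match on the literal text "mods:identifier" instead of the namespace would miss this element, which is the case the contingency covers.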
12. What About Other Requirements?
CTDA participants are free to use any content standard they want. CTDA does ask
that participants avoid any punctuation that is known to interfere with databases. Such
punctuation includes, for instance, brackets, slashes, and HTML tags.
What if a participant doesn’t want to use MODS 3.5? Islandora can work with multiple
descriptive metadata standards. Descriptive metadata files still have to be expressed as
an XML file and adhere to the SCHEMA for the descriptive metadata file you want to
use. The only caveat is that Islandora is designed to harvest from one descriptive
metadata standard. CTDA exposes OAI DC records from normalized MODS 3.5
records. If a participant decides to use a different descriptive metadata standard, then
it might not be possible to expose that descriptive metadata to external harvesters such
as DPLA.
13. What to Think About When Creating Descriptive Records?
• What information do you want to communicate to users about your content?
• What information is important for your institution beyond the requirements? Do
you want to include genre terms, subjects, a description?
• Do you want a date that is easily understood by computer systems and tools such as
a timeline?
• Are you thinking about linked data?
• Do you want to implement controlled vocabularies?
• Do you need to track particular types of information or digital assets?
14. Descriptive Information & The System
Descriptive information is used for searching, indexing, harvesting, display, or
browsing. Thinking about what information you put into your records means
also thinking about how you want to display this information to users. Should
facets be used and which ones (genre or resource type)? Do you need to
highlight certain information that’s important for your institution such as
OCLC numbers? Do you want your metadata harvested?
Making these decisions will allow you to create your own guidelines as to what
descriptive information needs to be added to your descriptive records, whether
they are written directly as XML or entered via the XML forms or data
entry forms.
15. Islandora XML Form Builder
Islandora comes with a core module called the XML Form Builder. This is a
tool for creating/storing HTML forms that can create/manipulate XML files.
Islandora comes with default data entry forms for each content model. These
data entry forms are generic. Islandora XML Form Builder can work with any
custom standards. XML Forms can be exported and imported.
The key to XML Forms is XPath. The XML file created by the XML Form
Builder must be well formed. If you reference a SCHEMA in the XML file,
then the XML file must also be valid against that SCHEMA.
16. Using the XML Form Builder
• All forms created should account for:
• Root element
• Properly nested elements
• Requirements of the metadata standard SCHEMA
• CRUD Rules or Create / Read / Update / Delete
The XML Form treats each field in the form as a potential action (CRUD).
XPath describes where the action (CRUD) will take place in the XML file.
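To make the role of the path concrete, here is a minimal sketch in Python (standard library only). ElementTree supports only a subset of XPath, and the helper read_value is a hypothetical stand-in for a form's Read action, not Islandora code.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

record = ET.fromstring(
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:titleInfo><mods:title>Circus poster</mods:title></mods:titleInfo>"
    "<mods:typeOfResource>still image</mods:typeOfResource>"
    "</mods:mods>"
)


def read_value(context, path):
    """Follow path from the context element; return its text, or None if absent."""
    node = context.find(path, NS)
    return None if node is None else node.text


title = read_value(record, "mods:titleInfo/mods:title")
genre = read_value(record, "mods:genre")
print(title, genre)
```

The title path resolves to an existing element. The genre path finds nothing, which is the situation in which a Create action would apply.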
17. CRUD - Create
If Read failed
AND
If the parent XML element exists
THEN, after the form is submitted (updated)
The new element/attribute is created and appended to the parent XML
element.
18. CRUD – Create Options
• Attribute
• <mods:name type="personal">Smith, John Q.</mods:name>
• Element
• <mods:genre>book</mods:genre>
• XML: Snippets and the use of %value%
• <mods:note type="ownership">%value%</mods:note>
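The %value% placeholder can be pictured as a simple substitution. The sketch below is in Python (standard library only) and is an assumption about the general idea, not the XML Form Builder's actual implementation. The function fill_snippet and the sample value are hypothetical, and the namespace declaration is added to the snippet only so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SNIPPET = (
    '<mods:note xmlns:mods="http://www.loc.gov/mods/v3" '
    'type="ownership">%value%</mods:note>'
)


def fill_snippet(snippet: str, value: str) -> str:
    """Replace %value% with the form value, escaping &, < and > first."""
    return snippet.replace("%value%", escape(value))


# Hypothetical form input containing a character that must be escaped.
filled = fill_snippet(SNIPPET, "History & Archives Center")
print(filled)

# The result is still well formed and parses back to the original value.
note = ET.fromstring(filled)
print(note.get("type"), note.text)
```

Escaping the value before substitution keeps the generated XML well formed when users type characters such as & or <.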
19. CRUD - Read
Populates the form with data from an XML file. This is done before displaying
the form. It remembers what was read for later (self::).
20. CRUD - Update
If Read was successful
AND
If the element/attribute exists
THEN, after the form is submitted (updated)
The value entered in the form is updated in the element/attribute selected.
21. CRUD - Delete
If Read was successful
AND
If the form field was removed from the form,
THEN, after the form is submitted (click update)
The element/attribute selected is deleted.
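The four CRUD rules can be summarized as one decision procedure. This is a minimal sketch in Python (standard library only) that assumes one form field maps to one child element of a known parent. The function apply_field and its parameters are hypothetical and only model the rules described in these slides.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}


def apply_field(parent, local_name, value, field_removed=False):
    """Apply one form field to parent and return the name of the action taken."""
    if parent is None:
        return "none"  # without a parent element nothing can be created
    existing = parent.find(f"mods:{local_name}", NS)  # Read
    if existing is None:
        if field_removed:
            return "none"
        # Create: Read failed and the parent exists, so append a new element.
        created = ET.SubElement(parent, f"{{{MODS_NS}}}{local_name}")
        created.text = value
        return "create"
    if field_removed:
        # Delete: Read succeeded and the field was removed from the form.
        parent.remove(existing)
        return "delete"
    # Update: Read succeeded and the element exists, so replace its value.
    existing.text = value
    return "update"


root = ET.fromstring('<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"/>')
first = apply_field(root, "genre", "book")
second = apply_field(root, "genre", "poster")
current = root.find("mods:genre", NS).text
third = apply_field(root, "genre", None, field_removed=True)
print(first, second, current, third)
```

Submitting the same field three times walks through Create, Update, and Delete in turn.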
22. Form Properties
• Determines the root element and
expected set of properties
• Properties:
• Root element name
• Namespace URI
• Schema name
• Namespaces
23. Elements or the Form Element Tree
• Add an element as a child of the currently selected element
• Copy the currently selected element
• Paste the currently copied element as a child of the currently selected
element
• Delete the currently selected element
24. Elements or the Form Element Tree
When you create a new form, the first element is always Root. Ensure that this
conforms to the metadata standard you’re using. For MODS, the type is form. Read is
the only action checked, with a context of document and a path of //mods:mods[1].
This Form Element Tree represents a Drupal form structure and not an XML form
structure. The Tree is used to sort elements in a Drupal manner. Children within a tree
may only be appended to elements Drupal considers to be valid as parents. For
example, a textfield cannot be a child to another textfield.
25. Form Element Tree Controls
https://api.drupal.org/api/drupal/developer!topics!forms_api_reference.html/7.x
• Type (Required): Type of Drupal form element to render
• Title: Label to appear for this element.
• Description: Brief note on what the element is; it can include HTML
• Default value: Value to place in the Drupal form element if the
corresponding Read action doesn’t return an element
• Required: Whether or not user has to add this information to submit the
form
26. Drupal Form Elements
• fieldset
• Description: a frame that can surround form elements and be collapsed to clean up the
display
• Render as: A titled frame surrounding elements
• Typical CRUD: XXXX if the fieldset is purely cosmetic (XML created from children of
the fieldset will use the next available XML parent) or CRXX to have the fieldset create
an XML element with potential children
• Notes: Use Collapsible and Collapsed in Advanced Form Controls
27. Drupal Form Elements
• form
• Description: the root element of the form
• Render as: a Drupal form
• Typical CRUD: XRXX – the root will be created by virtue of the form’s properties and
only Read actions are generally required
• Notes: The only element with the form type should be the root element
28. Drupal Form Elements
• markup
• Description: element that exists as part of the form definition only, and is otherwise not
rendered
• Render as: nothing – children appear to be rendered in Drupal as siblings of the markup
element
• Typical CRUD: CRXX – to have the markup create an XML element with potential children
/ CRUX to have markup create an element automatically populated with content using
“xml” Create actions Type
• Notes: markup elements are typically given child elements and are used to help structure your element
tree in a more XML friendly way without changing the form’s look inside Drupal. They can
also be used to create elements to which both a value and an attribute can be appended via
Updates.
29. Drupal Form Elements
• select
• Description: element containing predefined options
• Render as: drop down menu of options
• Typical CRUD: CRUX
• Notes: Options must be set in the element’s “More Advanced Controls”. At least one
option MUST be included in the Options array.
30. Drupal Form Elements
• tabs/tabpanel
• Description: repeatable series of multiple form elements
• Render as: framed tab containing 1+ elements and a button allowing users to add more
tabs
• Typical CRUD: XXXX for the tabs element, as it is the Drupal rendered part of the
tabs/tabpanel combo, and CRXD for its child tabpanel so that when tabs are deleted
the XML element and children tied to the tab go with it
• Notes: tabpanel should always be the immediate child of tabs; elements that you would
like to be repeatable can then be added as children. Set a title of the tabs to have it
rendered in Drupal
31. Drupal Form Elements
• tags/tag
• Description: repeatable series of single text fields
• Render as: text field with a green plus beside it to allow multiple values to be added.
Values can be removed by clicking red button that appears beside them
• Typical CRUD: XXXX for the tags element and CRXD for the child tag
• Notes: tag should always be the immediate child of tags and should contain no
children. Set a title of the tag element to have it rendered in Drupal
32. Drupal Form Elements
• textarea
• Description: area for entering large chunks of text
• Render as: multicolumn box for user text input
• Typical CRUD: CRUX
• Notes: Rows and Cols in “Advanced form controls” can be used to define the default
size of the rendered textarea
33. Drupal Form Elements
• textfield
• Description: area for entering text
• Render as: a single box for user text input
• Typical CRUD: CRUX
• Notes: The default size of the rendered textfield can be adjusted
34. Advanced Form Controls
• Disabled: Greys out the element and makes it inaccessible through regular
browser input.
• Prefix and Suffix: Text or markup to add before or after an element when it
is rendered in Drupal. This is often used to encapsulate a Drupal form
element within a <div> or other types of tags.
• Max Length: Allows you to limit the number of characters in a textfield
• Size: Sets the width of select and textfield Drupal form elements
35. More Advanced Form Controls
• Attributes: set of attributes to add to the form element when it is rendered in
Drupal
• Element validate: a list of function names of validators to use for this
element and/or its children. Drupal includes element_validate_integer and
element_validate_integer_positive to confirm that an element contains an integer or
a positive integer, respectively
• User data: used by some Drupal modules to add additional information to
the element
36. Drupal Form Elements
Not all Drupal Form element types are
supported by XML forms. Generally, the
following work the best but others have been
known to work.
• form
• textfield
• textarea
• select
• hidden: used to set hidden properties or
nodes
• markup
• fieldset
• tabs and tabpanel
• tags and tag
• creative_commons: custom Islandora
XML form type that allows for the
creation of Creative Commons form
elements
37. Save, Preview, Test
Always remember to Save and Preview throughout the creation and/or editing of your
XML form.
When you click Submit, you will see the XML result in the preview frame. It is always
good to check whether this result is valid. To do that, copy the XML result from
the preview frame and open it in oXygen. You might have to remove certain empty
elements. If you don’t have oXygen, there are free online validators.
Make sure that several people test your form. Test to ensure that the XML file created
is well formed and valid. Test to ensure that the form asks for the information that the
people who will enter data have asked the form to include.
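Removing empty elements by hand becomes tedious during repeated testing. As a minimal sketch in Python (standard library only), the hypothetical function prune_empty below removes elements that have neither text nor remaining child elements. Treating an element that carries only attributes as empty is an assumption made here, so adjust the test if your records rely on such elements.

```python
import xml.etree.ElementTree as ET


def prune_empty(element):
    """Remove descendants with no non-whitespace text and no remaining children."""
    for child in list(element):  # copy the list because we remove while looping
        prune_empty(child)  # prune bottom-up so emptied parents are caught too
        has_text = bool((child.text or "").strip())
        if not has_text and len(child) == 0:
            element.remove(child)


root = ET.fromstring(
    '<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:titleInfo><mods:title>Circus poster</mods:title><mods:subTitle/></mods:titleInfo>"
    '<mods:name type="personal"><mods:namePart> </mods:namePart></mods:name>'
    "</mods:mods>"
)
prune_empty(root)
print(ET.tostring(root, encoding="unicode"))
```

After pruning, only mods:titleInfo with its mods:title remains, because mods:subTitle and the mods:name wrapper carried no content.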
38. Associate
For people to enter data in your form, you need to “Associate” your form. In
the list of forms, find yours and click “Associate”.
An XML form can be associated with one or more content models. If you want a
separate form for each Islandora content model, then you’ll need to create an XML
form for each one.
39. Associate Options
If you’ve already associated the form, you will see
those options above this window.
• Select the Islandora content model
• Select the ID such as MODS or DC
• Select the field that is the title proper
• Select an XSL and self XSL transform. XSL is
for the MODS to DC. The self XSL is to
ensure that empty nodes are removed –
necessary especially for MODS validity
• Add Association
41. Even Better
Instead of starting from scratch, use an existing form. You can find these on
GitHub, https://github.com/Islandora-Labs/islandora_ingest_forms.