The document proposes creating a digital library at Anonymous University using the Dublin Core metadata standard and Greenstone digital library software. It recommends training library staff on Dublin Core, the controlled vocabularies LCNAF and DCT, and assigning roles for the project such as project manager, digital manager, curator, and digitization staff. It also outlines plans for metadata elements, training procedures, collection assessment, and ensuring quality control of the digital library materials and records.
Sitkoski Metadata Proposal - Final
Keith Sitkoski – Term Project for Metadata for the Information Professional – 12/14/14
Executive Summary
In an effort to increase its usability, digital footprint, and record keeping capabilities,
Anonymous University Library is looking to create a digital library in order to house and catalog
those resources that are unique to the library and make them accessible both to its students and to
interested parties on the web.
In order to fulfill both of these functions, the library is proposing to work with the Dublin
Core metadata system in order to meet internationally accepted standards of interoperability and
accessibility through the use of the Qualified Dublin Core standards and the XML markup
language. Greenstone Digital Library Software, an international platform that uses Dublin Core
as its basis, will be used in order to construct the library itself. In conjunction with this, the
LCNAF, LCC, and DCT classification and vocabulary schemes will be used.
Greenstone, the Library of Congress, and the Dublin Core Metadata Initiative all have
strong training resources that will allow the Library to train its staff well and have robust support
options should complications arise. That training will be supervised by the project head, but
maintained and assessed, along with the effectiveness and potential growth of the library, by the
head of the library and the relevant parties to ensure that Anonymous University continues to
grow into the digital age.
Anonymous University Metadata Proposal
The creation of any digital library is an audacious act. Building a digital environment that
enables users from around the world to interact with and learn from the resources of a library or a
University cannot be seen as trivial. Many equally essential factors go into the creation of a
digital library, even a small one: the selection of metadata schemes, the format of the library
host, the selection of the terms that will make it searchable, the training of the staff that will
make it work, the ability to assess the progress of the library as it grows, and the willingness to
make changes.
After consideration of a wide variety of potential metadata schemes, the best option for
the upcoming digital repository project is the Qualified version of the Dublin Core Metadata
scheme. Supported by the Dublin Core Metadata Initiative (DCMI), Dublin Core was designed
as a universal form of core metadata with broadly defined elements that allow for adaptability
and interoperability. It was developed with 15 core elements, plus refinements that can be used as
necessary to improve the precision of the metadata record. The Qualified version of this schema
adds significant depth to the basic format and allows for richer description of resources.
The schema was selected for several reasons. First, tools for working with and understanding the
schema are available on a variety of websites, including the DCMI’s and the W3C’s, and this
will allow for a training program to be developed and supplemented in an environment of rich
resources. Dublin Core is easily compatible with XML and this compatibility allows for the
existence of a variety of browser-based metadata construction and categorization tools. The use
of Dublin Core also guarantees a base level of interoperability, a scale of which is available on
the DCMI site, which would allow the repository to connect to the worldwide academic
community with relative ease. In addition, it will allow the institution to develop its own local
metadata as necessary but still map them to a cross-collection element.
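To make the pairing of Dublin Core and XML concrete, the following is a minimal sketch of how a Qualified Dublin Core record might be serialized. It is not Greenstone's own export format. The wrapper element name `record` and the sample item are assumptions made for illustration; the two namespace URIs are the ones published by the DCMI for the 15 core elements and for the refined terms.

```python
import xml.etree.ElementTree as ET

# Namespace URIs published by the DCMI for the core elements and refined terms.
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_record(core, qualified):
    """Return a <record> element holding simple and qualified Dublin Core fields.

    core      -- dict of simple element name -> list of values (e.g. "title")
    qualified -- dict of dcterms name -> list of values (e.g. "created")
    """
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, values in core.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for name, values in qualified.items():
        for value in values:
            ET.SubElement(record, f"{{{DCTERMS_NS}}}{name}").text = value
    return record


# Hypothetical sample item, invented for illustration only.
sample = build_record(
    core={
        "title": ["Campus Yearbook"],
        "creator": ["Anonymous University"],
        "type": ["Text"],
        "format": ["application/pdf"],
    },
    qualified={
        "created": ["1962"],
        "isPartOf": ["University Archives"],
    },
)

xml_text = ET.tostring(sample, encoding="unicode")
print(xml_text)
```

Keeping the simple elements and the qualified terms in separate namespaces means a harvester that only understands the 15 core elements can still read the record and ignore the refinements.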
As an example of the benefits of selecting Dublin Core, examine the potential of the
DCMI interoperability levels. Selecting the level of interoperability the library will strive for will
be up to the University, but our current vision of the capacity to share our unique resources
would place us at Level 2: Formal Semantic Interoperability as defined by the Dublin Core
Website. This level was selected because “It refers to formally stated relationships between
terms and rules for using such statements to draw automatic conclusions” (Nilsson, Baker, &
Johnston, 2009). Level 1 does not have perfect mapping to Dublin Core elements, and Level 3 is
beyond the required complexity for this project. If there is a change in priorities or an increase in
funding, the Library of Congress Subject Headings and Thesaurus for Graphic Materials could
be added as resources and the university would be able to reach the 3rd level of interoperability
and increase Anonymous University’s digital footprint.
With the selection of Dublin Core in mind, the choice of site used to format and host the
digital library became clearer. Factoring in the need for both robust and accessible training
materials and the need to keep cost down, the decision was made to use Greenstone Digital
Library Software. Produced by the New Zealand Digital Library Project, Greenstone is an open
source platform designed around the premise of empowering digital library users and creators as
stated by its website (www.Greenstone.org).
The benefits of using Greenstone are apparent in its user friendly design and the strong
training resources on its website. The ability to access all the training materials and upload
information with only internet access is ideal for a developing program like the one that we are
proposing. The site also has frequent updates and a responsive staff, as shown by the recent
release of Greenstone 3.06 (www.Greenstone.org). In addition, the site offers links to groups that
train on the platform who will be available to use as resources in the training of our own staff,
should it become necessary in the course of this project. They may also be available to assist in
the review of it once it is closer to completion.
Anonymous University is attempting to grow its digital footprint and to both offer the
resources unique to our institution and reach out to other institutions for what they bring as well.
For this reason, two controlled vocabularies were selected for use in the new Digital Library
metadata scheme: the Library of Congress/NACO Name Authority File (LCNAF) and the Dublin Core
Type Vocabulary (DCT). The LoC’s attention to updates and the robust nature of the resources
surrounding it, combined with its focus on names of persons, organizations, events, places, and
titles (Library of Congress), will allow for interoperability that the second choice may not have.
The DCT was selected in order to fill out the important elements that the LoC does not focus on,
but still allow for a high level of interoperability.
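As an illustration of how a controlled vocabulary constrains what staff may enter, the sketch below checks a proposed type value against the twelve terms of the Dublin Core Type Vocabulary. The synonym table is hypothetical and would need to be built locally; it is included only to show how variant terms could be resolved to one preferred term.

```python
# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Hypothetical local synonyms, each pointing at one preferred DCMI term.
EQUIVALENTS = {
    "photograph": "StillImage",
    "photo": "StillImage",
    "letter": "Text",
    "yearbook": "Text",
    "oral history": "Sound",
}

# Case-insensitive lookup table for the official terms.
_BY_LOWERCASE = {term.lower(): term for term in DCMI_TYPES}


def preferred_type(raw):
    """Return the preferred DCMI type for a staff-entered value, or None."""
    key = raw.strip().lower()
    if key in _BY_LOWERCASE:
        return _BY_LOWERCASE[key]
    return EQUIVALENTS.get(key)


for entry in ("Photograph", " stillimage ", "pamphlet"):
    print(repr(entry), "->", preferred_type(entry))
```

A value that returns `None` is not rejected forever; under this sketch it would simply be a candidate for the annual vocabulary review.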
The selection of a controlled vocabulary, rather than a keyword or tag search, was based on
four principles highlighted by Leise, Fast, and Steckel in their Boxes and Arrows article:
Content, Technology, Users, and Maintenance (Leise, Fast, and Steckel, 2002).
Given the University Library’s control of the resources that we will be cataloging and the
stability that brings, the terms used to search for objects within the library will not change
rapidly and institutional memory will act as a buffer to potentially drastic changes when they do
occur. Because the library is being created from the ground up, the technology cost will not be
prohibitive and the small nature of its beginning will allow the actual creation of the vocabulary
to be done efficiently and with minimal effort. The potential users of the collection will primarily
be students or faculty, a group that speaks a similar language, albeit with more or less
complexity. This means that the kinds of terms that will be searched for will have some level of
standardization. Maintenance will be a challenge, as the proper maintaining of any system
always is. However, if the necessary skills are developed as the initial training of the staff takes
shape, and the time is specifically put aside to review the vocabulary annually, the system should
remain as flexible as necessary.
To get the best use of the selected controlled vocabularies, a vocabulary taxonomy will
be set up following the Library of Congress Classification Outline (Library of Congress, 2014).
Using the hierarchical structure of the LoC, the vocabulary of the collection will be set up along
the principles of equivalence and notation to order the collection and allow it to be searched in an
efficient manner. In addition, certain elements will be given greater priority and weight in the
creation of the metadata, giving the staff guidelines on which elements to spend more time on,
given the limited nature of the library’s resources. Text and images have different requirements
for effective metadata, but in brief, the primary reference points will need to be searchable in
order for them to be found:
Text – Name, Author(s), Date, Format, Content, Related Items, Collection (if appropriate)
Image – Name, Place of Creation, Format, Date, Associated Activity/Content, Related Items,
Collection (if appropriate)
While basic, these attributes provide the foundation upon which vocabulary will be linked to
each kind of object.
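The attribute lists above can be sketched as a simple crosswalk to Dublin Core terms, together with a completeness check for each kind of object. The pairings in `CROSSWALK` are our assumptions for illustration, not decisions made in this proposal; in particular, mapping Place of Creation to `dcterms:spatial` is a loose fit that would need review.

```python
# Assumed crosswalk from the local attribute names to Dublin Core terms.
# These pairings are illustrative only and are not fixed by the proposal.
CROSSWALK = {
    "Name": "dc:title",
    "Author(s)": "dc:creator",
    "Date": "dc:date",
    "Format": "dc:format",
    "Content": "dc:description",
    "Associated Activity/Content": "dc:description",
    "Place of Creation": "dcterms:spatial",  # loose fit, assumption
    "Related Items": "dc:relation",
    "Collection": "dcterms:isPartOf",
}

# Attributes listed for each kind of object; Collection is left out here
# because the proposal marks it "if appropriate".
EXPECTED = {
    "Text": ["Name", "Author(s)", "Date", "Format", "Content", "Related Items"],
    "Image": ["Name", "Place of Creation", "Format", "Date",
              "Associated Activity/Content", "Related Items"],
}


def missing_attributes(object_kind, local_record):
    """List expected local attributes that are absent or empty."""
    return [a for a in EXPECTED[object_kind] if not local_record.get(a)]


def to_dublin_core(local_record):
    """Translate local attribute names to Dublin Core terms, skipping empties."""
    mapped = {}
    for attribute, value in local_record.items():
        if value:
            mapped.setdefault(CROSSWALK[attribute], []).append(value)
    return mapped


# Hypothetical image record, invented for illustration only.
photo = {"Name": "Founders Day", "Format": "image/tiff", "Date": "1950", "Collection": ""}
print(missing_attributes("Image", photo))
print(to_dublin_core(photo))
```

A check of this kind could give a curator a quick list of what still needs attention on a record before it is accepted into the collection.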
The benefit to building such a new, flexible system at Anonymous University is that it
gives the designers of that system a chance to instill strong institutional practices at the very
beginning, instead of having to restructure embedded cultures later. Moving past the triad of
context, content, and users (Miller, 2011), the designer of a system must move into a necessary
fourth focus of implementers. It will be those implementers who are responsible for building the
system that will house the university’s presence on the web and in the world beyond it.
The key to implementation will be a strong training program that will give enough
background to anyone connected to the program to allow them to step into any role required by
the development of the library with only minimal training. As a result, the following hierarchy of
roles and responsibilities is proposed, which will specify both project role and required level of
training. There will be, by necessity, a certain amount of overlap in the staff. The following roles
have been taken and reformatted to the university’s specific needs from the New Jersey Digital
Highway website (Agnew, 2004).
Project Role: Project Manager
Project Responsibilities: A supervisory position that is responsible for staffing, vision, overall
management of the collection, milestones, and planning the collection’s extensions.
Training Level and Requirements: Strong understanding of all the elements involved, especially the
developer and advanced user sections of the Greenstone program. In-depth understanding of the
library’s selected metadata scheme.

Project Role: Digital Manager
Project Responsibilities: A secondary position that functions as a curator but has the same level of
training and access as the project manager in order to act in cases of absence or as an outside
reviewer.
Training Level and Requirements: Same as the project manager.

Project Role: Curator
Project Responsibilities: Responsible for both adding to and reviewing objects added to the
collection as well as their metadata.
Training Level and Requirements: Strong understanding of the elements involved, focusing on the
beginner and user guides of the Greenstone program. Strong understanding of the library’s metadata
scheme.

Project Role: Digitization Staff
Project Responsibilities: Responsible only for adding objects and metadata to the collection.
Training Level and Requirements: Strong understanding of the elements involved, focusing on the
beginner guide of the Greenstone program and the library’s chosen metadata scheme.

The system is designed to flow upwards towards the project manager and toward final
review, with a focus on developing every member of the team involved so that less oversight is
required. Training will begin the same way for all staff, covering both Greenstone and the
library’s metadata scheme. The training schedule is set up on the basis of the limited
availability of both staff librarians and student workers. As a result the training will be spread
over a month. Over the course of the first week there will be a series of meetings between all the
staff involved to ensure that the vision of the project and the structural understanding of the
university’s metadata scheme are shared. In addition, there will be training seminars on the basics
of Greenstone. The second week will focus on the practice of uploading files to Greenstone and
applying metadata. The third week will address reviewing the collection and determining its
quality and strength, as well as the quality and strength of the metadata.
The second stage of training and development will be the addition of the objects that are
already planned to be added to the new digital library. A week will be taken to add them,
allowing for the newness of the staff. A second week will be used for both reviewing these
added items for metadata continuity and consistency, and for using them to shape the digital
library beyond its initial framework.
Beyond the pre-determined additions to the collection, new materials that need to be
added will follow the schedule and staff attention indicated below:
- Step 1: The object is added by the Digitization Staff.
- Step 2: Each object that is added is reviewed for consistency and continuity by a curator.
- Step 3: Object is either added to the collection or is brought to the project manager for
troubleshooting.
- Step 4: Object is reviewed to validate consistency with collection post addition.
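The four steps above can be sketched as a small set of allowed stage transitions, which is one way the review path could be enforced or audited. The stage names are ours, and the return path from troubleshooting back to curator review is an assumption, since the steps do not say what happens after the project manager resolves an issue.

```python
from enum import Enum


class Stage(Enum):
    ADDED = "added by digitization staff"         # Step 1
    CURATOR_REVIEW = "under curator review"       # Step 2
    IN_COLLECTION = "added to the collection"     # Step 3, review passed
    TROUBLESHOOTING = "with the project manager"  # Step 3, review failed
    VALIDATED = "validated after addition"        # Step 4


# Allowed moves between stages. The move from TROUBLESHOOTING back to
# CURATOR_REVIEW is an assumption; the proposal does not describe it.
TRANSITIONS = {
    Stage.ADDED: {Stage.CURATOR_REVIEW},
    Stage.CURATOR_REVIEW: {Stage.IN_COLLECTION, Stage.TROUBLESHOOTING},
    Stage.TROUBLESHOOTING: {Stage.CURATOR_REVIEW},
    Stage.IN_COLLECTION: {Stage.VALIDATED},
    Stage.VALIDATED: set(),
}


def advance(current, target):
    """Move an object to the next stage, rejecting moves the workflow does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


# Walk one hypothetical object through the straightforward path.
stage = Stage.ADDED
for next_stage in (Stage.CURATOR_REVIEW, Stage.IN_COLLECTION, Stage.VALIDATED):
    stage = advance(stage, next_stage)
print(stage.name)  # prints VALIDATED
```

Because every object must pass through curator review before it can reach the collection, an object cannot skip Step 2 under this sketch.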
The assignation of roles within the project will be based on simple criteria. The pair of
student workers will form the digitization staff, with the staff librarians who have less time to
assist in the project filling out that team. Staff librarians who have greater interest and/or time
will be designated as curators. Of those curators, one will be given the role of digital manager
and will be trained to step into the role of project manager if necessary. The project manager
will act as the connection for the repository to the rest of the university staff as well as the
authority on potential classification issues.
Making sure that the repository is both effective and accurate is also of critical importance.
With this in mind, one curator a week will be assigned to review the digital library as a whole,
with the project manager doing so every two weeks. The purpose of this review will be to make
certain the staff are looking at the library from the outside as well as the inside. In attempting to
view that library as a user might, weaknesses or gaps may appear that are not immediately
visible to a staff member.
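The review rotation described here can be laid out with a short sketch. The curator names are placeholders, and placing the project manager's review on even-numbered weeks is an assumption; the proposal only says it happens every two weeks.

```python
from itertools import cycle


def review_schedule(curators, weeks):
    """Assign one curator per week in rotation and add the project manager
    every second week (even-numbered weeks, by assumption)."""
    rotation = cycle(curators)
    schedule = []
    for week in range(1, weeks + 1):
        reviewers = [next(rotation)]
        if week % 2 == 0:
            reviewers.append("Project Manager")
        schedule.append((week, reviewers))
    return schedule


# Placeholder curator names, for illustration only.
for week, reviewers in review_schedule(["Curator A", "Curator B", "Curator C"], 4):
    print(f"Week {week}: {', '.join(reviewers)}")
```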
Determining the library’s effectiveness will be a long term issue. Web traffic can of
course be monitored, as can requests for objects from other libraries. However, the best measure
of effectiveness for the library will be the direct feedback from its users, including university
staff. The combination of all of these elements will form, along with a review of how well the
library has stayed within the bounds of its vision, an overall assessment of the repository itself.
The projected schedule for these assessments will be once a month, during a staff meeting that
will take over a portion of the time allotted to training once the library has completed its initial
categorization goals. This meeting may be expanded to include the library’s website developer,
specific faculty members and/or university administrative staff. The focus of the meetings will be
to discuss the month’s feedback and the merits and feasibility of changes or additions that have
been requested.
Given that this digital library is a new project being built from the ground up, the
feedback of both its users and content creators will be critical in helping to shape the library into
a resource the university can rely on. Suggestions on re-categorizations, new software to use,
streamlining of site design, and vocabulary additions or changes will all be carefully considered.
In any project of this scope, there must be parameters set in place to maintain the quality
of both the records that are being created and the training and knowledge of the staff. Given the
available pool of student workers and professionals it is important to select a schema that can be
grasped without significant amounts of pre-existing expertise. The resources provided to work
with Dublin Core in concert with the LCNAF, the DCT, and the LCC on the web are robust
enough that they will be able to serve as the basis for a training curriculum for any of the staff.
In addition, these systems are continually updated and curated, allowing for Anonymous
University to stay relevant and current in its interoperability with other university collections.
This will in turn allow resources to be spent in other areas, such as more specific training
packages, software, and hardware potentially required for the repository. Maintaining these
standards will not only keep the level of metadata record quality high, but allow for the
interoperability which will grow Anonymous University’s presence on the web as well as the
overall value of our collection for scholars.
As we seek to bring Anonymous University farther into the digital age, to increase our
reach and the breadth of our intellectual contribution to the wider world, it is important that we
do so with a willingness to constantly seek improvement. Every system or standard or plan that
has been suggested in this proposal is part of a larger concept of continual improvement. From
the Library of Congress to Greenstone to Dublin Core, all of the resources we are using are
strong and responsible sources of information. They are the kinds of sources of information that
we should seek to become.
Bibliography:
Agnew, Grace. "Staffing for Digital Projects." From the Staffing section of the NJ Digital
Highway Project.
Dublin Core Metadata Initiative. Information retrieved from www.dublincore.org.
Leise, Fred, Karl Fast, and Mike Steckel. (2002). Boxes and Arrows. Retrieved from
http://boxesandarrows.com/what-is-a-controlled-vocabulary/
Library of Congress. (2014). Library of Congress Classification Outline. Retrieved from
http://www.loc.gov/catdir/cpso/lcco/
Library of Congress. Library of Congress Name Authority Guide. Retrieved from
http://www.loc.gov/catdir/cpso/lcco/
Miller, Steven J. (2011). Metadata for Digital Collections. New York, NY: Neal-Schuman
Publishers, Inc.
Nilsson, M., Baker, T., & Johnston, P. (2009). Interoperability Levels for Dublin Core
Metadata. Retrieved from http://dublincore.org/documents/interoperability-levels/
www.Greenstone.org. Update on the release of Greenstone 3.06. Retrieved from
http://www.greenstone.org/blog/2014-11-13/greenstone-306-release-out-now/