MetadataTheory: Learning Repositories Technologies (9th of 10)

Ninth Floor
Learning Repository
Technologies
& Case Studies
9

Structure
• Federation & Harvesting
• Operations of LORs
• Case Studies of LORs under development
– Policy and Technical Issues at the University of
Sydney Library
– JScholarship at The Johns Hopkins University

Federation & Harvesting
• Federated searches are conducted by search
engines that query many different databases
with the same query
• Harvesting, on the other hand, refers to the
gathering together of metadata from a
number of distributed repositories into one
portal website
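
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the two repositories, their records, and all function names are hypothetical, and real systems query remote services rather than in-memory dictionaries. A federated search sends the query to every repository at search time, while harvesting copies the metadata into one portal index beforehand and searches that copy.

```python
# Hypothetical repositories: each maps an object identifier to a metadata record.
REPO_A = {"a1": {"title": "Intro to Metadata"}, "a2": {"title": "Soil Chemistry"}}
REPO_B = {"b1": {"title": "Metadata Standards"}, "b2": {"title": "Rock Art"}}


def search(records, query):
    """Return sorted identifiers whose title contains the query (case-insensitive)."""
    return sorted(k for k, r in records.items() if query.lower() in r["title"].lower())


def federated_search(repos, query):
    """Send the same query to every repository at search time and merge the hits."""
    hits = []
    for repo in repos:
        hits.extend(search(repo, query))
    return hits


def harvest(repos):
    """Copy metadata from every repository into one central index ahead of time."""
    portal = {}
    for repo in repos:
        portal.update(repo)
    return portal


federated_hits = federated_search([REPO_A, REPO_B], "metadata")
portal_index = harvest([REPO_A, REPO_B])      # done once, before any search
portal_hits = search(portal_index, "metadata")  # searches the local copy only
```

Both routes find the same objects here; the trade-off is that the federated search depends on every repository being reachable at query time, while the harvested index can go stale until the next harvest.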

Operations of LORs
• search/find – the ability to locate an
appropriate learning object. This can include
the ability to browse
• quality control – a system that ensures
learning objects meet technical, educational
and metadata requirements
• request – ask for a learning object that has
been located in the database

Operations of LORs
• maintain – keep appropriate version control
• retrieve – receive an object that has been
requested
• submit – provide an object to a repository for
storage
• store – place a submitted object into a data
store with unique, registered identifiers that
allow it to be located

Operations of LORs
• gather (push/pull) – obtain metadata about
objects in other repositories for wider
searches and information via a clearing house
function
• publish – provide metadata to other
repositories
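
Several of these operations can be illustrated with a toy in-memory repository. This is a minimal sketch, not how any particular LOR is built: the class, its method names, and the repository names are all hypothetical, and quality control and version maintenance are left out for brevity.

```python
import uuid


class LearningObjectRepository:
    """Toy in-memory repository; real LORs persist objects and metadata."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # identifier -> stored content
        self.metadata = {}  # identifier -> metadata record (local or gathered)

    def submit(self, content, metadata):
        """submit + store: accept an object and register a unique identifier."""
        identifier = f"{self.name}:{uuid.uuid4()}"
        self.objects[identifier] = content
        self.metadata[identifier] = dict(metadata)
        return identifier

    def search(self, keyword):
        """search/find: locate identifiers whose title mentions the keyword."""
        return [
            i for i, m in self.metadata.items()
            if keyword.lower() in m.get("title", "").lower()
        ]

    def retrieve(self, identifier):
        """request + retrieve: return a stored object, or None if held elsewhere."""
        return self.objects.get(identifier)

    def publish(self):
        """publish: expose metadata (not content) of locally stored objects."""
        return {i: m for i, m in self.metadata.items() if i in self.objects}

    def gather(self, other):
        """gather (pull): copy the metadata published by another repository."""
        self.metadata.update(other.publish())


source = LearningObjectRepository("repo-a")
portal = LearningObjectRepository("repo-b")
oid = source.submit(b"lesson content", {"title": "Field Geology Basics"})
portal.gather(source)
found = portal.search("geology")
```

After the gather step the portal can find the object through its metadata, but `portal.retrieve(oid)` returns `None` because the object itself still lives in the source repository.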

Policy and Technical Issues at the
University of Sydney Library

University of Sydney Library
• The University of Sydney Library supports
research, learning, and teaching through a
variety of initiatives and collaborative
activities with academics
• Aim to develop guidelines to support a
consistent and sustainable approach to
dealing with requests to manage materials
within the repository

Collection Descriptions
• An individual academic, research project team, or
a group of academics working within a discipline
created the collections
• Academics are aware of the usefulness of descriptive
metadata for categorizing and interrogating
datasets
– They adopt or modify domain standards or create rich
and often highly granular tag sets to suit project
requirements
• The collections are not large, generally in the
range of tens or hundreds of gigabytes

Collection Descriptions
• Metadata is typically held in databases such as
FileMaker and MySQL, or in spreadsheet
applications such as Microsoft Excel, with
associated data objects housed on personal
computers or departmental file systems
• Collections under discussion arise from:
– School of Geosciences
– Sydney College of the Arts
– Department of Archaeology
– School of Biological Sciences

Defining Metadata Management
Requirements
• Retain the granularity of the native record
• Enable export, including Open Archives
Initiative (OAI) harvesting, of records in DC
and native format
• Enable development of schema-specific
search interfaces, whether through repository
tools or integration with other services.
• Ensure service sustainability
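
As an illustration of the OAI export requirement, the sketch below builds an OAI-PMH `ListRecords` request for Dublin Core (`oai_dc`) records and pulls titles out of a response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH protocol; the endpoint URL and the sample response are made up for the example, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai/request"  # hypothetical endpoint
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Invented sample of what a harvester might receive back.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample rock specimen</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def titles(xml_text):
    """Return every dc:title value found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_TITLE)]


request_url = list_records_url(BASE_URL)
harvested_titles = titles(SAMPLE_RESPONSE)
```

Exporting records in a native format as well would mean offering a second `metadataPrefix` alongside `oai_dc`, which is where the crosswalk maintenance discussed in the options comes in.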

Considering Options for Metadata
Management
• Option 1: Map native metadata to existing DC elements
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records
– All approaches have advantages and
disadvantages related to:
• The loss of information from the existing metadata
• The ways in which the metadata can be used
after their transformation
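
The information loss that comes with a plain DC mapping can be shown with a small sketch. The native field names, the sample record, and the mapping table are all hypothetical; they only stand in for the kind of granular tag sets the collections use.

```python
# Hypothetical mapping from granular native fields to simple DC elements.
NATIVE_TO_DC = {
    "specimen_name": "title",
    "collector": "creator",
    "rock_type": "subject",
    "mineral": "subject",
    "site_name": "coverage",
}


def to_simple_dc(native):
    """Map a native record to simple DC; unmapped fields are dropped."""
    dc = {}
    for field, value in native.items():
        element = NATIVE_TO_DC.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


record = {
    "specimen_name": "Sample 17",
    "collector": "A. Researcher",
    "rock_type": "basalt",
    "mineral": "olivine",
    "site_name": "Site 3",
    "thin_section_no": "TS-004",  # no DC equivalent in this mapping
}
dc_record = to_simple_dc(record)
```

In the result, `dc_record["subject"]` holds both "basalt" and "olivine" with nothing to say which was the rock type and which the mineral, and the thin section number is gone entirely, so the original record cannot be rebuilt from the DC version.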

Advantages
• Low submission cost and low ongoing
maintenance cost,
• No configuration or maintenance of DSpace
index keys, customized metadata schemas, or
OAI crosswalks needed,
• Records fully searchable through default DC
indexing and harvestable via default OAI

Disadvantages
• Loss of metadata granularity and inability to
recreate the original records
• Many items of metadata would not be
meaningful without contextual information
provided by their native tags
• Does not support provision of a traditional
field-based advanced search reflective of the
granularity of the original records

Considering Options for Metadata
Management
• Option 2: Map native metadata to DC elements and
create new custom qualifiers for standard DC
tags
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records. The granularity of non-DC elements is
retained through mapping to customized qualifiers
of standard DC tags
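
A rough sketch of the qualifier approach follows. The native field names and the custom qualifiers (`rocktype`, `mineral`) are invented for illustration and do not come from the Sydney collections; the sketch also assumes each native field holds a single value.

```python
# Hypothetical mapping: native field -> (DC element, custom qualifier or None).
NATIVE_TO_QUALIFIED_DC = {
    "specimen_name": ("title", None),
    "rock_type": ("subject", "rocktype"),
    "mineral": ("subject", "mineral"),
}
# Because every (element, qualifier) pair is unique, the mapping can be inverted.
QUALIFIED_DC_TO_NATIVE = {pair: field for field, pair in NATIVE_TO_QUALIFIED_DC.items()}


def to_qualified_dc(native):
    """Map a native record to a list of (element, qualifier, value) triples."""
    return [(*NATIVE_TO_QUALIFIED_DC[field], value) for field, value in native.items()]


def to_native(dc_values):
    """Recreate the native record from qualified DC triples."""
    return {QUALIFIED_DC_TO_NATIVE[(element, qualifier)]: value
            for element, qualifier, value in dc_values}


record = {"specimen_name": "Sample 17", "rock_type": "basalt", "mineral": "olivine"}
qualified = to_qualified_dc(record)
restored = to_native(qualified)
```

The round trip returns the original record because the qualifier keeps "basalt" and "olivine" apart even though both sit under `subject`. The cost is that every new qualifier has to be registered and tracked.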

Advantages
• Retains the granularity of the native records,
supporting recreation of the original metadata
records. Also retains contextual information
conveyed by the original tags
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Records would be fully searchable via default
DC indexing and harvestable via default OAI

Disadvantages
• Higher submission and maintenance costs
than option 1, requiring additional and
ongoing recordkeeping and maintenance
procedures
• As DC qualifiers proliferate, management of
the central registry may pose challenges

Considering Options for Metadata
Management
• Option 3: Create a custom schema identical to the
native metadata set
– A custom schema separate to DC is implemented
within the repository. Metadata records are
transferred to the repository in their native
formats

Advantages
• Avoids the DC registry management problems
of option 2, by enabling partitioning and
separate maintenance of each custom schema
• May enable future provision of a collection-,
community-, or schema-level traditional field-
based advanced search reflective of the
granularity of the original records

Disadvantages
• Requires configuration and ongoing maintenance
of DSpace index keys, customized metadata
schemas, and OAI crosswalks
• May result in a proliferation of project-specific
schemas requiring accompanying recordkeeping
and maintenance
• Will not assist in the management of hierarchical
metadata schemas, as DSpace does not support
these

Considering Options for Metadata
Management
• Option 4: Generate DC records as abstractions of the
native metadata records and submit the
native metadata records as digital object
bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams

Advantages
• Relatively low submission cost and low ongoing
maintenance cost
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Depending on how much of the original metadata
is mapped to standard DC, records could be
keyword searchable via default DC indexing

Advantages
• DC versions of the records would be
harvestable via default OAI
• Avoids the DC registry management problems
of option 2 and the schema proliferation
issues of option 3
• Retains the original metadata records in their
native format

Disadvantages
• Would not support future provision of a
collection-, community-, or schema-level
traditional field-based advanced search
reflective of the granularity of the original
records
– Such a search would require indexing of the
accompanying native metadata file
• Would not readily enable harvesting of native
metadata records

Considering Options for Metadata
Management
• Selected: Option 4
• Generate DC records as abstractions of the
native metadata records and submit the
native metadata records as digital object
bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams

Metadata Mapping
• Metadata from the source databases was
mapped to DC to enable simple keyword
searching within DSpace and DC-based OAI
harvesting

Metadata Transfer
• Records were exported as CSV files, each record
comprising a row in the file.
• The author created a Python script, which wrote each
row to two files.
– One was a DC XML file and the other a native metadata file
– The script also packaged the metadata and associated data
files in a format suitable for submission to DSpace
• A selection of records was manually sampled and
compared, and additional scripting verified that all
records were correctly transferred
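
The transfer step might look roughly like the sketch below. It is not the author's script. The slides do not name the packaging format, so the sketch assumes the DSpace Simple Archive Format layout (one directory per item with a `dublin_core.xml` file and a `contents` file listing the bitstreams). The column names, the column-to-DC mapping, and the `native_metadata.csv` file name are hypothetical, and the associated data files are left out.

```python
import csv
import io
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET

# Hypothetical column-to-DC mapping; the real collections used their own fields.
COLUMN_TO_DC = {"specimen_name": "title", "collector": "creator"}


def dc_xml(row):
    """Build a dublin_core.xml document from the mapped columns of one row."""
    root = ET.Element("dublin_core")
    for column, element in COLUMN_TO_DC.items():
        if row.get(column):
            value = ET.SubElement(root, "dcvalue", {"element": element, "qualifier": "none"})
            value.text = row[column]
    return ET.tostring(root, encoding="unicode")


def package(csv_text, out_dir):
    """Write one item directory per CSV row and return the number written."""
    out_dir = Path(out_dir)
    count = 0
    for count, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        item_dir = out_dir / f"item_{count:04d}"
        item_dir.mkdir(parents=True, exist_ok=True)
        (item_dir / "dublin_core.xml").write_text(dc_xml(row), encoding="utf-8")
        # Native record: every original column, kept as a one-row CSV bitstream.
        with open(item_dir / "native_metadata.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            writer.writeheader()
            writer.writerow(row)
        # The contents file lists the bitstreams that belong to the item.
        (item_dir / "contents").write_text("native_metadata.csv\n", encoding="utf-8")
    return count


SAMPLE_CSV = (
    "specimen_name,collector,rock_type\n"
    "Sample 17,A. Researcher,basalt\n"
    "Sample 18,B. Researcher,granite\n"
)
with tempfile.TemporaryDirectory() as tmp:
    written = package(SAMPLE_CSV, tmp)
    first_item = sorted(p.name for p in (Path(tmp) / "item_0001").iterdir())
    first_native = (Path(tmp) / "item_0001" / "native_metadata.csv").read_text(encoding="utf-8")
```

Comparing the returned count with the number of rows in the export is one simple check that nothing was dropped; the native file keeps columns such as `rock_type` that the DC record does not carry.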

JScholarship at The Johns
Hopkins University

JScholarship
• JScholarship (http://jscholarship.library.jhu.edu), the
Johns Hopkins institutional repository, is the
home for research materials created by faculty
& staff from the university, the medical
institutions, and other affiliates such as the
Applied Physics Lab
• Launched in 2008

Management Structure
• This DSpace-based repository is a service
developed and operated jointly by the
Sheridan Libraries and the Welch Medical
Library
– Directors of both libraries and several key staff
members serve as the Oversight Group for
JScholarship
• They establish high-level policies for the repository and
provide guidance to the IR manager in areas such as
content recruitment and assessment

Creating Metadata
• The Oversight Group decided to leave the
submission process and metadata creation to the
various research communities, with library staff
acting only in a training and advisory role
• Each community has created its metadata at the
time of submission, but the library is
experimenting with harvesting existing metadata
to use for batch ingestion of digitized library
collections

Policies
• Each research community establishes many of
the policies for its collections
– Including policies for both content & metadata
generation
– Allows for personalization in each community
• How has this affected metadata in two of the
communities in JScholarship?

Center for Africana Studies
• Created collections for center research, faculty
articles, and working papers
• Researchers contributing content are
decentralized – they belong to many departments
• An administrative assistant gathers research,
uploads files, and creates the metadata for
each of the Center’s collections

Center for Africana Studies
• The interdisciplinary nature of the collections
does not lend itself to using a specialized
controlled vocabulary for subject terms
• Although a wide-ranging thesaurus would
work with these materials, the Center has
opted to use keywords from the articles
themselves

Hopkins Population Center
• Faculty associates produce most of the
research in working papers, conference
proceedings, and journal articles
• Instead of having a single person perform the
submission, metadata creation, and approval,
they had students perform some of the
submission and basic metadata tasks

Hopkins Population Center
• The submissions were then checked and
enhanced by a liaison librarian from the Welch
Medical Library
• The only community to use a controlled
vocabulary for subject terms
– Because the Center already had its own thesaurus
for its POPLINE database, it decided to use those
terms in JScholarship
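
The practical difference between free keywords and a controlled vocabulary can be sketched as a simple check at submission time. The thesaurus terms below are placeholders, not actual POPLINE terms, and the function name is hypothetical.

```python
# Placeholder vocabulary; a real thesaurus would be loaded from its source.
THESAURUS = {"fertility", "migration", "mortality"}


def split_subjects(candidates, thesaurus=THESAURUS):
    """Separate proposed subject terms into controlled and uncontrolled ones."""
    accepted, rejected = [], []
    for term in candidates:
        normalized = term.strip().lower()
        if normalized in thesaurus:
            accepted.append(normalized)
        else:
            rejected.append(normalized)
    return accepted, rejected


accepted, rejected = split_subjects(["Fertility", " migration ", "urban growth"])
```

With a controlled vocabulary the rejected terms would go back to the submitter or the liaison librarian for correction; with free keywords, as at the Center for Africana Studies, every term would simply be kept.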

Ninth Floor
Learning Repository
Technologies
Next stop:
10th Floor – Learning Repository
Business Models
9
