Ninth Floor
Learning Repository
Technologies
& Case Studies
9
Structure
• Federation & Harvesting
• Operations of LORs
• Case Studies of LORs under development
– Policy and Technical Issues at the University of
Sydney Library
– JScholarship at The Johns Hopkins University
Federation & Harvesting
• Federated searches are conducted by search
engines that send the same query to many
different databases
• Harvesting, on the other hand, refers to the
gathering together of metadata from a
number of distributed repositories into one
portal website
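The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repositories are in-memory dictionaries, and the names REPOSITORIES, federated_search, harvest and portal_search are hypothetical, not part of any real system.

```python
# Toy repositories: each maps an identifier to a metadata record (hypothetical data).
REPOSITORIES = {
    "repo_a": {
        "a1": {"title": "Intro to Metadata"},
        "a2": {"title": "Rock Strata Images"},
    },
    "repo_b": {
        "b1": {"title": "Metadata Harvesting Basics"},
    },
}


def federated_search(query, repositories):
    """Send the same query to every repository at search time and merge the hits."""
    hits = []
    for name, records in repositories.items():
        for identifier, record in records.items():
            if query.lower() in record["title"].lower():
                hits.append((name, identifier))
    return sorted(hits)


def harvest(repositories):
    """Copy metadata from every repository into one central index ahead of time."""
    portal_index = {}
    for name, records in repositories.items():
        for identifier, record in records.items():
            portal_index[(name, identifier)] = dict(record)
    return portal_index


def portal_search(query, portal_index):
    """Search only the harvested copy; the source repositories are not contacted."""
    return sorted(
        key
        for key, record in portal_index.items()
        if query.lower() in record["title"].lower()
    )


if __name__ == "__main__":
    print(federated_search("metadata", REPOSITORIES))
    print(portal_search("metadata", harvest(REPOSITORIES)))
```

Both searches return the same hits here. The practical difference is when the remote repositories are contacted: at query time for federation, ahead of time for harvesting.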
Operations of LORs
• search/find – the ability to locate an
appropriate learning object. This can include
the ability to browse
• quality control – a system that ensures
learning objects meet technical, educational
and metadata requirements
• request – ask for a learning object that has
been located in the repository
Operations of LORs
• maintain – apply appropriate version control
• retrieve – receive an object that has been
requested
• submit – provide an object to a repository for
storage
• store – place a submitted object into a data
store with unique, registered identifiers that
allow it to be located
Operations of LORs
• gather (push/pull) – obtain metadata about
objects in other repositories for wider
searches and information via a clearing house
function
• publish – provide metadata to other
repositories
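The operations listed on the preceding slides can be pictured as methods on a single repository object. The sketch below is a minimal in-memory illustration, not a real LOR implementation: the class name LearningObjectRepository, its method names, and the single quality-control rule (a title is required) are all assumptions made for the example.

```python
import itertools


class LearningObjectRepository:
    """In-memory sketch; method names follow the operations listed on the slides."""

    def __init__(self, name):
        self.name = name
        self._store = {}      # identifier -> (metadata, content)
        self._gathered = {}   # identifier -> metadata pulled from other repositories
        self._counter = itertools.count(1)

    def submit(self, metadata, content):
        """submit + quality control + store: check the object, then assign a unique identifier."""
        if not metadata.get("title"):
            raise ValueError("quality control failed: a title is required")
        identifier = f"{self.name}:{next(self._counter)}"
        self._store[identifier] = (dict(metadata), content)
        return identifier

    def search(self, term):
        """search/find: match the term against local and gathered metadata."""
        term = term.lower()
        combined = dict(self._gathered)
        combined.update({i: m for i, (m, _) in self._store.items()})
        return sorted(i for i, m in combined.items() if term in m["title"].lower())

    def retrieve(self, identifier):
        """request + retrieve: return the stored object for a located identifier."""
        return self._store[identifier][1]

    def publish(self):
        """publish: expose metadata only (not content) to other repositories."""
        return {i: dict(m) for i, (m, _) in self._store.items()}

    def gather(self, other):
        """gather (pull): copy the metadata published by another repository."""
        self._gathered.update(other.publish())
```

For example, after b.submit({"title": "Plate Tectonics Quiz"}, b"quiz-bytes") and a.gather(b), a search on repository a for "tectonics" finds the object held in b, while the object itself is still retrieved from b.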
Case Studies
Lessons Learned
Policy and Technical Issues at the
University of Sydney Library
University of Sydney Library
• The University of Sydney Library supports
research, learning, and teaching through a
variety of initiatives and collaborative
activities with academics
• Aim to develop guidelines to support a
consistent and sustainable approach to
dealing with requests to manage materials
within the repository
Collection Descriptions
• The collections were created by an individual
academic, a research project team, or a group of
academics working within a discipline
• Academics are aware of the usefulness of descriptive
metadata for categorizing and interrogating
datasets
– They adopt or modify domain standards or create rich
and often highly granular tag sets to suit project
requirements
• The collections are not large, generally in the
range of tens or hundreds of gigabytes
Collection Descriptions
• Metadata is typically held in databases such as
FileMaker and MySQL, or in spreadsheet
applications such as Microsoft Excel, with
associated data objects housed on personal
computers or departmental file systems
• Collections under discussion arise from:
– School of Geosciences
– Sydney College of the Arts
– Department of Archaeology
– School of Biological Sciences
Defining Metadata Management
Requirements
• Retain the granularity of the native record
• Enable export, including Open Archives
Initiative (OAI) harvesting, of records in DC
and native format
• Enable development of schema-specific
search interfaces, whether through repository
tools or integration with other services.
• Ensure service sustainability
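As an illustration of the export requirement, an OAI-PMH harvester asks a repository for records with a ListRecords request and names the format it wants through the metadataPrefix argument (oai_dc is the prefix every OAI-PMH repository must support). The sketch below only builds such request URLs. The base URL and the native_geo prefix are hypothetical placeholders, not values from the Sydney repository.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai/request"  # hypothetical endpoint
    print(list_records_url(base))                # Dublin Core records
    print(list_records_url(base, "native_geo"))  # hypothetical native-format prefix
```

Harvesting native records this way presupposes that the repository has been configured to disseminate that format, which is exactly what some of the options on the following slides do or do not provide.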
Considering Options for Metadata
Management
• Option 1: Map native metadata to existing DC elements
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records
– All of the options considered have advantages and
disadvantages related to:
• The loss of information from the existing metadata
• How the metadata can be used in practice after
their transformation
Advantages
• Low submission cost and low ongoing
maintenance cost
• No configuration or maintenance of DSpace
index keys, customized metadata schemas, or
OAI crosswalks needed
• Records fully searchable through default DC
indexing and harvestable via default OAI
Disadvantages
• Loss of metadata granularity and inability to
recreate the original records
• Many items of metadata would not be
meaningful without contextual information
provided by their native tags
• Does not support provision of a traditional
field-based advanced search reflective of the
granularity of the original records
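The loss of granularity can be seen in a small sketch of a mapping to unqualified DC. The native field names, the sample record and the crosswalk below are invented for illustration and do not come from any of the Sydney collections.

```python
# Hypothetical native record and crosswalk; field names are illustrative only.
NATIVE_RECORD = {
    "specimen_name": "Basalt sample 12",
    "collector": "A. Researcher",
    "identifier_person": "B. Analyst",
    "site_depth_m": "3.5",
}

# Several distinct native tags collapse onto the same unqualified DC element.
CROSSWALK = {
    "specimen_name": "title",
    "collector": "contributor",
    "identifier_person": "contributor",
    "site_depth_m": "description",
}


def to_simple_dc(native, crosswalk):
    """Map a native record to unqualified DC; the native tag names are discarded."""
    dc = {}
    for field, value in native.items():
        dc.setdefault(crosswalk[field], []).append(value)
    return dc


if __name__ == "__main__":
    print(to_simple_dc(NATIVE_RECORD, CROSSWALK))
```

In the result, the two contributor values can no longer be told apart, and the description value "3.5" has lost the tag that said it was a depth in metres, so the original record cannot be rebuilt from the DC record alone.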
Considering Options for Metadata
Management
• Option 2: Map native metadata to DC elements and
create new custom qualifiers for standard DC
tags
– Native metadata records are mapped to DC and
transferred to the repository as standard DC
records. The granularity of non-DC elements is
retained through mapping to customized qualifiers
of standard DC tags
Advantages
• Retains the granularity of the native records,
supporting recreation of the original metadata
records. Also retains contextual information
conveyed by the original tags
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Records would be fully searchable via default
DC indexing and harvestable via default OAI
Disadvantages
• Higher submission and maintenance costs
than option 1, requiring additional and
ongoing recordkeeping and maintenance
procedures
• As DC qualifiers proliferate, management of
the central registry may pose challenges
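A sketch of the qualifier approach, under the same kind of assumptions: the native fields and the custom qualifiers (collector, identifiedBy, siteDepthMetres) are hypothetical. Because each native tag maps to its own element and qualifier pair, the mapping can be reversed to recreate the original record.

```python
# Hypothetical native record; field names are illustrative only.
NATIVE_RECORD = {
    "specimen_name": "Basalt sample 12",
    "collector": "A. Researcher",
    "identifier_person": "B. Analyst",
    "site_depth_m": "3.5",
}

# Each native tag gets its own (element, qualifier) pair; the qualifiers are invented.
QUALIFIED_CROSSWALK = {
    "specimen_name": ("title", None),
    "collector": ("contributor", "collector"),
    "identifier_person": ("contributor", "identifiedBy"),
    "site_depth_m": ("description", "siteDepthMetres"),
}


def to_qualified_dc(native, crosswalk):
    """Map a native record to a list of (element, qualifier, value) triples."""
    return [(crosswalk[f][0], crosswalk[f][1], v) for f, v in native.items()]


def to_native(dc_values, crosswalk):
    """Reverse the crosswalk to recreate the native record from qualified DC."""
    reverse = {target: field for field, target in crosswalk.items()}
    return {reverse[(element, qualifier)]: value for element, qualifier, value in dc_values}


if __name__ == "__main__":
    qualified = to_qualified_dc(NATIVE_RECORD, QUALIFIED_CROSSWALK)
    print(qualified)
    print(to_native(qualified, QUALIFIED_CROSSWALK) == NATIVE_RECORD)
```

The crosswalk dictionary plays the role of the central qualifier registry: every new project adds entries to it, which is where the maintenance burden noted above comes from.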
Considering Options for Metadata
Management
• Option 3: Create a custom schema identical to the
native metadata set
– A custom schema separate from DC is implemented
within the repository. Metadata records are
transferred to the repository in their native
formats
Advantages
• Avoids the DC registry management problems
of option 2, by enabling partitioning and
separate maintenance of each custom schema
• May enable future provision of a collection-,
community-, or schema-level traditional field-
based advanced search reflective of the
granularity of the original records
Disadvantages
• Requires configuration and ongoing maintenance
of DSpace index keys, customized metadata
schemas, and OAI crosswalks
• May result in a proliferation of project-specific
schemas requiring accompanying recordkeeping
and maintenance
• Will not assist in the management of hierarchical
metadata schemas, as DSpace does not support
these
Considering Options for Metadata
Management
• Option 4: Generate DC records as abstractions of the
native metadata records and submit the
native metadata records as digital object
bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams
Advantages
• Relatively low submission cost and low ongoing
maintenance cost
• Requires no configuration or maintenance of
DSpace index keys, customized metadata
schemas, or OAI crosswalks
• Depending on how much of the original metadata
is mapped to standard DC, records could be
keyword searchable via default DC indexing
Advantages
• DC versions of the records would be
harvestable via default OAI
• Avoids the DC registry management problems
of option 2 and the schema proliferation
issues of option 3
• Retains the original metadata records in their
native format
Disadvantages
• Would not support future provision of a
collection-, community-, or schema-level
traditional field-based advanced search
reflective of the granularity of the original
records
– Such a search would require indexing of the
accompanying native metadata file
• Would not readily enable harvesting of native
metadata records
Considering Options for Metadata
Management
• Option 4 (selected): Generate DC records as
abstractions of the native metadata records and
submit the native metadata records as digital
object bitstreams
– DC records act as bibliographic descriptions of the
native metadata records. The original records are
submitted as accompanying bitstreams
Metadata Mapping
• Metadata from the source databases was
mapped to DC to enable simple keyword
searching within DSpace and DC-based OAI
harvesting
Metadata Transfer
• Records were exported as CSV files, each record
comprising a row in the file.
• The author created a Python script, which wrote each
row to two files.
– One was a DC XML file and the other a native metadata file
– The script also packaged the metadata and associated data
files in a format suitable for submission to DSpace
• A selection of records was manually sampled and
compared, and additional scripting verified that all
records were correctly transferred
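The transfer step might look roughly like the sketch below. It is not the author's script. It assumes the DSpace Simple Archive Format for batch import (one directory per item holding a dublin_core.xml file, a contents file listing the bitstreams, and the bitstream files themselves), since the slides do not say which packaging format was used. The column names, the COLUMN_TO_DC mapping and the native_metadata.xml file name are hypothetical, and the associated data files are left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping from CSV column names to DC elements.
COLUMN_TO_DC = {"name": "title", "maker": "creator", "year": "date"}


def dc_xml(row):
    """Build the DC record for one CSV row as a dublin_core.xml string."""
    root = ET.Element("dublin_core")
    for column, element in COLUMN_TO_DC.items():
        if row.get(column):
            value = ET.SubElement(root, "dcvalue", {"element": element, "qualifier": "none"})
            value.text = row[column]
    return ET.tostring(root, encoding="unicode")


def native_xml(row):
    """Keep every column of the row, with its original name, as the native record."""
    root = ET.Element("record")
    for column, text in row.items():
        ET.SubElement(root, "field", {"name": column}).text = text
    return ET.tostring(root, encoding="unicode")


def package(csv_text, out_dir):
    """Write one item directory per CSV row and return the directories created."""
    out_dir = Path(out_dir)
    item_dirs = []
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        item_dir = out_dir / f"item_{number:04d}"
        item_dir.mkdir(parents=True, exist_ok=True)
        (item_dir / "dublin_core.xml").write_text(dc_xml(row), encoding="utf-8")
        (item_dir / "native_metadata.xml").write_text(native_xml(row), encoding="utf-8")
        # The contents file lists the bitstreams that accompany the DC record.
        (item_dir / "contents").write_text("native_metadata.xml\n", encoding="utf-8")
        item_dirs.append(item_dir)
    return item_dirs
```

A simple completeness check in the spirit of the last bullet is to compare the number of item directories returned by package with the number of data rows in the CSV export.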
JScholarship at The Johns
Hopkins University
JScholarship
• JScholarship (http://jscholarship.library.jhu.edu), the
Johns Hopkins institutional repository, is the
home for research materials created by faculty
& staff from the university, the medical
institutions, and other affiliates such as the
Applied Physics Lab
• Launched in 2008
Management Structure
• This DSpace-based repository is a service
developed and operated jointly by the
Sheridan Libraries and the Welch Medical
Library
– Directors of both libraries and several key staff
members serve as the Oversight Group for
JScholarship
• They establish high-level policies for the repository and
provide guidance to the IR manager in areas such as
content recruitment and assessment
Creating Metadata
• The Oversight Group decided to leave the
submission process and metadata creation to the
various research communities, with library staff
acting only in a training and advisory role
• Each community has created its metadata at the
time of submission, but the library is
experimenting with harvesting existing metadata
to use for batch ingestion of digitized library
collections
Policies
• Each research community establishes many of
the policies for its collections
– Including policies for both content & metadata
generation
– Allows for personalization in each community
• How has this affected metadata in two of the
communities in JScholarship?
Center for Africana Studies
• Created collections for center research, faculty
articles, and working papers
• Researchers contributing content are
decentralized; they belong to many departments
• An administrative assistant gathers research,
uploads files, and creates the metadata for
each of the Center’s collections
Center for Africana Studies
• The interdisciplinary nature of the collections
does not lend itself to using a specialized
controlled vocabulary for subject terms
• Although a wide-ranging thesaurus would
work with these materials, the Center has
opted to use keywords from the articles
themselves
Hopkins Population Center
• Faculty associates produce most of the
research in working papers, conference
proceedings, and journal articles
• Instead of having a single person perform the
submission, metadata creation, and approval,
they had students perform some of the
submission and basic metadata tasks
Hopkins Population Center
• The submissions were then checked and
enhanced by a liaison librarian from the Welch
Medical Library
• The only community to use a controlled
vocabulary for subject terms
– Because the Center already has its own thesaurus
for its POPLINE database, it decided to use those
terms in JScholarship
Ninth Floor
Learning Repository
Technologies
Next stop:
10th Floor – Learning Repository
Business Models
9

MetadataTheory: Learning Repositories Technologies (9th of 10)
