Metadata Mapping
&
Metadata Crosswalks
Nikos Palavitsinis, PhD
Alternative Title
“the story of combining
Ariadne’s thread with the Gordian Knot”
What are crosswalks?
• Crosswalks show people where to put the data
from one scheme into a different scheme. They
are often used by libraries, archives, museums,
and other cultural institutions to translate data
to or from MARC, Dublin Core, TEI, and
other metadata schemes.
source
One-way only
The process of translating from one schema to another is called
metadata mapping or field mapping [source]
A crosswalk from MARC to DC is not the same as a crosswalk from DC to MARC;
each direction needs its own mapping
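Why a crosswalk is one-way can be sketched in Python. The field pairs below are a simplified illustration, not the official MARC to Dublin Core crosswalk, and the names `MARC_TO_DC`, `crosswalk` and `reverse_candidates` are hypothetical. Two MARC fields land on the same DC element, so the reverse direction cannot be derived by simply inverting the table.

```python
# Illustrative, simplified MARC -> Dublin Core pairs (an assumption for this
# sketch, not a complete or official crosswalk).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",   # personal name
    "110": "creator",   # corporate name
    "520": "description",
    "650": "subject",
}


def crosswalk(record, mapping):
    """Copy each source field's values to the mapped target element."""
    target = {}
    for field, values in record.items():
        element = mapping.get(field)
        if element is None:
            continue  # no equivalent in the target scheme: values are dropped
        target.setdefault(element, []).extend(values)
    return target


def reverse_candidates(mapping):
    """For each target element, list every source field that maps to it."""
    candidates = {}
    for field, element in mapping.items():
        candidates.setdefault(element, []).append(field)
    return candidates


marc = {
    "245": ["An example title"],
    "100": ["Example, Author"],
    "110": ["Example Organisation"],
}
dc = crosswalk(marc, MARC_TO_DC)
ambiguous = reverse_candidates(MARC_TO_DC)["creator"]  # two possible MARC fields
```

Going back from `creator` requires a decision between fields 100 and 110, which is why the DC to MARC direction is written as a separate crosswalk.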
Mapping Problems
• Element A in Scheme A contains X values that
need to be split up into Element 1 and
Element 2 of Scheme B
• Element A in Scheme A can take more than
one value (multiplicity of n), whereas the
equivalent Element 2 in Scheme B takes all
these values in a single field
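Both problems can be sketched in a few lines of Python. The element names, the "Family, Given" convention and the "; " separator are assumptions made for illustration only.

```python
def split_name(full_name):
    """Split one Scheme A value ('Family, Given') into two Scheme B elements."""
    family, _, given = full_name.partition(",")
    return {"family_name": family.strip(), "given_name": given.strip()}


def merge_repeated(values, separator="; "):
    """Collapse a repeatable Scheme A element into a single Scheme B field."""
    return separator.join(value.strip() for value in values)


name_parts = split_name("Palavitsinis, Nikos")
single_field = merge_repeated(["metadata", "crosswalks ", "mapping"])
```

Merging is lossy when a value already contains the separator, so the separator has to be chosen with the actual data in mind.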
Mapping Problems
• Different data formats across schemas (use of
names, other conventions, etc.)
• Element A in Scheme A is indexed but the equivalent
element in the other scheme is not
• Scheme A uses a different controlled vocabulary for
the same Element than Scheme B
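A minimal sketch of two of these problems, assuming a hypothetical term table between the two vocabularies and a day/month/year date convention in Scheme A. Neither is taken from a real scheme.

```python
from datetime import datetime

# Hypothetical lookup between the controlled vocabularies of Scheme A and B.
VOCAB_A_TO_B = {"photo": "image", "recording": "sound", "book": "text"}


def map_term(term, table=VOCAB_A_TO_B):
    """Return the Scheme B term, or None when no equivalent is known."""
    return table.get(term.strip().lower())


def normalise_date(value, source_format="%d/%m/%Y"):
    """Rewrite a date from Scheme A's convention to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, source_format).date().isoformat()


mapped = map_term(" Photo ")
unmapped = map_term("map")          # no entry: needs a human decision
iso_date = normalise_date("05/03/2016")
```

Terms that return `None` are the ones that cannot be reconciled automatically and have to be reviewed by hand.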
“The more metadata experience we have, the more
it becomes clear that metadata perfection is not
attainable, and anyone who attempts it will be
sorely disappointed.
When metadata is crosswalked between two or
more unrelated sources, there will be data elements
that cannot be reconciled in an ideal manner. The
key to a successful metadata crosswalk is intelligent
flexibility. It is essential to focus on the important
goals and be willing to compromise in order to
reach a practical conclusion…”
"Metadata in Practice" Diane I. Hillmann and Elaine L. Westbrooks, eds.,
American Library Association, Chicago, 2004, p. 91.
Automated?
• Metadata crosswalks can be automated, but
due to the complexity of metadata standards
and the extent of customization taking place,
only a few general-purpose automated
processes exist for crosswalks
Mapping between formats
• Excellent resource by Michael Day of UKOLN
– http://www.ukoln.ac.uk/metadata/interoperability/
Source
Metadata Element Set
• Two key components
– Semantics: Definitions of the meanings of the
elements
– Content: Declarations or instructions (or rules) of
what and how values should be assigned to
elements
Why map metadata?
• “Interoperability is the ability of multiple
systems with different hardware and software
platforms, data structures, and interfaces to
exchange data with minimal loss of content
and functionality”
NISO (National Information Standards Organization). (2004). Understanding metadata. Bethesda, MD: NISO
Press. Available: <http://www.niso.org/standards/resources/UnderstandingMetadata.pdf>.
Interoperability
…on a schema level
focusing on the elements of the schemas, independent of
any applications. Results: derived element sets, encoded schemas,
crosswalks, application profiles, and element registries
…on a record level
focusing on integrating metadata records through the mapping
of the elements according to the semantic meanings of these
elements. Results: converted records and new records created by
combining values of existing records
Interoperability
…on a repository level
focusing on mapping value strings associated with particular
elements (terms associated with subject or format elements).
The results enable cross-collection searching
Source: http://www.dlib.org/dlib/june06/chan/06chan.html
Interoperability on the schema level
• This is achieved through:
– Derivation
• Using elements from existing schemas or standards, as
they are
– Application Profiling
• Localizing and optimizing schemata for specific contexts
– Metadata Crosswalks
• mapping elements, semantics, and syntax from one
metadata scheme to those of another
Interoperability on the schema level
• This is achieved through:
– Switching Across
• When crosswalking among several schemas, it is easier to use a
central schema as a switch and crosswalk all the others to and
from it
– Metadata Framework
• Either developing it based on existing schemas, or
establishing it before the development of schemas and
application profiles
– Metadata Registry
• Offering a centralized access point to existing schemas, to
facilitate the development of new ones and “foster”
interoperability
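The benefit of switching across can be made concrete with a short Python sketch. Since each crosswalk is one-way, n schemas need n(n-1) pairwise crosswalks but only 2(n-1) when everything goes through a switch. The field names and the two mapping tables are hypothetical.

```python
def pairwise_crosswalks(n):
    """One-way crosswalks needed to connect n schemas directly."""
    return n * (n - 1)


def switched_crosswalks(n):
    """One-way crosswalks needed when one of the n schemas acts as the switch."""
    return 2 * (n - 1)


def via_switch(record, to_switch, from_switch):
    """Crosswalk a record to the switch schema, then on to the target schema."""
    switched = {to_switch[k]: v for k, v in record.items() if k in to_switch}
    return {from_switch[k]: v for k, v in switched.items() if k in from_switch}


# Hypothetical tables: Scheme A -> switch schema, switch schema -> Scheme B.
A_TO_SWITCH = {"ttl": "title", "auth": "creator"}
SWITCH_TO_B = {"title": "name", "creator": "author"}

result = via_switch({"ttl": "X", "auth": "Y", "misc": "Z"}, A_TO_SWITCH, SWITCH_TO_B)
```

With five schemas this is 8 crosswalks instead of 20. The trade-off is that anything the switch schema cannot express, such as `misc` here, is lost on the way.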
Crosswalking Approaches
• Absolute crosswalking
– You only match the elements that are 100%
equivalent and you ignore the rest
• Useful when mapping from a simpler to a more
complex schema
• Relative crosswalking
– You map all elements in a source schema to at
least one element of a target schema
• Useful when mapping from a complex to a simpler
schema
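The two approaches differ only in which rows of the mapping table are applied. A minimal sketch, assuming a hypothetical table in which each row is marked as an exact or a merely broader match:

```python
# Hypothetical mapping table: (source element, target element, kind of match).
MAPPING = [
    ("title", "title", "exact"),
    ("abstract", "description", "exact"),
    ("illustrator", "contributor", "broader"),
    ("table_of_contents", "description", "broader"),
]


def crosswalk(record, relative=False):
    """Absolute mode keeps exact matches only; relative mode maps every row."""
    target = {}
    for source, dest, match in MAPPING:
        if source not in record:
            continue
        if match != "exact" and not relative:
            continue  # absolute crosswalking ignores non-equivalent elements
        target.setdefault(dest, []).append(record[source])
    return target


record = {"title": "T", "abstract": "A", "illustrator": "I"}
absolute = crosswalk(record)
relative = crosswalk(record, relative=True)
```

Absolute crosswalking drops the illustrator, while relative crosswalking keeps the value at the cost of a less precise element.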
Three Meanings of Interoperability
• Semantic
– Semantic mapping is the process of analyzing the
definitions of the elements or fields to determine
whether they have the same or similar meanings
• Structural
– presence of data models or wrappers that specify the
semantic schema being used
• Syntactic (technical)
– the ability to communicate, transport, store, and
represent metadata and other types of information
between and among different systems and schemas
Source
Examples of Metadata Ingestion
Bitter Harvest: Problems &
Suggested Solutions for
OAI-PMH Data & Service
Providers
Workflow steps (flowchart):
• Fill Partner Request Form
• Process Partner Request Form and decide on viable aggregation route
• Send Data Exchange Agreement (DEA)
• Inform aggregator and liaise with potential data provider
• Sign DEA and send to Europeana (data providers or aggregators have to sign with aggregator)
• Send Data Contribution Form
• Fill Data Contribution Form and send to Europeana
• Process Data Contribution Form to enable first delivery of data
• Delivery of data via OAI-PMH or FTP: sample or full datasets (new data providers)
• Feedback on metadata structure, mandatory elements, rights statements
• Delivery of ingest-ready data: full datasets (all data providers)
• Feedback taken into account
• Check data
• Feedback on metadata structure, mandatory elements, rights statements
• Ingestion of datasets fully compliant with publication policy
• Publication of the submitted datasets in Europeana
Legend: action for data provider or aggregator; action for Europeana
Deadlines shown in the chart: before 5th of a month; before 15th of a month; before 21st of a month; between 21st and 30th of a month; between 10th and 20th of following month
Source: Europeana_Sounds
Metadata Operations
• Metadata Harvesting
– The process of collecting metadata descriptions of records
in an archive so that services can be built using metadata
from many archives [source]
• Metadata Validation
– The process of checking the structure of a metadata record
to determine whether the record complies with a
predefined set of criteria
• Metadata Ingestion
– The process of bringing metadata records (and/or
content), into your system [source]
– e.g. you ingest metadata through harvesting [source]
Metadata Operations
• Metadata Transformation
– Converting a set of metadata values from the format of a source
system into the format of a destination system [source]
• Metadata Enrichment
– The process of adding metadata to an existing metadata record,
thus creating a new record, with added-value operations
• Metadata Publishing
– The process of making metadata elements available to
external users, both people and machines, using a formal review
process and a commitment to change control processes [source]
Step 1
Harvesting
You harvest the metadata
through OAI-PMH into an
“intermediate” system
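A minimal sketch of the harvesting step in Python, without any network access. It builds an OAI-PMH `ListRecords` request and reads the resumption token that tells the harvester whether another page follows. The base URL and the sample response are made up for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Made-up response fragment, reduced to the part this sketch reads.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)

first_url = list_records_url("https://repository.example.org/oai")
follow_up = list_records_url("https://repository.example.org/oai",
                             resumption_token=next_token(SAMPLE))
```

A real harvester would fetch each URL, store the returned records in the intermediate system, and repeat until no token is returned.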
Step 2
Harvesting
Ingestion
The metadata are ingested into
the target repository or any
other intermediate system
Step 3
Harvesting
Ingestion
Mapping
Metadata elements are mapped
to the metadata schema of the
receiving repository
Step 4
Harvesting
Ingestion
Mapping
Validation
You pass the metadata through
a mechanism that checks their
integrity against a predefined
standard/schema
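One simple form of such a check is a test for mandatory elements. The sketch below assumes a hypothetical set of three mandatory elements and records stored as dictionaries of value lists. A real validator would also check structure against the schema itself.

```python
# Hypothetical set of mandatory elements for the receiving repository.
MANDATORY = ("title", "identifier", "rights")


def validate(record, mandatory=MANDATORY):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element in mandatory:
        values = record.get(element) or []
        if not any(str(value).strip() for value in values):
            problems.append(f"missing mandatory element: {element}")
    return problems


report = validate({"title": ["A map"], "identifier": [""]})
clean = validate({"title": ["A map"], "identifier": ["id-1"], "rights": ["CC0"]})
```

The problem list is what gets fed back to the data provider and what drives the transformations in the next step.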
Step 5
Harvesting
Ingestion
Mapping
Validation
Transformation
Metadata are subjected to the
necessary transformations
identified by the validation step
Step 6
Harvesting
Ingestion
Mapping
Validation
Transformation
Enrichment
If necessary, metadata may be
enriched further, adding value or
changing them altogether
Transformation
& Enrichment
Step 7… … …Step 1.223.124
Harvesting
Ingestion
Mapping
Validation
Transformation
Enrichment
Publishing
Metadata are published on the
target repository and are also
offered through an OAI-PMH target
And round it goes!
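The steps can be chained into one small pipeline. This is a simplified sketch under stated assumptions: records are already harvested and ingested as dictionaries, the field names and the provider label are hypothetical, and invalid records are simply dropped instead of being repaired.

```python
def map_fields(record):
    """Mapping: rename source fields to the receiving repository's elements."""
    return {"title": record.get("ttl", ""), "language": record.get("lang", "")}


def is_valid(record):
    """Validation: here only a non-empty title is required."""
    return bool(record["title"].strip())


def transform(record):
    """Transformation: normalise the language code to lower case."""
    return {**record, "language": record["language"].lower()}


def enrich(record):
    """Enrichment: add a value that was not in the source record."""
    return {**record, "provider": "Example Archive"}


def process(harvested):
    """Run mapping, validation, transformation and enrichment in order."""
    mapped = [map_fields(r) for r in harvested]
    valid = [r for r in mapped if is_valid(r)]
    return [enrich(transform(r)) for r in valid]


published = process([
    {"ttl": "Map of Crete", "lang": "EN"},
    {"ttl": " ", "lang": "el"},
])
```

The output of `process` is what would be published and exposed again through an OAI-PMH target, where the next aggregator starts the same cycle.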
Reading Material
Other Sources/Projects/Initiatives:
• http://www.slideshare.net/RoldanBasilio/metadata-mapping-61747115
• http://pro.carare.eu/doku.php?id=support:metadata-mapping
• http://old.carare.eu/eng/Support/About-metadata-mapping
• https://en.wikipedia.org/wiki/Data_mapping
• http://www.oclc.org/research/themes/data-science/schematrans.html
• https://indico.cern.ch/event/103325/contributions/1300399/attachments/11668/17064/OAI7_UNSW.pdf
• http://www.slideshare.net/locloud/the-mint-mapping-tool-and-the-more-aggregator
• http://www.slideshare.net/Europeana_Sounds/aggregation-workflow
