Metadata crosswalks


Published in: Education, Technology
  • I would now like to consider Caplan and Guenther’s paper describing the DC to MARC crosswalk mapping at its beginnings in 1996. What follows are specific fields, the problems raised by Caplan and Guenther, and how they were resolved in the current crosswalk. I will then try to summarize how these crosswalk issues were resolved.
  • So there is a loss of information: the distinction between title and subtitle is lost, which makes this an imperfect conversion.
  • Why did I need to develop a replacement for the DOS-based utility? I’ve always done a lot of consulting work, and the DOS-based tools were always my favorites. But as I moved to an NT-based system, I started to have more trouble with all DOS software, so I decided to develop a Windows alternative. Originally, I’d planned on just creating MarcEdit for my own use, but in June 2000, OSU needed to do a large call number flipping project, and when I showed MarcEdit to a colleague, Kyle Banerjee, he convinced me that I should make this program available to the public.
  • This is really the heart of MarcEdit. All utilities and functions interact with the MARCEngine in some fashion.
  • Metadata crosswalks

    1. Crosswalks
        • March 25, 2013
        • Richard Sapon-White
    2. Overview
        • Crosswalk definition and description
        • Issues
    3. Interoperability
        • Search interoperability: the ability to perform a search over diverse sets of metadata records to obtain meaningful results
        • Today’s session focuses on sets of records using different metadata schemes
    4. Definition
        • An authoritative mapping from the metadata elements of one scheme to the elements of another
        • Example: Dublin Core to MARC Crosswalk
    5. Reciprocal Crosswalks
        • Two crosswalks are needed to map from metadata scheme A to scheme B AND from scheme B to scheme A
        • With two crosswalks, “round-trip” mapping results in loss or distortion of information
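The round-trip loss described on the Reciprocal Crosswalks slide can be sketched with a toy pair of crosswalks. This is a minimal illustration, not the Library of Congress mapping: the functions `marc_to_dc` and `dc_to_marc` and the dictionary record layout are assumptions made up for this example.

```python
# Hypothetical, simplified crosswalks for illustration only.

def marc_to_dc(marc: dict) -> dict:
    """Collapse MARC 245 $a (title) and $b (subtitle) into one DC Title."""
    title = marc["245"]
    parts = [title.get("a", ""), title.get("b", "")]
    return {"title": " ".join(p for p in parts if p)}


def dc_to_marc(dc: dict) -> dict:
    """Map DC Title back to MARC; the whole string lands in 245 $a."""
    return {"245": {"a": dc["title"]}}


original = {"245": {"a": "Metadata crosswalks :", "b": "an introduction"}}
round_tripped = dc_to_marc(marc_to_dc(original))

print(round_tripped)              # the subtitle is now part of $a
print(round_tripped == original)  # the round trip did not restore the record
```

Because DC Title has no place to record where the subtitle began, the return crosswalk cannot rebuild subfield $b.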
    6. More Examples
        • Library of Congress has crosswalks for MARC21 to/from:
            – DC (Dublin Core)
            – FGDC Content Standards for Geospatial Metadata (Federal Geographic Data Committee)
            – GILS (Global Information Locator Service)
            – ONIX (ONline Information eXchange)
    7. Uses of Crosswalks
        • Record exchange
        • Union catalogs
        • Metadata harvesting
        • Search engines: query fields with similar content in different databases
        • Aid to understanding unfamiliar schemes
    8. Complexities of Crosswalk Creation
        • No standard format for metadata schemes
            – Different properties of elements are specified
            – Same properties may employ different terms
        • Some elements may map to multiple elements in a second scheme, or vice versa
        • Elements may be repeatable in one scheme, non-repeatable in another
    9. Complexities of Crosswalk Creation (cont.)
        • Source scheme may specify an element for which there is no comparable element in the target scheme
        • Differences in content rules (e.g., use of a controlled vocabulary) or data representation (e.g., Michał Kowalski vs. Kowalski, Michał)
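The data representation difference on this slide (direct vs. inverted name order) is the kind of thing a crosswalk has to normalize. A deliberately naive sketch follows; the helper `invert_name` is hypothetical and assumes the last whitespace-separated token is the surname, which fails for many real names (compound surnames, particles, corporate names).

```python
def invert_name(direct: str) -> str:
    """Turn 'Forename Surname' into 'Surname, Forename'.

    Naive assumption: the last token is the surname.
    """
    parts = direct.split()
    if len(parts) < 2:
        return direct  # single-token names are left unchanged
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(invert_name("Michał Kowalski"))
```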
    10. Issues in Crosswalking Content Metadata Standards
        • Barriers to creating crosswalks:
            1. Lack of common terminology between metadata schemes
            2. Metadata standards are not organized in the same way
        • Margaret St. Pierre and William LaPlant (1998)
    11. St. Pierre and LaPlant (cont.)
        • Barriers to mapping:
            – One-to-many mapping: source field contains multiple keywords while target field is repeatable with one keyword per field
            – Many-to-one mapping: results in loss of information
            – Source element does not map to any element in target
            – Mandatory element in target without any element in source
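The one-to-many case above (one source field holding several keywords, a repeatable target field holding one each) might be handled as in the sketch below. The semicolon delimiter, the target tag 653, and the function names are assumptions for illustration; a real source scheme would define its own delimiter.

```python
from typing import List, Tuple


def split_keywords(source_value: str, delimiter: str = ";") -> List[str]:
    """Split a multi-keyword source field into trimmed, non-empty keywords."""
    return [kw.strip() for kw in source_value.split(delimiter) if kw.strip()]


def to_repeated_fields(source_value: str, tag: str = "653") -> List[Tuple[str, str]]:
    """Emit one (tag, keyword) pair per keyword for a repeatable target field."""
    return [(tag, kw) for kw in split_keywords(source_value)]


print(to_repeated_fields("metadata; crosswalks ;interoperability"))
```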
    12. Example
        • Dublin Core element “Creator”: an uncontrolled name
        • Creator did not map to MARC
        • MARC name fields are defined as main or added entries (1xx, 7xx), with content defined by AACR2
        • To develop a crosswalk, a new 720 field was added to MARC
    13. Mapping DC Subject to MARC
        • DC Subject: the topic addressed by the work
            – Can be qualified by the scheme (e.g., LCSH)
        • MARC fields 600, 630, 650, 651, 653
            – 600, 630, 650, 651 are controlled vocabulary with indicator for the scheme used
            – 653 is uncontrolled vocabulary
        • If mapped to 653, the identification of the controlled vocabulary is lost
    14. Mapping DC Subject to MARC (cont.)
        • Cannot map to other subject fields since DC doesn’t distinguish between them
        • Suggestion: create new MARC field for generic subject field (not done)
        • Unqualified: 653 ##$a (Index Term--Uncontrolled)
        • Qualified:
            – Scheme=LCSH: 650 #0$a (Subject added entry--Topical term)
            – Scheme=MeSH: 650 #2$a (Subject added entry--Topical term)
            – Scheme=LCC: 050 ##$a (Library of Congress Call Number/Classification number)
            – Scheme=DDC: 082 ##$a (Dewey Decimal Call Number/Classification number)
            – Scheme=UDC: 080 ##$a (Universal Decimal Classification Number)
            – Scheme=(other): 650 #7$a with $2=code from MARC Code List for Relators, Sources, Description Conventions
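The qualified and unqualified Subject mappings listed on this slide can be expressed as a lookup table. This is a sketch under stated assumptions: the function `map_dc_subject` and the dictionary output shape are invented for illustration, and the caller is assumed to pass a valid source code when using a scheme that is not in the table.

```python
from typing import Optional

# Tag and indicators per scheme, copied from the slide.
SUBJECT_BY_SCHEME = {
    "LCSH": ("650", "#0"),
    "MeSH": ("650", "#2"),
    "LCC": ("050", "##"),
    "DDC": ("082", "##"),
    "UDC": ("080", "##"),
}


def map_dc_subject(value: str, scheme: Optional[str] = None) -> dict:
    """Return a MARC-like field for one DC Subject value."""
    if scheme is None:
        # Unqualified DC: uncontrolled index term.
        return {"tag": "653", "ind": "##", "subfields": [("a", value)]}
    if scheme in SUBJECT_BY_SCHEME:
        tag, ind = SUBJECT_BY_SCHEME[scheme]
        return {"tag": tag, "ind": ind, "subfields": [("a", value)]}
    # Any other scheme: 650 #7 with the source code carried in $2.
    return {"tag": "650", "ind": "#7", "subfields": [("a", value), ("2", scheme)]}


print(map_dc_subject("Metadata"))
print(map_dc_subject("Metadata", "LCSH"))
print(map_dc_subject("Metadata", "fast"))
```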
    15. Mapping DC Title to MARC
        • DC Title does not distinguish between title (245 $a) and subtitle (245 $b) or any other kinds of titles
        • Unqualified:
            – 245 00$a (Title Statement/Title proper)
            – If repeated, all titles after the first: 246 33$a (Varying Form of Title/Title proper)
        • Qualified:
            – Alternative: 246 33$a (Varying Form of Title/Title proper)
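The Title rule above (first title to 245, repeats and Alternative titles to 246) could look like the following sketch. The function `map_dc_titles` and the tuple layout `(tag, indicators, value)` are assumptions for illustration; the tags and indicators are the ones given on the slide.

```python
from typing import Iterable, List, Tuple


def map_dc_titles(
    titles: Iterable[str], alternatives: Iterable[str] = ()
) -> List[Tuple[str, str, str]]:
    """Map DC Title values to (tag, indicators, $a value) tuples."""
    fields = []
    for position, title in enumerate(titles):
        if position == 0:
            fields.append(("245", "00", title))  # first title: title proper
        else:
            fields.append(("246", "33", title))  # repeated titles: varying form
    # Qualified DC "Alternative" titles also go to 246 33.
    fields.extend(("246", "33", alt) for alt in alternatives)
    return fields


print(map_dc_titles(["Main title", "Second title"], ["Other title"]))
```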
    16. Mapping DC Publisher to MARC
        • One-to-one relationship between DC Publisher and MARC 260 $b
        • EASY!
    17. Mapping DC Date to MARC
        • Publication date in DC element Date best maps to MARC21 260 $c
        • Other dates exist in MARC21:
            – 008/07-10: date in standardized form
            – 260 $c can also include copyright or printing dates
        • Unqualified: 260 ##$c (Date of publication, distribution, etc.)
    18. Mapping DC Date to MARC (cont.)
        • Qualified DC:
            – Available: 307 ##$a (Hours, Etc.)
            – Created: 260 ##$g (Date of manufacture)
            – Issued: 260 ##$c (Date of publication, distribution, etc.)
            – Modified: 583 ##$d with $a=modified
            – Valid: 518 ##$a (Date/Time and Place of an Event Note)
        • Text may be generated in $3 to include qualifier name.
    19. Mapping DC Identifier to MARC
        • DC Identifier is any string or number used to uniquely identify an object
        • Could be ISBN, ISSN, LCCN, URL
            – Each coded differently in MARC21
        • MARC 024 (other standard identifier) could be used if type of identifier not specified
    20. Mapping DC Identifier to MARC (cont.)
        • Unqualified: 024 8#$a (Other Standard Identifier/Standard number or code)
        • Qualified:
            – Scheme=URI: 856 40$u (Electronic Location and Access/Uniform Resource Locator)
            – Scheme=ISBN: 020 ##$a (International Standard Book Number)
            – Scheme=ISSN: 022 ##$a (International Standard Serial Number)
            – Scheme=(other): 024 8#$a (Other Standard Identifier/Standard number or code) with $2=scheme value
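To show how the Identifier rules read when written out in the slides' own notation (tag, indicators, subfields), here is a small sketch that renders each identifier as a display string. The function `map_dc_identifier` is hypothetical, and the output is a human-readable rendering, not a valid MARC record.

```python
from typing import Optional


def map_dc_identifier(value: str, scheme: Optional[str] = None) -> str:
    """Render one DC Identifier as 'TAG II$xVALUE', following the slide."""
    if scheme == "URI":
        return f"856 40$u{value}"
    if scheme == "ISBN":
        return f"020 ##$a{value}"
    if scheme == "ISSN":
        return f"022 ##$a{value}"
    if scheme is None:
        return f"024 8#$a{value}"  # unqualified: other standard identifier
    # Any other scheme: same field, with the scheme value recorded in $2.
    return f"024 8#$a{value}$2{scheme}"


print(map_dc_identifier("0-123-45678-9", "ISBN"))
print(map_dc_identifier("ABC123"))
print(map_dc_identifier("10.1000/182", "doi"))
```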
    21. Resolving Difficulties in Crosswalk Creation: A Summary
        • Create a new field in MARC
        • Use qualifiers (Qualified DC) to map to specific MARC fields
        • If using unqualified DC, then map to closest matching field (with loss of some information)
            – Some information maps to a “wrong” field
            – Map to an “other” or “uncontrolled” field
    22. Introduction to MarcEdit, from first run to philosophy
        • Terry Reese
        • Gray Family Chair for Innovative Library Services
        • Oregon State University
        • Email:
    23. Getting Started
        1. Sample Data Files
            – Sample MARC records need to be downloaded.
            – Get them from: (~5 MB)
            – Unzip the data to the Desktop
                • Right click, Extract all to Desktop.
            – Worksheet File
                • Includes the examples that I’ll be working from:
            – When you start MarcEdit for the first time, it will ask you to update. Don’t. Tell it no; then we’ll turn off the automated update checker.
            – We’ll use this information later.
    24. Key Points
        • What is MarcEdit?
            – Background
            – System Requirements
        • Installation Notes
            – First Run
        • Understanding the Application Settings
            – Editor Settings
            – Language settings
        • Accessing Application Data
        • MarcEdit Infrastructure
        • Getting Help
        • Questions
    25. What is MarcEdit?
        • Started development in 1999
            – Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM).
            – Initially designed as a replacement for LC’s DOS-based MARCBreakr/MARCMakr software
    26. What is MarcEdit?
        • Today:
            – Written in C#
            – Continues to be freely available
            – Supports both UTF-8 and MARC8 character sets
            – MARC Neutral
            – XML aware
    27. Installing MarcEdit
        • Windows:
            – Installing from the Windows Installer
                • 32-bit version: software/development/MarcEdit_Setup.msi
                • 64-bit version: software/development/MarcEdit_Setup64.msi
            – Installing using a Zip file:
    28. Setting up MarcEdit
        • On first run, MarcEdit will ask you to confirm some settings. These are broken down into 5 areas:
            – MarcEditor
            – Language
            – Export
            – MARCEngine
            – Other
    29. MarcEdit Export Properties
        • Defines MARC import
        • Can capture port output from record input (much in the same way OCLC’s Connexion can)
    30. MARC Conversions
    31. MarcEdit: crosswalking design
        • MarcEdit model:
            – So long as a schema has been mapped to MARCXML, any metadata combination could be utilized. This means that no more than two transformations will ever take place.
        • Example: MODS => MARCXML => EAD
    32. MarcEdit: crosswalking design
        • MarcEdit Crosswalk model
            – Pro
                • Crosswalks need not be directly related to each other
                • Requires the crosswalker to have specific knowledge of only one schema
            – Con
                • Each known crosswalk must be mapped to MARCXML.
    33. MarcEdit Crosswalking model (diagram labels: EAD, Dublin Core, FGDC, MARC21XML, MARC, MODS)
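The hub design described on the preceding slides (every schema mapped to MARCXML, so any conversion takes at most two transformations) can be sketched as a small registry. This is not MarcEdit's implementation: the names `register`, `crosswalk`, and `relabel` are hypothetical, and the toy converters only relabel a record where a real converter would apply a stylesheet or mapping.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]
Converter = Callable[[Record], Record]

HUB = "MARCXML"
TO_HUB: Dict[str, Converter] = {}    # schema -> converter into the hub
FROM_HUB: Dict[str, Converter] = {}  # schema -> converter out of the hub


def register(schema: str, to_hub: Converter, from_hub: Converter) -> None:
    """Register the two converters that connect one schema to the hub."""
    TO_HUB[schema] = to_hub
    FROM_HUB[schema] = from_hub


def crosswalk(record: Record, source: str, target: str) -> Tuple[Record, int]:
    """Convert via the hub; return the record and the number of transformations."""
    steps = 0
    if source != HUB:
        record = TO_HUB[source](record)
        steps += 1
    if target != HUB:
        record = FROM_HUB[target](record)
        steps += 1
    return record, steps


def relabel(schema: str) -> Converter:
    """Toy converter that only changes the record's schema label."""
    return lambda record: {**record, "schema": schema}


for name in ("MODS", "EAD", "Dublin Core", "FGDC"):
    register(name, to_hub=relabel(HUB), from_hub=relabel(name))

converted, steps = crosswalk({"schema": "MODS", "title": "Example"}, "MODS", "EAD")
print(converted["schema"], steps)


def pairwise_count(n: int) -> int:
    """Directed crosswalks needed if every scheme maps directly to every other."""
    return n * (n - 1)


def hub_count(n: int) -> int:
    """Directed crosswalks needed if every non-hub scheme maps only to and from the hub."""
    return 2 * (n - 1)


print(pairwise_count(6), hub_count(6))
```

The trade-off matches the Pro/Con slide: fewer crosswalks to write and maintain, at the cost of requiring every schema to have a MARCXML mapping.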
    34. MarcEdit: Crosswalks for everyone
    35. MarcEdit: Crosswalks for everyone
        • Example Crosswalks:
            – MODS => MARC
            – MODS => FGDC
            – MODS => Dublin Core
            – EAD => MODS
            – EAD => HTML
    36. MarcEdit: Crosswalks for everyone
        • What’s MarcEdit doing?
            – Facilitates the crosswalk by:
                1. Performing character translations (MARC8-UTF8)
                2. Facilitating interaction between binary and XML formats.
    37. Examples
        • Project Gutenberg RDF => MARC
        • EAD => MARC
    38. MarcEdit Demo arcedit/html/index.php