It's All About the Metadata

UPDATED and REPLACED with new file June 2014

Simplified presentation on library metadata evolution, the perils of not curating the metadata properly, and how it's being used "in the wild".

But…it’s all on the internet and a keyword search will find it, right? Not exactly... There's been a massive change in cataloging in libraries with the rise of the internet. Everything is connected, including our metadata. Catalogers are no longer isolated, and metadata management is no longer just an internal process. Everything we do now links to the wider world of metadata, pushing libraries into re-purposing our long-held work into the new frontiers of identity management and linked data.

Published in: Education

  • But…it’s all on the internet and a keyword search will find it, right? Not exactly…


    Stop me if I get too far into the technical details – much like the rabbit hole of links (linked data!) on the web, it’s easy to get lost in a deeper technical analysis of the use of metadata.
  • Examples of searching with keywords

    Example 1
    Without a controlled vocabulary there are no links (no “see also” references) and no unified results list. The OPAC can’t fix that; you need the metadata to make it work.

    Koran: http://catalog.library.georgetown.edu/search~S4/X?SEARCH=Koran&searchscope=4&SORT=D
    Quran: http://catalog.library.georgetown.edu/search~S4/X?SEARCH=quran&searchscope=4&SORT=D
    Qur’an:
  • Without a controlled vocabulary there is no link (no “see also” reference). The discovery layer can’t fix that; you need the metadata to make it work.
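
The Koran/Quran problem above can be sketched in a few lines of Python. This is only an illustration of how an authority record routes every spelling to one heading: the `AUTHORITY` table, the choice of preferred form, and the sample records are all invented here and are not taken from the Georgetown catalog.

```python
import unicodedata

# Hypothetical authority entry: one preferred heading plus its variant
# ("see from") forms. Which form is preferred is an assumption.
AUTHORITY = {
    "Qur'an": ["Koran", "Quran", "Qur’an", "Al-Qur'an"],
}

def fold(term):
    """Lowercase and drop punctuation so spelling variants compare equal."""
    term = unicodedata.normalize("NFKC", term).lower()
    return "".join(ch for ch in term if ch.isalnum())

# Every variant (and the heading itself) points to the preferred heading.
SEE = {
    fold(form): heading
    for heading, variants in AUTHORITY.items()
    for form in [heading, *variants]
}

def search(query, records):
    """Return titles whose controlled subject matches the query's heading."""
    heading = SEE.get(fold(query))
    return [title for title, subject in records if subject == heading]

# Invented sample records: (title, controlled subject heading).
RECORDS = [
    ("Introduction to the text", "Qur'an"),
    ("A translation with commentary", "Qur'an"),
    ("History of printing", "Printing"),
]

print(search("Koran", RECORDS))
print(search("Quran", RECORDS))
```

Both searches return the same two titles because the variants resolve to one heading before the records are consulted. A plain keyword match would need each record to contain each spelling.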
  • Issues in the Subject and Author indexes – lack of consistency; duplication; lack of collocation for items with the same term
    No normalization

    Currently working on a project to clean these up so that each author and each subject has a unique single entry; using established forms when possible for future links (e.g. using the NAF form so that it can link out to Wikipedia and other pages)
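
A rough sketch of the normalization step in a cleanup like this, assuming the index is available as a plain list of strings. The `index_key` rules and the sample names are illustrative assumptions, not the actual DigitalGeorgetown procedure.

```python
import re
import unicodedata
from collections import defaultdict

def index_key(entry):
    """Normalize an index entry: strip diacritics, case, punctuation, spacing."""
    decomposed = unicodedata.normalize("NFKD", entry)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())

def find_duplicates(entries):
    """Group entries by normalized key and keep groups with more than one form."""
    groups = defaultdict(list)
    for entry in entries:
        groups[index_key(entry)].append(entry)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}

# Invented sample of an inconsistent author index.
AUTHOR_INDEX = [
    "Smith, John",
    "smith, john.",
    "SMITH,  JOHN",
    "Smith, Jane",
]

for key, forms in find_duplicates(AUTHOR_INDEX).items():
    print(key, "->", forms)
```

A shared key only flags candidates for merging. Two different people named John Smith would produce the same key, so a cataloger still has to confirm each group against the established form.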
  • http://www.vocativ.com/culture/fun/town-built-huge-memorial-wrong-guy/

    “concrete” (ha!) example of the perils of relying on keyword searching only
  • Library metadata is CURATED and CONTROLLED – making it reliable and authoritative and consistent

    OWL: http://www.w3.org/TR/owl-ref/ - for publishing and sharing ontologies on the web; includes owl:sameAs for linking like things (such as linking a VIAF record to its ISNI and NAF records)

    Metadata is everywhere, but it’s not useful unless it’s managed – a keyword search can bring up a lot of disparate, unrelated things…you need curated/controlled metadata to identify the thing you actually want and its links to other things
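
Because owl:sameAs is symmetric and transitive, a handful of pairwise links is enough to pull every identifier for one entity into a single cluster. The sketch below shows that with a small union-find. The first three URIs are the ones listed together on slide 16, and treating them as one individual follows that slide's grouping (an assumption here). The `example.org` URIs are placeholders.

```python
NAF = "http://id.loc.gov/authorities/names/n79007035"
VIAF = "https://viaf.org/viaf/88919448/"
ISNI = "http://isni-url.oclc.nl/isni/0000000121429031"

# Pairwise owl:sameAs statements; the last pair is a made-up second entity.
SAME_AS = [
    (NAF, VIAF),
    (VIAF, ISNI),
    ("http://example.org/person/a", "http://example.org/person/b"),
]

def same_as_clusters(pairs):
    """Group URIs into identity clusters from sameAs pairs (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in pairs:
        parent[find(left)] = find(right)

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())

for cluster in same_as_clusters(SAME_AS):
    print(sorted(cluster))
```

NAF and ISNI are never linked directly, yet they land in the same cluster through VIAF. That is the practical payoff of publishing even one sameAs link per record.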
  • Impacts indexing, identification, collocation of like things, and defining/displaying relationships

    COLLOCATION and DISAMBIGUATION

    DigitalGeorgetown: https://docs.google.com/spreadsheets/d/1oivICr3O1Drhn-Ncypi6dVc6gjdoaU-IYrNwJEGJBYI/edit?usp=sharing
  • http://id.loc.gov/
    http://linkeddata.org/

    Non-library organizations mine the Library of Congress authority data and subject data, creating links and clarification
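
One reason outside organizations can mine this data is that the identifiers are predictable URIs. As a small sketch, the function below turns the WorldCat Identities key from slide 16 into the matching id.loc.gov name authority address. The rule (drop the `lccn-` prefix and the hyphen) is inferred only from the two URLs on that slide; real LCCN normalization has more cases than this handles.

```python
ID_LOC_NAMES = "http://id.loc.gov/authorities/names/"

def naf_uri(identity_key, suffix=""):
    """Build an id.loc.gov names URI from a key such as 'lccn-n79-007035'.

    The mapping is an assumption drawn from the slide 16 examples only.
    """
    prefix = "lccn-"
    if not identity_key.startswith(prefix):
        raise ValueError("expected a key starting with 'lccn-'")
    lccn = identity_key[len(prefix):].replace("-", "")
    return ID_LOC_NAMES + lccn + suffix

print(naf_uri("lccn-n79-007035"))           # the identifier itself
print(naf_uri("lccn-n79-007035", ".html"))  # the human-readable page
```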
  • http://www.worldcat.org/oclc/70180992

    Schema.org is used by the major search engines (it was launched jointly by Google, Bing, and Yahoo!, with Yandex joining later)
    http://www.oclc.org/news/releases/2012/201238.en.html

    Also see the SameAs references!!

    This then links to other available linked data sets from LC and OCLC and more.

    Available for download: WorldCat Works, FAST (subjects), VIAF, Dewey.info
  • How? Because the metadata is available in a format that the search engines can mine and use
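
To make "a format that the search engines can mine" concrete, here is a sketch that builds a Schema.org description of a book as JSON-LD. JSON-LD is chosen for illustration and may not be the serialization WorldCat itself publishes. The title, author, and `example.org` addresses are placeholders, and `book_jsonld` is a hypothetical helper.

```python
import json

def book_jsonld(record_url, title, author_name, author_same_as):
    """Return a Schema.org Book description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": record_url,
        "name": title,
        "author": {
            "@type": "Person",
            "name": author_name,
            # Links to authority records for the same person.
            "sameAs": author_same_as,
        },
    }

doc = book_jsonld(
    record_url="https://example.org/catalog/record/1",      # placeholder
    title="Example title",                                  # placeholder
    author_name="Example, Author",                          # placeholder
    author_same_as=["https://example.org/authority/12345"]  # placeholder
)
print(json.dumps(doc, indent=2))
```

The `sameAs` list is where curated identity work pays off: it lets a crawler connect this author to the authority records that the library community maintains.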
  • http://dbpedia.org/About

    http://www.wikipedia.org/

    ISNI: http://www.isni.org/about

    They are ALL linked together via metadata – more information out there, more links are made
  • eCIP example: OCLC# 880237744 – see the vendor information added to the record

    eCIP: The CIP record is printed on the copyright page of the book and is used as a marketing and purchasing tool by vendors and libraries, which makes ordering and processing materials easier.


    CONSER data: curated by catalogers and the ISSN Centre (in Paris)
    Purchased and re-used by many companies, including SFX and SerialsSolutions/ProQuest

    Repurposed MARC records:
    ETDs
    Finding Aids
    Princeton Theological Seminary Theological Commons http://commons.ptsem.edu/
    DigitalGeorgetown scanned objects
  • Why? We don’t always have all the information, but we can incorporate it and verify it when we have it, establishing new identities and confirming/enhancing existing ones, adding more links
  • If you look, it’s in the underlying structure of the web, and library data is OUT THERE
  • It's All About the Metadata

    1. IT’S ALL ABOUT THE METADATA
        Shana L. McDanold, June 10, 2014
    2. WHY DOES METADATA MATTER? – GEORGE
        • Search: Koran
        • Search: Quran
    3. WHY DOES METADATA MATTER? – ONESEARCH
        • Search: Koran
        • Search: Quran
    4. WHY DOES METADATA MATTER? – GEORGE
        • Search: 9/11
        • Search: 9-11
    5. WHY DOES METADATA MATTER? – ONESEARCH
        • Search: 9/11
        • Search: 9-11
    6. WHY DOES METADATA MATTER? – DIGITALGEORGETOWN
        • Author Index
        • Subject Index
    7. WHY DOES METADATA MATTER?
        • “This town built a memorial to the wrong guy”
        • Ottawa, Canada
        • “It’s the metadata, stupid: and it’s not just for your audience” (Joshua Lasky, posted 5/21/2014)
        • “To succeed in the digital age is to be able to easily aggregate all of your articles in the most meaningful way for each of your visitors. Competitors such as Circa actively use metadata to surface relevant content during breaking news events.”
    8. WHY DOES METADATA MATTER?
        • What are we trying to identify? OR What are people trying to find?
          • Works
          • Individuals
          • Places
          • Things/objects
          • Concepts
        • Discovery and discovery enhancement
        • Relationships
        • “On the fly” collections of resources
        • Users start elsewhere
    9. WHAT DO WE DO WHEN WE CURATE [CREATE] METADATA?
        • Create and enhance descriptive metadata
        • Apply controlled vocabularies
        • Disambiguation of works, authors, etc.
        • Unique identification of editions, works, etc.
        • Collocation of editions, works, etc.
        • Use agreed upon standards for data elements to ensure consistent application/use
          • MARC
          • DigitalGeorgetown (DublinCore)
          • RDF (Resource Description Framework)
    10. HOW DO WE EXPOSE “OUR” METADATA?
        • Controlled vocabulary and mapping
        • Genres
        • Subjects/Concepts
        • Classification
        • Identification:
          • People
          • Places/Geographic
          • Works
        • OWL (Web Ontology Language)
        • SKOS (Simple Knowledge Organization System)
        • Normalization
        • Indexing
    11. OWL: WEB ONTOLOGY LANGUAGE
        • Utilizes RDF (Resource Description Framework)
        • 5.2 Individual identity
        • Many languages have a so-called "unique names" assumption: different names refer to different things in the world. On the web, such an assumption is not possible. For example, the same person could be referred to in many different ways (i.e. with different URI references). For this reason OWL does not make this assumption. Unless an explicit statement is being made that two URI references refer to the same or to different individuals, OWL tools should in principle assume either situation is possible.
        • OWL provides three constructs for stating facts about the identity of individuals:
          • owl:sameAs is used to state that two URI references refer to the same individual.
          • owl:differentFrom is used to state that two URI references refer to different individuals
          • owl:AllDifferent provides an idiom for stating that a list of individuals are all different.
    12. SKOS: SIMPLE KNOWLEDGE ORGANIZATION SYSTEM
        • Utilizes RDF (Resource Description Framework)
        • 2.3 Semantic Relationships
        • In KOSs semantic relations play a crucial role for defining concepts. The meaning of a concept is defined not just by the natural-language words in its labels but also by its links to other concepts in the vocabulary. Mirroring the fundamental categories of relations that are used in vocabularies such as thesauri [ISO2788], SKOS supplies three standard properties:
          • skos:broader and skos:narrower enable the representation of hierarchical links, such as the relationship between one genre and its more specific species, or, depending on interpretations, the relationship between one whole and its parts;
          • skos:related enables the representation of associative (non-hierarchical) links, such as the relationship between one type of event and a category of entities which typically participate in it. Another use for skos:related is between two categories where neither is more general or more specific. Note that skos:related enables the representation of associative (non-hierarchical) links, which can also be used to represent part-whole links that are not meant as hierarchical relationships.
    13. CURATED METADATA IN THE WILD – LIBRARY OF CONGRESS
        • Library of Congress data exposed as linked data
        • “The Library of Congress Linked Data Service enables both humans and machines to programmatically access authority data at the Library of Congress. This service is influenced by -- and implements -- the Linked Data movement's approach of exposing and inter-connecting data on the Web via dereferenceable URIs.”
    14. CURATED METADATA IN THE WILD - WORLDCAT
        • Bibliographic records
    15. CURATED METADATA IN THE WILD - WORLDCAT
        • Google searches!
    16. CURATED METADATA IN THE WILD - OTHERS
        • Wikipedia/dbpedia
        • WorldCat: links to WorldCat Identities
          • http://www.worldcat.org/identities/lccn-n79-007035/
        • LCCN: links to LC National Authority File (NAF)
          • http://id.loc.gov/authorities/names/n79007035.html
        • VIAF record
          • https://viaf.org/viaf/88919448/
        • ISNI (International Standard Name Identifier) record
          • http://isni-url.oclc.nl/isni/0000000121429031
    17. CURATED METADATA IN THE WILD - OTHERS
        • Wikipedia/dbpedia
        • Disambiguation
          • http://en.wikipedia.org/w/index.php?title=Category:All_disambiguation_pages
        • Identity management:
          • John Smith http://en.wikipedia.org/wiki/John_Smith
          • St. Mary’s Church http://en.wikipedia.org/wiki/St._Mary%27s_Church
          • Georgetown http://en.wikipedia.org/wiki/Georgetown
          • Hamlet http://en.wikipedia.org/wiki/Hamlet_(disambiguation)
    18. CURATED METADATA IN THE WILD - OTHERS
        • “MARC 21 records for CONSER serials either cataloged or processed by LC or by CONSER (Cooperative Online Serials Program) participants. Also includes records with ISSN assignments and U.S. Newspaper Program cataloging. Records include all languages. Available in MARC 21 and MARCXML formats.”
        • eCIP
        • CONSER
    19. BUILDING CURATED METADATA: OTHER OPTIONS
        • Crowd sourcing
        • Archives and Alumni
        • Identification of individuals for identity control
        • Penn Provenance project
        • “We are trying to identify former owners and virtually reunite dispersed collections, and we welcome any information you have about the images posted here.”
        • Incorporate data into records; establish identities
        • https://www.flickr.com/photos/58558794@N07
    20. CONCLUSION
        • All comes back to the basics of metadata work:
          • DESCRIPTION
          • COLLOCATION
          • DISAMBIGUATION (uniquely identifiable)
          • RELATIONSHIPS
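
The three SKOS properties quoted on slide 12 can be modeled with two small tables. The sketch below uses invented concepts and plain Python dictionaries rather than an RDF library, so it is only a picture of the relationships, not a SKOS implementation. The upward walk in `ancestors` corresponds to what SKOS calls skos:broaderTransitive.

```python
# skos:broader links: concept -> its broader concepts (invented vocabulary).
BROADER = {
    "Sonnets": ["Poetry"],
    "Poetry": ["Literature"],
    "Drama": ["Literature"],
}

# skos:related links are associative and symmetric.
RELATED = [("Poetry", "Poets")]

def narrower(concept):
    """skos:narrower is the inverse of skos:broader."""
    return sorted(c for c, parents in BROADER.items() if concept in parents)

def ancestors(concept):
    """Follow broader links upward until the top of the hierarchy."""
    seen = []
    stack = list(BROADER.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(BROADER.get(parent, []))
    return seen

def related(concept):
    """Return associated concepts, reading each pair in both directions."""
    forward = {b for a, b in RELATED if a == concept}
    backward = {a for a, b in RELATED if b == concept}
    return sorted(forward | backward)

print(narrower("Literature"))
print(ancestors("Sonnets"))
print(related("Poets"))
```

A search for "Literature" can use `narrower` to expand to its more specific concepts, which is the collocation behavior that a keyword index cannot provide on its own.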
