Metadata, April 8, 2013

  • Speaker note: No coverage of technical details (beyond me). I do want to cover concepts and definitions so that if someone talks to you about these things, you will understand.

    1. Open Archives Initiative - Protocol for Metadata Harvesting
        April 8, 2013
        Richard Sapon-White
    2. Overview
        • Definitions
        • History
        • The OAI Model
        • Protocol for Metadata Harvesting
    3. Definitions
        • Harvester - client application issuing OAI-PMH requests
        • Harvesting - the gathering together of metadata from a number of distributed repositories into a combined data store
        • Archives - synonym for a repository of scholarly papers
        • Protocol - a set of rules defining communication between systems (such as ftp or http)
    4. History of the OAI
        • E-print servers = archives or repositories
        • E-print servers provide access to scientific and technical papers, scholarly journal articles
        • Authors deposit pre-prints or published articles in these repositories
        • Concept: public, free access to scholarly information without paid subscription to journals
    5. History of the OAI (cont.)
        • Why?
            • Scholarly research belongs to people
            • Speeds the sharing of research
            • Better for authors and readers
        • Known as the “open archives movement”
        • Has nothing to do with physical archives (repositories of institutional history or collections of unpublished materials)
    6. History of the OAI (cont.)
        • Many e-print servers grew
            • Overlapping disciplinary coverage
            • Overlapping geographic coverage
        • Developing need to
            • search multiple repositories simultaneously (= federated searching)
            • automatically identify and copy papers from other repositories (= repository synchronization)
    7. History of the OAI (cont.)
        • Meeting of experts, 1999, Santa Fe, New Mexico, USA
        • Defined an interface so that repositories could expose metadata for papers they held
        • Metadata could then be discovered and copied by federated search services and other repositories
        • Known as the Santa Fe Convention (later developed into PMH, the Protocol for Metadata Harvesting)
    8. The Open Archives Model
        • Similar concept to union catalog
        • Metadata “harvested” and stored in central repository
        • “Pull” rather than “push” model
        • Collecting is similar to Internet spider collecting HTML content
    9. PMH and Z39.50
        • Differs from Z39.50 (specifically rejected at Santa Fe)
        • Z39.50:
            • Allows a client to search a remote information server across a network
            • Difficult to perform high-quality federated searches across many servers: would need to deal with each server individually
            • Complex protocol
    10. PMH and Z39.50 (cont.)
        • PMH is a simple protocol
        • User interacts with database of harvested metadata, not with individual repositories
        • Database is constructed by the federated search service using PMH
        • Therefore, performance depends only on the federated search service, not the individual repositories
    11. Metadata Harvesting Protocol
        • Queries and responses carried over http
        • Harvester application can request a single metadata record or group of records to be exported
            • Application can restrict records by date to only gather new records (since previous harvesting)
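
As a rough illustration of the requests on this slide, here is a minimal Python sketch that builds OAI-PMH request URLs. The verbs (`GetRecord`, `ListRecords`) and arguments (`identifier`, `metadataPrefix`, `from`) come from the OAI-PMH specification; the base URL, the record identifier, and the helper name `oai_request_url` are made up for the example. It only builds the URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL: a verb plus its arguments as a query string."""
    params = {"verb": verb}
    for name, value in arguments.items():
        # "from" is a reserved word in Python, so callers pass "from_" instead.
        params[name.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(params)


# Request a single metadata record (the identifier is an invented example).
single = oai_request_url(
    "GetRecord",
    identifier="oai:repository.example.org:1234",
    metadataPrefix="oai_dc",
)

# Request a group of records, restricted to those added or changed since a date.
new_records = oai_request_url("ListRecords", metadataPrefix="oai_dc", from_="2013-04-01")

print(single)
print(new_records)
```

A harvester would remember the date of its previous run and pass it as `from` next time, so that only new records are gathered.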
    12. Metadata Harvesting Protocol (cont.)
        • OAI-compliant data providers are capable of responding to such requests
            • Data provider must be able to export metadata in at least DC (unqualified) using XML communication syntax
            • Data provider includes URI with metadata
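
To make "unqualified DC in XML" a little more concrete, the sketch below parses one small Dublin Core record of the kind a data provider might return. The two namespace URIs are the standard `oai_dc` and Dublin Core ones; the record content (title, author, URI) is invented, and a real response would wrap this fragment in further OAI-PMH elements that are omitted here.

```python
import xml.etree.ElementTree as ET

# Invented sample record; only the namespaces are taken from the standards.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Preprint</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2013-04-08</dc:date>
  <dc:identifier>https://repository.example.org/papers/1234</dc:identifier>
</oai_dc:dc>
"""

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def dc_fields(xml_text):
    """Return a dict mapping each DC element name to its list of values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        if element.tag.startswith(DC_NS):
            name = element.tag[len(DC_NS):]
            # DC elements are repeatable, so every name maps to a list.
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


record = dc_fields(SAMPLE_RECORD)
print(record["title"])       # the harvested title
print(record["identifier"])  # the URI pointing back to the paper
```

In this sample the URI travels in `dc:identifier`, which is one common way for a record to point back to the full paper.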
    13. Metadata Harvesting Protocol (cont.)
        • Servers can also provide metadata in other schemes besides DC
        • Harvester applications can request metadata in other schemes besides DC
        • Harvester applications can also query a metadata repository for:
            • List of metadata formats supported by the repository
            • List of record sets supported by the repository
            • List of the identifiers of all records within the repository
    14. Why the OAI-PMH is important
        • Provides for a minimal level of interoperability
        • Drives development of community-specific metadata schemes
        • Potential for new modes of scholarly communication
        • Dependent on widespread implementation by research organizations, publishers, and “memory organizations” (i.e., libraries, museums, archives)
    15. QUIZ!!!
        http://www.oaforum.org/tutorial/english/page1.h
    16. Problems with Metadata Harvesting
        • Loss of data when mapping to unqualified DC
        • Incorrect data from improper mapping
        • Inconsistent punctuation and formatting because of diverse sources of metadata
            • High variance in data between institutions
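
The first problem on this slide, loss of data when mapping to unqualified DC, can be shown with a small Python sketch. It assumes a richer source record that uses the qualified terms `created`, `issued`, and `abstract` (real DCMI terms that refine `date` and `description`); the record values and the function name `to_unqualified_dc` are invented for the example.

```python
# Qualified terms and the unqualified DC element each one refines.
REFINES = {
    "created": "date",
    "issued": "date",
    "modified": "date",
    "abstract": "description",
    "alternative": "title",
}


def to_unqualified_dc(record):
    """Collapse qualified terms into the plain DC elements they refine."""
    simple = {}
    for term, values in record.items():
        element = REFINES.get(term, term)
        simple.setdefault(element, []).extend(values)
    return simple


# Invented source record with two different kinds of date.
qualified = {
    "title": ["An Example Preprint"],
    "created": ["2012-11-02"],
    "issued": ["2013-04-08"],
}

simple = to_unqualified_dc(qualified)
print(simple["date"])  # both dates survive, but which one is which is lost
```

After the mapping, a harvester sees two `date` values with no way to tell the creation date from the publication date.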
    17. Metasearching
        • Many systems = many metadata standards
        • Convert to single system (harvesting)?
        • Maintain individual element sets BUT create interface to search simultaneously across heterogeneous databases
        • Voila: Metasearching!
            • Not a single method
    18. Definition
        • From NISO MetaSearch Initiative: “search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at one time.”
        • Best known: Z39.50 protocol. Used to search remote library catalogs.
    19. Z39.50
        • Allows computers to communicate to retrieve information between client and server
        • Searches and results are restricted to Z39.50 databases
    20. Z39.50 results
        • Server may interpret the query incorrectly
            • Some automatically add Boolean “and” while others add Boolean “or”
            • Vocabulary issues: different vocabulary in different databases
            • Results may display in order retrieved, by database found, by date, or by relevance
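
The Boolean point on this slide can be illustrated with a toy Python sketch. It does not use Z39.50 at all; it only imitates two servers that receive the same two-word query, one joining the words with "and" and the other with "or". The titles and the function name `search` are invented for the example.

```python
# Invented catalog titles held by both imaginary servers.
TITLES = [
    "Metadata harvesting in practice",
    "Harvesting wheat",
    "Metadata for museums",
]


def search(query, titles, operator):
    """Match titles against the query words, joined by Boolean 'and' or 'or'."""
    terms = query.lower().split()
    combine = all if operator == "and" else any
    return [
        title
        for title in titles
        if combine(term in title.lower().split() for term in terms)
    ]


and_hits = search("metadata harvesting", TITLES, "and")
or_hits = search("metadata harvesting", TITLES, "or")

print(len(and_hits), "title(s) when the server adds 'and'")
print(len(or_hits), "title(s) when the server adds 'or'")
```

The same query returns one title from the "and" server and all three from the "or" server, which is why merged results from many servers can be hard for a searcher to interpret.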
    21. Problems with Z39.50
        • High recall, little precision
        • Also present in Google Search: few studies on user satisfaction
        • Results may display in an irrelevant order for the searcher
    22. Metasearching: pros and cons
        • Single database searching allows users to use specialized indexing or controlled vocabulary
        • Single portal:
            • No need for searcher to select a particular database from list of databases
    23. Case Studies
        • Divide into 3-4 groups
        • Read the case study
        • Discuss and report:
            • Describe the case briefly (2 min.)
            • What can we learn from this case study? (3 min.)