Open Archives Initiative - Protocol for Metadata Harvesting

            April 8, 2013
        Richard Sapon-White




Overview

 Definitions
 History
 The OAI Model
 Protocol for Metadata Harvesting




Definitions

 Harvester - client application issuing OAI-PMH
  requests
 Harvesting - the gathering together of metadata
  from a number of distributed repositories into a
  combined data store
 Archives - synonym for a repository of scholarly
  papers
 Protocol - a set of rules defining communication
  between systems (such as ftp or http)

History of the OAI

 E-print servers = archives or repositories
 E-print servers provide access to scientific and
  technical papers, scholarly journal articles
 Authors deposit pre-prints or published articles in
  these repositories
 Concept: public, free access to scholarly
  information without paid subscription to journals


History of the OAI (cont.)

 Why?
      Scholarly research belongs to people
      Speeds the sharing of research
      Better for authors and readers
 Known as the “open archives movement”
 Has nothing to do with physical archives
  (repositories of institutional history or collections
  of unpublished materials)

History of the OAI (cont.)

 Many e-print servers grew
     Overlapping disciplinary coverage
     Overlapping geographic coverage
 Developing need to
     search multiple repositories simultaneously
      (=federated searching)
     automatically identify and copy papers from
      other repositories (=repository synchronization)

History of the OAI (cont.)

 Meeting of experts, 1999, Santa Fe, New Mexico,
  USA
 Defined an interface so that repositories could
  expose metadata for papers they held
 Metadata could then be discovered by federated
  search services and other repositories and copied
 Known as the Santa Fe Convention (later developed
  into PMH, the Protocol for Metadata Harvesting)

The Open Archives Model

 Similar concept to union catalog
 Metadata “harvested” and stored in central
  repository
 “Pull” rather than “push” model
 Collecting is similar to Internet spider
  collecting HTML content
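
The "pull" model above can be sketched without any network access. This is only an illustration under stated assumptions: `FakeRepository` and `harvest` are hypothetical names, and a real data provider would answer OAI-PMH requests over HTTP rather than return Python lists.

```python
class FakeRepository:
    """Hypothetical stand-in for a data provider (no network involved)."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def list_records(self, since=None):
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        return [r for r in self._records
                if since is None or r["datestamp"] >= since]


def harvest(repositories, store, since=None):
    """Pull records from each repository into one combined store."""
    for repo in repositories:
        for record in repo.list_records(since):
            # Key by identifier so a re-harvested record replaces the old copy.
            store[record["identifier"]] = {**record, "source": repo.name}
    return store


repo_a = FakeRepository("A", [
    {"identifier": "oai:a:1", "datestamp": "2013-03-01", "title": "First paper"},
])
repo_b = FakeRepository("B", [
    {"identifier": "oai:b:1", "datestamp": "2013-04-05", "title": "Second paper"},
])

central_store = harvest([repo_a, repo_b], {})
for identifier, record in central_store.items():
    print(identifier, record["source"], record["title"])
```

The harvester initiates every transfer; the repositories never send anything on their own, which is what distinguishes "pull" from "push".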


PMH and Z39.50

 Differs from Z39.50 (specifically rejected at Santa
  Fe)
 Z39.50:
      allows a client to search a remote information
       server across a network
      Difficult to perform high-quality federated searches
       across many servers – would need to deal with each
       server individually
      Complex protocol

PMH and Z39.50 (cont.)

 PMH is a simple protocol
 User interacts with database of harvested metadata,
  not with individual repositories
 Database is constructed by the federated search
  service using PMH
 Therefore, performance depends only on the
  federated search service, not the individual
  repositories

Metadata Harvesting Protocol

 Queries and responses carried over http
 Harvester application can request a single
  metadata record or group of records to be
  exported
     Application can restrict records by date to only
      gather new records (since previous harvesting)
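
As a rough sketch of what such requests look like on the wire, the snippet below builds two OAI-PMH request URLs: one for a single record and one for all records changed since a given date. The verbs (`GetRecord`, `ListRecords`) and arguments (`identifier`, `metadataPrefix`, `from`) come from OAI-PMH 2.0; the base URL, the record identifier, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """URL requesting a single metadata record."""
    params = {
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    }
    return f"{base_url}?{urlencode(params)}"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """URL requesting a group of records, optionally only those
    added or changed since from_date (YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(get_record_url(BASE_URL, "oai:repository.example.org:1234"))
print(list_records_url(BASE_URL, from_date="2013-04-01"))
```

A harvester would store the date of its last run and pass it as `from_date` next time, so that only new or changed records are transferred.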



Metadata Harvesting Protocol (cont.)
 OAI-compliant data providers are capable of
  responding to such requests
     Data provider must be able to export metadata in
      at least DC (unqualified) using XML
      communication syntax
     Data provider includes URI with metadata
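
To make the XML requirement concrete, here is a minimal sketch that parses one unqualified Dublin Core record as a data provider might return it. The namespaces are those used by OAI-PMH 2.0 and `oai_dc`; the sample record, its values, and the function name are invented, and carrying the paper's URI in `dc:identifier` is an assumption about common practice rather than something the slides state.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the parts used below.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2013-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Pre-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://repository.example.org/papers/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Extract header fields and a few DC elements from one record."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "oai_identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # DC elements are repeatable, so each one is collected as a list.
        "title": [e.text for e in dc.findall("dc:title", NS)],
        "creator": [e.text for e in dc.findall("dc:creator", NS)],
        "uri": [e.text for e in dc.findall("dc:identifier", NS)],
    }


parsed = parse_record(SAMPLE)
print(parsed)
```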



Metadata Harvesting Protocol (cont.)
 Servers can also provide metadata in other schemes
  beside DC
 Harvester applications can request metadata in
  other schemes beside DC
 Harvester applications can also query a metadata
  repository for:
      List of metadata formats supported by repository
      List of record sets supported by the repository
      List of the identifiers of all records within the repository
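
The three queries listed above correspond to the OAI-PMH 2.0 verbs `ListMetadataFormats`, `ListSets`, and `ListIdentifiers`. The sketch below only builds the request URLs; the base URL and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def discovery_urls(base_url, metadata_prefix="oai_dc"):
    """Build one request URL per discovery query."""
    requests = {
        "metadata formats": {"verb": "ListMetadataFormats"},
        "sets": {"verb": "ListSets"},
        # ListIdentifiers requires a metadataPrefix argument.
        "identifiers": {"verb": "ListIdentifiers",
                        "metadataPrefix": metadata_prefix},
    }
    return {name: f"{base_url}?{urlencode(params)}"
            for name, params in requests.items()}


urls = discovery_urls(BASE_URL)
for name, url in urls.items():
    print(f"{name}: {url}")
```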

Why the OAI-PMH is important
 Provides for a minimal level of interoperability
 Drives development of community-specific
  metadata schemes
 Potential for new modes of scholarly
  communication
 Dependent on widespread implementation by
  research organizations, publishers, and “memory
  organizations” (i.e., libraries, museums, archives)

QUIZ!!!

 http://www.oaforum.org/tutorial/english/page1.h




Problems with Metadata Harvesting
 Loss of data when mapping to unqualified DC
 Incorrect data from improper mapping
 Inconsistent punctuation and formatting
  because of diverse sources of metadata
     High variance in data between institutions
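
One way to see where the data loss comes from is a toy crosswalk from a richer local record to unqualified DC. The local field names, the sample record, and the mapping itself are invented for illustration; real crosswalks are larger and vary by institution.

```python
# Hypothetical crosswalk: local field name -> unqualified DC element.
CROSSWALK = {
    "title": "title",
    "author": "creator",
    "editor": "contributor",
    "illustrator": "contributor",
    "date_created": "date",
    "date_modified": "date",
}


def to_unqualified_dc(record):
    """Map a local record to unqualified DC.

    Returns the DC record and the list of local fields with no DC target.
    """
    dc = {}
    dropped = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            dropped.append(field)  # nowhere to put this field at all
            continue
        dc.setdefault(target, []).append(value)
    return dc, dropped


local_record = {
    "title": "An Example Report",
    "author": "Doe, Jane",
    "editor": "Roe, Richard",
    "illustrator": "Poe, Pat",
    "date_created": "2012-11-05",
    "date_modified": "2013-04-08",
    "shelf_location": "Room 12",
}

dc_record, dropped_fields = to_unqualified_dc(local_record)
print(dc_record)
print("dropped:", dropped_fields)
```

After mapping, the two dates sit side by side under `date` with nothing to say which is the creation date, and the editor and illustrator are both just a `contributor`. A harvester receiving only the DC record cannot recover those distinctions.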




Metasearching

 Many systems = many metadata standards
 Convert to single system (harvesting)?
 Maintain individual element sets BUT create
  interface to search simultaneously across
  heterogeneous databases
 Voila: Metasearching!
     Not a single method
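
A minimal sketch of one metasearch approach: each database keeps its own element set, and a thin adapter per database translates the query and normalizes the results. The two in-memory "databases", their field names, and all function names are invented; real metasearch tools would talk to remote systems over protocols such as Z39.50.

```python
# Two toy databases that use different element sets.
CATALOG_A = [
    {"title": "Metadata Basics", "author": "Doe, Jane"},
    {"title": "Cataloging Rules", "author": "Poe, Pat"},
]
CATALOG_B = [
    {"dc_title": "Harvesting Metadata", "dc_creator": "Roe, Richard"},
]


def search_a(term):
    """Adapter for database A: searches its 'title' field."""
    return [{"title": r["title"], "creator": r["author"], "source": "A"}
            for r in CATALOG_A if term.lower() in r["title"].lower()]


def search_b(term):
    """Adapter for database B: searches its 'dc_title' field."""
    return [{"title": r["dc_title"], "creator": r["dc_creator"], "source": "B"}
            for r in CATALOG_B if term.lower() in r["dc_title"].lower()]


def metasearch(term, adapters):
    """Send the same query to every adapter and merge the results."""
    results = []
    for adapter in adapters:
        results.extend(adapter(term))
    return results


hits = metasearch("metadata", [search_a, search_b])
for hit in hits:
    print(hit["source"], hit["title"])
```

Unlike harvesting, nothing is copied in advance: every query goes out to every database at search time, so the slowest database sets the response time.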

Definition

 From NISO MetaSearch Initiative:
  “search and retrieval to span multiple databases,
  sources, platforms, protocols, and vendors at one
  time.”
 Best known: Z39.50 protocol. Used to
  search remote library catalogs.



Z39.50

 Allows computers to communicate to
  retrieve information – between client and
  server
 Searches and results are restricted to Z39.50
  databases




Z39.50 results

 Server may interpret the query incorrectly
     Some automatically add Boolean “and” while
      others add Boolean “or”
     Vocabulary issues – different vocabulary in
      different databases
     Results may display in order retrieved, by database
      found, by date, or by relevance
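
The Boolean issue is easy to reproduce with a toy example: the same two-word query returns different result sets depending on whether a server joins the terms with "and" or "or". The records and the `search` function are invented for illustration and do not model any particular Z39.50 server.

```python
RECORDS = [
    "open archives initiative",
    "open access journals",
    "physical archives",
]


def search(records, query, default_operator):
    """Match whole words, joining query terms with the given operator."""
    terms = query.lower().split()
    combine = all if default_operator == "and" else any
    return [r for r in records
            if combine(term in r.split() for term in terms)]


and_hits = search(RECORDS, "open archives", "and")
or_hits = search(RECORDS, "open archives", "or")
print("and:", and_hits)
print("or: ", or_hits)
```

With "and" only the first record matches; with "or" all three do. A user searching several servers at once has no way to tell which interpretation each one applied.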


Problems with Z39.50

 High recall, little precision
 Also present in Google Search: few studies
  on user satisfaction
 Results may display in an irrelevant order for
  the searcher



Metasearching: pros and cons

 Single database searching allows users to use
  specialized indexing or controlled
  vocabulary
 Single portal:
     No need for searcher to select a particular
      database from list of databases



Case Studies

 Divide into 3-4 groups
 Read the case study
 Discuss and report:
     Describe the case briefly (2 min.)
     What can we learn from this case study? (3 min.)




Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 

Recently uploaded (20)

PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 

Metadata april 8 2013

  • 1. Open Archives Initiative - Protocol for Metadata Harvesting. April 8, 2013. Richard Sapon-White
  • 2. Overview
      ◦ Definitions
      ◦ History
      ◦ The OAI Model
      ◦ Protocol for Metadata Harvesting
  • 3. Definitions
      ◦ Harvester - client application issuing OAI-PMH requests
      ◦ Harvesting - the gathering together of metadata from a number of distributed repositories into a combined data store
      ◦ Archives - synonym for a repository of scholarly papers
      ◦ Protocol - a set of rules defining communication between systems (such as FTP or HTTP)
  • 4. History of the OAI
      ◦ E-print servers = archives or repositories
      ◦ E-print servers provide access to scientific and technical papers, scholarly journal articles
      ◦ Authors deposit pre-prints or published articles in these repositories
      ◦ Concept: public, free access to scholarly information without paid subscription to journals
  • 5. History of the OAI (cont.)
      ◦ Why?
          ▪ Scholarly research belongs to people
          ▪ Speeds the sharing of research
          ▪ Better for authors and readers
      ◦ Known as the “open archives movement”
      ◦ Has nothing to do with physical archives (repositories of institutional history or collections of unpublished materials)
  • 6. History of the OAI (cont.)
      ◦ Many e-print servers grew
          ▪ Overlapping disciplinary coverage
          ▪ Overlapping geographic coverage
      ◦ Developing need to
          ▪ search multiple repositories simultaneously (= federated searching)
          ▪ automatically identify and copy papers from other repositories (= repository synchronization)
  • 7. History of the OAI (cont.)
      ◦ Meeting of experts, 1999, Santa Fe, New Mexico, USA
      ◦ Defined an interface so that repositories could expose metadata for papers they held
      ◦ Metadata could then be discovered by federated search services and other repositories and copied
      ◦ Known as the Santa Fe Convention (later developed into PMH, the Protocol for Metadata Harvesting)
  • 8. The Open Archives Model
      ◦ Similar concept to union catalog
      ◦ Metadata “harvested” and stored in central repository
      ◦ “Pull” rather than “push” model
      ◦ Collecting is similar to Internet spider collecting HTML content
  • 9. PMH and Z39.50
      ◦ Differs from Z39.50 (specifically rejected at Santa Fe)
      ◦ Z39.50:
          ▪ Allows a client to search a remote information server across a network
          ▪ Difficult to perform high-quality federated searches across many servers: the client would need to deal with each server individually
          ▪ Complex protocol
  • 10. PMH and Z39.50 (cont.)
      ◦ PMH is a simple protocol
      ◦ User interacts with database of harvested metadata, not with individual repositories
      ◦ Database is constructed by the federated search service using PMH
      ◦ Therefore, performance depends only on the federated search service, not the individual repositories
  • 11. Metadata Harvesting Protocol
      ◦ Queries and responses carried over HTTP
      ◦ Harvester application can request a single metadata record or group of records to be exported
      ◦ Application can restrict records by date to only gather new records (since previous harvesting)
  • 12. Metadata Harvesting Protocol (cont.)
      ◦ OAI-compliant data providers are capable of responding to such requests
      ◦ Data provider must be able to export metadata in at least unqualified Dublin Core (DC), using XML as the communication syntax
      ◦ Data provider includes URI with metadata
  • 13. Metadata Harvesting Protocol (cont.)
      ◦ Servers can also provide metadata in other schemes besides DC
      ◦ Harvester applications can request metadata in other schemes besides DC
      ◦ Harvester applications can also query a metadata repository for:
          ▪ List of metadata formats supported by the repository
          ▪ List of record sets supported by the repository
          ▪ List of the identifiers of all records within the repository
  • 14. Why the OAI-PMH is important
      ◦ Provides for a minimal level of interoperability
      ◦ Drives development of community-specific metadata schemes
      ◦ Potential for new modes of scholarly communication
      ◦ Dependent on widespread implementation by research organizations, publishers, and “memory organizations” (i.e., libraries, museums, archives)
  • 16. Problems with Metadata Harvesting
      ◦ Loss of data when mapping to unqualified DC
      ◦ Incorrect data from improper mapping
      ◦ Inconsistent punctuation and formatting because of diverse sources of metadata
      ◦ High variance in data between institutions
  • 17. Metasearching
      ◦ Many systems = many metadata standards
      ◦ Convert to single system (harvesting)?
      ◦ Maintain individual element sets BUT create interface to search simultaneously across heterogeneous databases
      ◦ Voila: Metasearching!
      ◦ Not a single method
  • 18. Definition
      ◦ From NISO MetaSearch Initiative: “search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at one time.”
      ◦ Best known: Z39.50 protocol. Used to search remote library catalogs.
  • 19. Z39.50
      ◦ Allows computers to communicate to retrieve information, between client and server
      ◦ Searches and results are restricted to Z39.50 databases
  • 20. Z39.50 results
      ◦ Server may interpret the query incorrectly
      ◦ Some automatically add Boolean “and” while others add Boolean “or”
      ◦ Vocabulary issues: different vocabulary in different databases
      ◦ Results may be displayed in the order retrieved, by database found, by date, or by relevance
  • 21. Problems with Z39.50
      ◦ High recall, little precision
      ◦ Also present in Google Search: few studies on user satisfaction
      ◦ Results may display in an irrelevant order for the searcher
  • 22. Metasearching: pros and cons
      ◦ Single database searching allows users to use specialized indexing or controlled vocabulary
      ◦ Single portal:
          ▪ No need for searcher to select a particular database from list of databases
  • 23. Case Studies
      ◦ Divide into 3-4 groups
      ◦ Read the case study
      ◦ Discuss and report:
          ▪ Describe the case briefly (2 min.)
          ▪ What can we learn from this case study? (3 min.)
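
As a rough illustration of the harvesting requests described in slides 11 to 13, the sketch below builds OAI-PMH request URLs in Python without contacting any server. The verb and argument names (GetRecord, ListRecords, ListMetadataFormats, ListSets, ListIdentifiers, metadataPrefix, from, identifier) come from the OAI-PMH 2.0 specification rather than from the slides; the repository address, the record identifier, and the helper name oai_request are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request(verb, **params):
    """Build an OAI-PMH request URL (hypothetical helper).

    'from' is a reserved word in Python, so callers pass 'from_' and the
    trailing underscore is stripped before the query string is built.
    """
    query = {"verb": verb}
    for name, value in params.items():
        query[name.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(query)


# A single record, exported as unqualified Dublin Core (slides 11 and 12).
single = oai_request(
    "GetRecord",
    identifier="oai:example.org:123",  # made-up identifier
    metadataPrefix="oai_dc",
)

# Only records added or changed since the previous harvest (slide 11).
incremental = oai_request("ListRecords", metadataPrefix="oai_dc", from_="2013-04-01")

# The three repository queries listed on slide 13.
formats = oai_request("ListMetadataFormats")
sets = oai_request("ListSets")
identifiers = oai_request("ListIdentifiers", metadataPrefix="oai_dc")

for url in (single, incremental, formats, sets, identifiers):
    print(url)
```

A real harvester would send each URL as an HTTP GET and parse the XML response; that step is left out here because it depends on the repository being harvested.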

Editor's Notes

  1. No coverage of technical details, which are beyond me. I do want to cover concepts and definitions so that if someone talks to you about these things, you will understand.