Devi Ahilya Vishwavidyalaya, Indore (M.P.)
School of Library and Information Science, Indore

Session-2015-16
Metadata Harvesting Tools
Submitted To: Dr. GHS Naidu, HOD, SLIS, Indore
Submitted By: Umrav Singh, MPhil, Library and Information Sc.
Contents
• Introduction to Metadata
• Definition of Metadata
• What Is Metadata Harvesting?
• Metadata Harvesting Process
• Need of Metadata
• Types of Metadata
Introduction to Metadata
• Metadata can be defined as "data about data": data that describes the content,
quality, condition, and other characteristics of a resource. Metadata helps
potential users find the data they need and judge whether a data set meets
their needs before they spend the time and money to obtain and process it.
Example of Metadata
Element       Value
Title         Web catalogue
Creator       Dagnija McAuliffe
Publisher     University of Queensland Library
Format        text/html
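The element/value pairs above map naturally onto unqualified Dublin Core, which is the record format OAI-PMH harvesting uses by default. The sketch below shows one way to serialize them with Python's standard `xml.etree.ElementTree`. The `oai_dc` and `dc` namespace URIs are the ones OAI-PMH uses; the helper name `build_dc_record` is hypothetical, and the `xsi:schemaLocation` attribute that real repositories add is left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH for unqualified Dublin Core records.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so serialized output reads <oai_dc:dc>, <dc:title>, ...
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict[str, str]) -> ET.Element:
    """Wrap simple element/value pairs in an <oai_dc:dc> container (hypothetical helper)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields.items():
        # Each Dublin Core element becomes a namespaced child such as <dc:title>.
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# The example record from the table above.
example = {
    "title": "Web catalogue",
    "creator": "Dagnija McAuliffe",
    "publisher": "University of Queensland Library",
    "format": "text/html",
}

record = build_dc_record(example)
print(ET.tostring(record, encoding="unicode"))
```

Element names are passed in lowercase because that is how the Dublin Core elements are spelled in XML (`dc:title`, `dc:creator`, and so on).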
Definition of Metadata
• "Data that serves to provide context or additional information about
other data. For example, information about the title, subject, author,
typeface, enhancements, and size of the data file of a document
constitute metadata about that document. It may also describe the
conditions under which the data stored in a database was acquired, its
accuracy, date, time, method of compilation and processing, etc."
Source: http://www.businessdictionary.com/definition/metadata.html
Need of Metadata
Metadata is a systematic method for describing resources
and thereby improving access to them.
The primary aim of metadata is to improve resource
discovery. It also supports:
• Resource documentation
• Resource selection, evaluation and assessment
• Resource identification and location
• Improving the quality and quantity of search results
• Electronic commerce (encoding prices, terms of payment, etc.)
• Protecting intellectual property rights
Types of Metadata
• Administrative metadata
• Descriptive metadata
• Structural metadata
• Preservation metadata
• Rights management metadata
What Is Metadata Harvesting?
• Harvesting: In the Open Archives Initiative (OAI) context, harvesting refers
specifically to gathering metadata from a number of distributed repositories
into a combined data store.
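The "combined data store" in this definition can be pictured as a single table keyed by each record's OAI identifier. The sketch below is a hypothetical illustration, not a description of any particular harvester: `HarvestedRecord` and `aggregate` are invented names. It assumes datestamps are ISO 8601 dates (`YYYY-MM-DD`), so comparing them as strings also compares them chronologically. When the same identifier arrives twice (for instance from a re-harvest), the newer copy wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    identifier: str  # OAI identifier, e.g. "oai:a.example:1" (hypothetical values)
    datestamp: str   # ISO 8601 date, so string order matches date order
    title: str


def aggregate(repositories: dict[str, list[HarvestedRecord]]) -> dict[str, HarvestedRecord]:
    """Merge records from several repositories into one store keyed by identifier.

    If an identifier is seen more than once, keep the copy with the latest datestamp.
    """
    store: dict[str, HarvestedRecord] = {}
    for _repo_name, records in repositories.items():
        for rec in records:
            current = store.get(rec.identifier)
            if current is None or rec.datestamp > current.datestamp:
                store[rec.identifier] = rec
    return store


# Contrived input: "repo-b" also returns an updated copy of a record first seen in "repo-a".
repos = {
    "repo-a": [HarvestedRecord("oai:a.example:1", "2015-03-01", "Paper A")],
    "repo-b": [
        HarvestedRecord("oai:b.example:7", "2015-04-10", "Thesis B"),
        HarvestedRecord("oai:a.example:1", "2015-05-02", "Paper A (revised)"),
    ],
}
combined = aggregate(repos)
print(len(combined))                      # 2
print(combined["oai:a.example:1"].title)  # Paper A (revised)
```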
[Diagrams: "The Web" and "An Aggregation and the Web"]
Process of Metadata Harvesting
Need of Metadata Harvesting
• Single platform for resource discovery
• Easy sharing of resources between libraries and digital libraries
• Archiving data
• Preservation
Interoperability Requirements
• A metadata standard (such as the Dublin Core Metadata Element Set)
• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Data providers
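Both requirements meet in the request a harvester sends: an OAI-PMH request is an HTTP request carrying a `verb` argument, and `metadataPrefix=oai_dc` asks for Dublin Core, which every OAI-PMH repository must support. The sketch below builds such request URLs. The base URL is a hypothetical placeholder and `oai_request_url` is an invented helper; it checks only the verb name and the rule that `resumptionToken` is an exclusive argument, not every per-verb requirement in the specification.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # 'from' is a Python keyword, so callers pass from_ and it is renamed here.
    if "from_" in args:
        args["from"] = args.pop("from_")
    # A resumptionToken must not be combined with other arguments.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must be the only argument besides verb")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


base = "http://repository.example.org/oai"  # hypothetical endpoint
print(oai_request_url(base, "ListRecords", metadataPrefix="oai_dc", from_="2015-01-01"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2015-01-01
```

The `from` argument lets a service provider ask only for records changed since its last harvest, which keeps repeat harvests small.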
Interoperability Requirements (cont’d.)
Data Providers (DPs)
• Provide free access to metadata.
Service Providers (SPs)
• Use the OAI interfaces of the data providers to harvest and store metadata.
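On the service-provider side, each harvested page is an XML response that has to be unpacked into records. The sketch below parses a small, hand-written `ListRecords` response (the sample XML and the function name `parse_list_records` are hypothetical) using the OAI-PMH and Dublin Core namespace URIs. It returns the records on the page plus the `resumptionToken`; an empty or missing token means the list is complete. Deleted records, which carry a header but no metadata, simply come back with an empty title here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (hypothetical repository and identifier).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:42</identifier>
        <datestamp>2015-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Web catalogue</dc:title>
          <dc:creator>Dagnija McAuliffe</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text: str) -> tuple[list[dict[str, str]], Optional[str]]:
    """Extract identifier, datestamp and title from each record, plus the resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "datestamp": rec.findtext("oai:header/oai:datestamp", default="", namespaces=NS),
            "title": rec.findtext("oai:metadata//dc:title", default="", namespaces=NS),
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    # An empty or absent resumptionToken signals the last page.
    return records, (token or None)


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records)
print(token)  # page-2
```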
Example of Metadata Harvesting
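A complete harvest of one repository is a loop: request the first page, then keep sending the returned `resumptionToken` until the repository stops returning one. The sketch below shows only that control flow. `harvest` and `FAKE_PAGES` are hypothetical, and the in-memory pages stand in for real HTTP requests so the example runs without a network; in practice each call would fetch a URL such as `...?verb=ListRecords&metadataPrefix=oai_dc` and then `...?verb=ListRecords&resumptionToken=...`.

```python
from typing import Callable, Optional

# A page is (records on this page, resumptionToken for the next page or None).
Page = tuple[list[str], Optional[str]]


def harvest(fetch_page: Callable[[Optional[str]], Page], max_pages: int = 1000) -> list[str]:
    """Follow resumption tokens until the repository signals the list is complete."""
    harvested: list[str] = []
    token: Optional[str] = None
    for _ in range(max_pages):  # guard against a repository that never stops paging
        records, token = fetch_page(token)
        harvested.extend(records)
        if not token:
            return harvested
    raise RuntimeError("page limit reached before the list was complete")


# Hypothetical stand-in for a repository: three pages keyed by resumption token.
FAKE_PAGES: dict[Optional[str], Page] = {
    None: (["oai:r:1", "oai:r:2"], "t1"),
    "t1": (["oai:r:3"], "t2"),
    "t2": (["oai:r:4"], None),
}

print(harvest(lambda token: FAKE_PAGES[token]))
# ['oai:r:1', 'oai:r:2', 'oai:r:3', 'oai:r:4']
```

Passing the fetcher in as a function keeps the paging logic separate from networking, so the same loop works whether pages come from HTTP, a cache, or a test fixture.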
Metadata Harvesting Services in India
1. Name: Search Digital Libraries (SDL)
URL: http://drtc.isibang.ac.in/sdl
Host: DRTC Bangalore
Software Used: PKP (Public Knowledge Project)
2. Name: Knowledge Harvester@INSA (Indian National Science Academy)
URL: http://61.16.154.195/harvester/
Host: INSA
Software Used: PKP (Public Knowledge Project)
Open Archives Initiative Protocol for Metadata Harvesting
Open access works are scattered across many disciplinary archives, institutional e-print archives,
institutional repositories and open access journals, so it is difficult for users to locate all
the works they need on a particular subject.
Metadata Harvesting Services in India (cont’d.)
3. Name: Open J-Gate
URL: www.openj-gate.com
Host: Informatics (India) Ltd.
4. Name: SEED (Search Engine for Engineering Digital-Repositories)
URL: http://eprint.iitd.ac.in/seed/
Host: IIT, Delhi
Software Used: PKP (Public Knowledge Project)
Metadata Schema
The format or schema of metadata
may vary between
organizations according to their
requirements.
Each metadata schema will usually
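Because schemas differ between organizations, a harvester often has to convert records from one schema to another using a crosswalk, a table that maps fields in one schema to elements in another. The sketch below uses a deliberately tiny, simplified subset of a MARC 21 to Dublin Core mapping (245 $a to title, 100 $a to creator, 260 $b to publisher, 650 $a to subject); a full crosswalk covers many more fields and subfields. The record structure (a list of `(tag, {subfield: value})` pairs) and the names `CROSSWALK` and `marc_to_dc` are hypothetical.

```python
# Simplified, illustrative subset of a MARC 21 -> Dublin Core crosswalk (not complete).
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("650", "a"): "subject",
}


def marc_to_dc(fields: list[tuple[str, dict[str, str]]]) -> dict[str, list[str]]:
    """Map (tag, subfields) pairs to lists of Dublin Core values; unmapped subfields are ignored."""
    dc: dict[str, list[str]] = {}
    for tag, subfields in fields:
        for code, value in subfields.items():
            element = CROSSWALK.get((tag, code))
            if element is not None:
                dc.setdefault(element, []).append(value)
    return dc


# The example record from the "Example of Metadata" slide, expressed as MARC-style fields.
record = [
    ("245", {"a": "Web catalogue"}),
    ("100", {"a": "McAuliffe, Dagnija"}),
    ("260", {"b": "University of Queensland Library"}),
]
print(marc_to_dc(record))
# {'title': ['Web catalogue'], 'creator': ['McAuliffe, Dagnija'], 'publisher': ['University of Queensland Library']}
```

Values are collected into lists because Dublin Core elements are repeatable, for example a record with several subject headings.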
Metadata Harvesting Tools
• Arc (http://arc.cs.odu.edu)
• Citebase (http://citebase.eprints.org/cgi-bin/search)
• CYCLADES (http://www.ercim.org/cyclader/)
• Repox (http://repox.ist.utl.pt/index.html)
• OAICat (http://www.oclc.org/research/software/oai/cat/)
• OAI Repository Explorer (http://re.cs.uct.ac.za/)
Library-Related Standards: Considerations
• Metadata Standards
• Some of the available metadata standards are MARC, MARC 21, Dublin Core
and UKMARC (now superseded by MARC 21). MARC 21 is the most current of
the MARC formats. The first-level metadata elements of MARC are:
• Leader and Directory
• Control Fields (001-008)
• Number and Code Fields (01X-04X)
Library-Related Standards: Considerations
(cont’d.)
• Main Entry Fields (1XX)
• Title and Title-Related Fields (20X-24X)
• Edition, Imprint, etc. Fields (250-28X)
• Physical Description, etc. Fields (3XX)
• Series Statement Fields (4XX)
• Subject Access Fields (6XX)
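The field groups on these two slides follow a simple numbering rule: the first digit (and sometimes the second) of a three-digit MARC 21 tag tells you which block the field belongs to. The sketch below classifies a tag into the blocks named on the slides. The function and table names are hypothetical. It uses 25X-28X for the edition and imprint block, which is how the MARC 21 bibliographic format documents that range, and it reports any other tag (for example 5XX notes) as not covered here. The leader and directory have no tags, so they are outside its scope.

```python
# (first tag, last tag, label) for the MARC 21 bibliographic blocks named on these slides.
BLOCKS = [
    (1, 9, "Control fields (00X)"),
    (10, 49, "Number and code fields (01X-04X)"),
    (100, 199, "Main entry fields (1XX)"),
    (200, 249, "Title and title-related fields (20X-24X)"),
    (250, 289, "Edition, imprint, etc. fields (25X-28X)"),
    (300, 399, "Physical description, etc. fields (3XX)"),
    (400, 499, "Series statement fields (4XX)"),
    (600, 699, "Subject access fields (6XX)"),
]


def marc_field_block(tag: str) -> str:
    """Return the field block for a three-digit MARC 21 tag such as "245"."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"expected a three-digit tag, got {tag!r}")
    n = int(tag)
    for low, high, label in BLOCKS:
        if low <= n <= high:
            return label
    return "Other (not covered on these slides)"


for t in ["008", "020", "100", "245", "260", "300", "490", "650", "520"]:
    print(t, "->", marc_field_block(t))
```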
Examples of Metadata Standards Websites:
LC standards include:
• MARC: Machine-Readable Cataloging:
http://www.loc.gov/marc/
• MARCXML:
http://www.loc.gov/standards/marcxml/
• MODS: Metadata Object Description Schema:
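MARCXML expresses the same MARC tags, indicators and subfield codes as XML, which makes MARC records easier to exchange through XML-based services such as OAI-PMH. The sketch below builds a minimal MARCXML record with Python's `xml.etree.ElementTree`, using the MARCXML namespace `http://www.loc.gov/MARC21/slim` and its `controlfield`, `datafield` and `subfield` elements. The helper name `marcxml_record` and the control number `000001` are hypothetical, and the leader that a complete record would carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def marcxml_record(
    control: dict[str, str],
    data: list[tuple[str, str, str, dict[str, str]]],
) -> ET.Element:
    """Build a <record> with control fields and (tag, ind1, ind2, subfields) data fields."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, value in control.items():
        # Attributes go in a dict: SubElement's second positional parameter is already named "tag".
        cf = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": tag})
        cf.text = value
    for tag, ind1, ind2, subfields in data:
        df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                           {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields.items():
            sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code})
            sf.text = value
    return record


# Hypothetical control number; 245 title field with indicators 0 and 0.
rec = marcxml_record({"001": "000001"}, [("245", "0", "0", {"a": "Web catalogue"})])
print(ET.tostring(rec, encoding="unicode"))
```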
Conclusion
• Metadata is a key part of the information infrastructure needed to
create order in the chaos of the Web, adding description, classification,
and organization that make stores of information more useful. OAI
metadata harvesting offers a bridge for moving innovations in
networked information services and applications out of the research
community more rapidly.
Thank you