Metadata harvesting



Published in: Education, Technology


  1. Metadata Harvesting and the OAI-PMH
     Andrew Schenck
     Pamela Russell
     LIS 688
  2. What is Metadata Harvesting?
     • An automatic method of generating metadata
     • Occurs when metadata is automatically collected from META tags
     • Automatically gathers metadata from individual repositories
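As a simple illustration of collecting metadata from META tags, the following is a minimal sketch using only the Python standard library. The sample page, the `MetaTagCollector` class, and the `harvest_meta_tags` function are hypothetical names introduced here for illustration; a real harvester would fetch pages over the network and handle many more cases.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Keep every value, since elements such as DC.creator may repeat.
            self.metadata.setdefault(name, []).append(content)


def harvest_meta_tags(html_text):
    """Return a dict mapping each META name to its list of content values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<title>Sample page</title>
<meta name="DC.title" content="Metadata Harvesting and the OAI-PMH">
<meta name="DC.creator" content="Schenck, Andrew">
<meta name="DC.creator" content="Russell, Pamela">
<meta name="keywords" content="metadata, harvesting">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, values in harvest_meta_tags(SAMPLE_PAGE).items():
        print(name, values)
```

Values are stored as lists so that repeated elements, such as two creators, are not overwritten.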
  3. Example Metadata Generators
     • Metadata generators are also known as metadata extraction systems
     • Sample metadata extraction systems available for libraries include:
         • DC-dot
         • MarcEdit
         • Metaextract
         • IBM Magic System
     • Some are available as open source
  4. DC-dot
     • DC-dot is open source and can be redistributed or modified
     • Creates Dublin Core metadata
     • Metadata creation is initiated by submitting a URL
     • Generates keywords by analyzing hyperlinked concepts and presentation encoding
     • Does not produce description metadata
     • Generates type, format, and date metadata
  5. MarcEdit
     • MarcEdit is freely available (freeware rather than open source)
     • Initially conceived as a graphical batch editing tool for MARC records
     • Now an application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting
     • It allows users to:
         • Customize the existing data conversion rules or create new ones
         • Harvest metadata from a supported metadata format
         • Create conversion templates for additional metadata formats
         • Customize existing conversion templates to reflect the variations in best practices used among projects
  6. Metaextract
     • Designed for metadata extraction in the domain of K-12 math and science education
     • Extracts Dublin Core and Gateway to Educational Materials (GEM) metadata at both the item and collection levels
     • Collection-level metadata is generated from a collection-specific configuration
     • Item-level metadata is extracted from the content of educational documents using three extraction modules:
         • eQuery
         • HTML-based modules
         • Keyword generator module
  7. IBM Magic System
     • Includes various content analytic modules for metadata generation:
         • Audiovisual analysis modules that recognize semantic sound categories
         • Text analysis modules that extract title, keywords, and summary from text documents
     • Facilitates content reuse and repurposing
     • Improves interoperability
     • Allows more timely registration of content
  8. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
     • Version 2.0 released in June 2002
     • Provides an application-independent interoperability framework based on metadata harvesting
     • Two classes of participants in the OAI-PMH:
         • Data providers: administer systems that support the OAI-PMH as a means of exposing their metadata
         • Service providers: use the harvested metadata to build their digital collections
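To make the framework more concrete: a service provider talks to a data provider by sending HTTP requests that carry one of the six OAI-PMH verbs. The sketch below only builds such request URLs. The base URL is a hypothetical placeholder, and `build_request` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_request(base_url, verb, arguments=None):
    """Return the URL for an OAI-PMH request with the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments or {})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Ask the repository to describe itself.
    print(build_request(BASE_URL, "Identify"))
    # Ask for all records in unqualified Dublin Core changed since a given date.
    print(build_request(
        BASE_URL,
        "ListRecords",
        {"metadataPrefix": "oai_dc", "from": "2002-06-01"},
    ))
```

Arguments are passed as a dictionary because `from`, one of the protocol's argument names, is a reserved word in Python.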
  9. OAI-PMH Key Terms
     • Harvester
         • Operated by a service provider as a way to collect metadata from a repository
     • Repository
         • A network-accessible server that is able to process OAI-PMH requests
         • Managed by the data provider to allow harvesters access to its metadata
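The harvester's side of the exchange can be sketched as parsing the XML that a repository returns for a `ListRecords` request. The response below is a hand-written sample with made-up identifiers and values, and `parse_list_records` is a hypothetical helper. Only the XML namespaces and element names come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; identifiers and values are made up.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-14T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2002-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        for element in record.findall("oai:metadata/oai_dc:dc/*", NS):
            # Strip the "{namespace}" prefix to get the bare element name.
            local_name = element.tag.split("}", 1)[1]
            fields.setdefault(local_name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    token = root.findtext(
        "oai:ListRecords/oai:resumptionToken", default="", namespaces=NS
    )
    # A non-empty token means the repository has more records to send.
    return records, token.strip() or None


if __name__ == "__main__":
    harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
    print(harvested)
    print(next_token)
```

When a resumption token is returned, a harvester would normally send it back in a follow-up `ListRecords` request to fetch the next batch.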
  10. Harvesting Problems
      • Lack of consistency
          • Different collections use different DC elements and controlled vocabularies
      • Missing data
          • Repositories may have missing data within their metadata
          • The repository may decline to fill out elements
      • Incorrect data
          • Data in the wrong element
      • Harvested metadata can be confusing
          • Strings of names can be ordered inconsistently or ambiguously separated with commas instead of semicolons
      • Insufficient data
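The name-separator problem can be illustrated with a small check that a service provider might run on harvested creator strings. This is only a sketch under a stated assumption: semicolons are treated as reliable separators, while a string with three or more comma-separated parts is flagged as ambiguous rather than split. The `split_names` function is a hypothetical helper.

```python
def split_names(value):
    """Split a creator string into names.

    Returns (names, ambiguous). Assumed heuristic: semicolons separate
    people; a lone comma is read as "Last, First"; anything with three or
    more comma-separated parts is flagged instead of guessed at.
    """
    if ";" in value:
        names = [part.strip() for part in value.split(";") if part.strip()]
        return names, False
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if len(parts) <= 2:
        # A single name, or one name written as "Last, First".
        return [value.strip()], False
    # Commas may separate people, or surname from forename: cannot tell.
    return [value.strip()], True


if __name__ == "__main__":
    for creators in (
        "Schenck, Andrew; Russell, Pamela",
        "Schenck, Andrew, Russell, Pamela",
    ):
        print(split_names(creators))
```

Flagging ambiguous strings for human review, rather than splitting them automatically, avoids introducing new errors into the aggregated metadata.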
  11. Recommendations for Improving Harvesting
      • Establish guidelines and best practices
      • Develop local standards
      • Evaluate metadata
          • Check whether any elements hold local metadata that would not be useful in an aggregated environment
          • Check whether any fields are populated with "unknown" or "N/A"
      • Communicate with the service provider
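The metadata evaluation step lends itself to a simple automated check. The sketch below flags missing elements and fields populated with placeholder values such as "unknown" or "N/A". The list of required elements stands in for a local standard and is an assumption, as is the `evaluate_record` helper.

```python
# Values that carry no information once aggregated.
PLACEHOLDER_VALUES = {"", "unknown", "n/a"}

# Hypothetical local standard: elements every record is expected to have.
REQUIRED_ELEMENTS = ("title", "creator", "date")


def evaluate_record(record, required=REQUIRED_ELEMENTS):
    """Return a list of problems found in one record.

    `record` maps each Dublin Core element name to a list of values.
    """
    problems = []
    for element in required:
        if not record.get(element):
            problems.append(f"{element}: missing")
    for element, values in record.items():
        for value in values:
            if value.strip().lower() in PLACEHOLDER_VALUES:
                problems.append(f"{element}: placeholder value {value!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "title": ["Sample record"],
        "creator": ["unknown"],
        "subject": ["N/A"],
    }
    for problem in evaluate_record(sample):
        print(problem)
```

Running a check like this before exposing records gives the data provider a concrete list to discuss with the service provider.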
  12. Conclusion
      • Evidence suggests that the OAI-PMH is a successful endeavor
          • Increase in the number of repositories
          • Many funded projects based on OAI, including:
              • Metadata Harvesting Initiative of the Mellon Foundation
              • NSF National Science Digital Library (NSDL)
      • The importance of metadata is one of the reasons that the Open Archives Initiative created the Protocol for Metadata Harvesting