Metadata harvesting

Published in: Education, Technology

  1. Metadata Harvesting and the OAI-PMH
     Andrew Schenck
     Pamela Russell
     LIS 688
  2. What is Metadata Harvesting?
     • An automatic metadata generating method
     • Occurs when metadata is automatically collected from META tags
     • Automatically gathers metadata from individual repositories
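As a rough illustration of collecting metadata from META tags, the following is a minimal sketch using only the Python standard library. The class and function names (`MetaTagCollector`, `extract_meta_tags`) are hypothetical and chosen for this example; they are not taken from any of the tools discussed in the slides.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta name="..." content="..."> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A page may repeat an element (e.g. several creators),
            # so every name maps to a list of values.
            self.metadata.setdefault(name.lower(), []).append(content)


def extract_meta_tags(html_text):
    """Return a dict mapping lower-cased META names to lists of values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


if __name__ == "__main__":
    # Assumed sample page, invented for illustration.
    page = (
        "<html><head>"
        '<meta name="DC.title" content="Sample page">'
        '<meta name="DC.creator" content="Doe, Jane">'
        "</head><body></body></html>"
    )
    print(extract_meta_tags(page))
```

A real extraction system would also have to handle pages with no META tags at all, which is one reason tools such as those on the following slides analyze the page content as well.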
  3. Example Metadata Generators
     • Metadata generators are also known as metadata extraction systems
     • Sample metadata extraction systems available for libraries include:
         • DC-dot
         • MarcEdit
         • Metaextract
         • IBM Magic System
     • Some are available as open source
  4. DC-dot
     • DC-dot is open source and can be redistributed or modified
     • DC-dot creates Dublin Core metadata
     • Metadata creation is initiated by submitting a URL
     • Generates keywords by analyzing hyperlinked concepts and presentation encoding
     • Does not produce description metadata
     • Generates type, format and date metadata
  5. MarcEdit
     • MarcEdit is open source
     • MarcEdit was initially conceived as a batch MARC editing tool with a graphical user interface
     • It is an application suite of metadata editing tools that includes character set conversion, XML crosswalking, and metadata harvesting
     • It allows users to:
         • Customize the existing data conversion rules or create new data conversion rules
         • Harvest metadata from a supported metadata format
         • Create conversion templates for additional metadata formats
         • Customize existing conversion templates to reflect the many variations in best practices used among projects
  6. Metaextract
     • Designed for metadata extraction in the domain of math and science education for K-12
     • Also designed to extract Dublin Core and Gateway to Educational Materials metadata on both the item and collection levels
     • Collection-level metadata is generated based on a collection-specific configuration
     • Item-level metadata is extracted from the content of educational documents using three extraction modules:
         • eQuery
         • HTML-based modules
         • Keyword generator module
  7. IBM Magic System
     • Includes various content analytic modules for metadata generation:
         • Audiovisual analysis modules that recognize semantic sound categories
         • Text analysis modules that extract title, keywords, and summary from text documents
     • Facilitates content reuse and repurposing
     • Improves interoperability
     • Enables more timely registration of content
  8. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
     • Version 2.0 released in June 2002
     • Provides an application-independent interoperability framework based on metadata harvesting
     • Two classes of participants in the OAI-PMH:
         • Data providers: administer systems that support the OAI-PMH as a means of exposing their metadata
         • Service providers: use the harvested metadata to build services such as their digital collections
  9. OAI-PMH Key Terms
     • Harvester
         • Operated by a service provider as a way to collect metadata from a repository
     • Repository
         • A network-accessible server that is able to process OAI-PMH requests
         • Managed by the data provider to allow harvesters access to its metadata
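The exchange between a harvester and a repository can be sketched roughly as follows. This is a minimal sketch, not a complete harvester: the function names are hypothetical, the base URL in the usage comment is a placeholder, and error responses, sets, and date ranges are ignored. It relies on the `ListRecords` verb, the `oai_dc` metadata prefix, and the `resumptionToken` mechanism defined by the OAI-PMH.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespaces used in OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # A resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from one response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [title.text for title in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    # An absent or empty token means the list is complete.
    return records, ((token or "").strip() or None)


def harvest(base_url):
    """Yield records from a repository, following resumption tokens."""
    token = None
    while True:
        with urlopen(build_list_records_url(base_url, token)) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if not token:
            break


# Usage (placeholder URL, assumed for illustration):
# for record in harvest("http://repository.example.org/oai"):
#     print(record["identifier"], record["titles"])
```

Keeping the URL building and the response parsing separate from the network call makes the two pieces easy to check against saved responses before pointing the harvester at a live repository.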
  10. Harvesting Problems
     • Lack of consistency
         • Different collections using different DC elements and controlled vocabularies
     • Repositories may have missing data within their metadata
         • The repository may decline to fill out elements
     • Incorrect data
         • Data in the wrong element
     • Harvested metadata can be confusing
         • Strings of names can be ordered in an inconsistent manner or ambiguously separated with commas instead of semicolons
     • Insufficient data
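The ambiguity in name strings can be illustrated with a small sketch. The function below is hypothetical and makes a simplifying assumption: a semicolon is treated as an unambiguous separator, while a string with more than one comma and no semicolon is flagged for human review rather than split, because the commas could separate names or surname from forename.

```python
def split_creators(value):
    """Split a creator string on semicolons.

    Returns (names, ambiguous). When there is no semicolon and more than
    one comma, the string is left whole and flagged as ambiguous.
    """
    if ";" in value:
        names = [part.strip() for part in value.split(";") if part.strip()]
        return names, False
    ambiguous = value.count(",") > 1
    return [value.strip()], ambiguous


if __name__ == "__main__":
    print(split_creators("Smith, John; Doe, Jane"))
    print(split_creators("Smith, John, Doe, Jane"))
```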
  11. Recommendations for Improving Harvesting
     • Establish guidelines and best practices
     • Develop local standards
     • Evaluate metadata
         • Check to see if there are certain elements where you have local metadata that would not be useful in an aggregated environment
         • Check to see if any fields are populated with "unknown" or "N/A"
     • Communicate with the service provider
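The evaluation step can be partly automated. The following is a minimal sketch under stated assumptions: each record is a dict mapping a Dublin Core element name to a list of string values, the set of placeholder values is a guess at common local practice, and the `required` elements are a local policy choice (Dublin Core itself makes every element optional). The names `PLACEHOLDERS` and `audit_record` are hypothetical.

```python
# Assumed placeholder values; adjust to match local cataloging practice.
PLACEHOLDERS = {"", "unknown", "n/a", "na", "none"}


def audit_record(record, required=("title", "creator", "date")):
    """Return a list of problems found in one record.

    An element is reported as missing when it has no values, and as
    placeholder-only when every value is a placeholder such as "N/A".
    """
    problems = []
    for element in required:
        values = record.get(element) or []
        if not values:
            problems.append(f"{element}: missing")
        elif all(value.strip().lower() in PLACEHOLDERS for value in values):
            problems.append(f"{element}: placeholder only")
    return problems


if __name__ == "__main__":
    # Invented sample record for illustration.
    sample = {"title": ["Annual report"], "creator": ["N/A"], "date": []}
    print(audit_record(sample))
```

Running such a check before exposing records gives the data provider a concrete list of issues to raise when communicating with the service provider.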
  12. Conclusion
     • Evidence suggests that OAI-PMH is a successful endeavor
         • Increase in number of repositories
         • Many funded projects based on OAI
             • eprints.org
             • Metadata Harvesting Initiative of the Mellon Foundation
             • NSF National Science Digital Library (NSDL)
     • The importance of metadata is one of the reasons that the Open Archives Initiative created the Protocol for Metadata Harvesting
