Haley Childers
LIS 688-04
April 26, 2012
Professor Oguz
Automatic Metadata Generation
• A machine process of metadata extraction and metadata harvesting.
• Metadata extraction uses automatic indexing techniques to search resource content and produce structured metadata according to metadata standards.
• Metadata harvesting is performed by machine to collect tagged metadata that was created by machines or by humans.
Automatic Metadata Generation: Concepts and Examples

Concept: Metadata extraction. The process of automatically pulling (extracting) metadata from a resource’s content. Resource content is mined to produce structured (“labeled”) metadata for object representation.
Example(s): Metadata extraction for a Web page involves extracting metadata from the resource’s content that is displayed via a Web browser.

Concept: Metadata harvesting. The process of automatically collecting resource metadata already embedded in or associated with a resource. The harvested metadata is originally produced by humans or by fully or semi-automatic processes supported by software.
Example(s): Metadata harvested from a Web page is found in the “header” source code of an HTML (or XHTML) resource (e.g., “Keywords” META tags). Metadata for a Microsoft WORD document is found under file properties (e.g., “Type of file,” which is automatically generated, and “Keywords,” which can be added by a resource author).

Concept: Fully-automatic metadata generation. Complete (or total) reliance on automatic processes to create metadata.
Example(s): Web editing software (e.g., Macromedia’s Dreamweaver and Microsoft’s FrontPage) and selected document software (e.g., Microsoft WORD and Acrobat) automatically produce metadata at the time a resource is created or updated (e.g., “Date of creation” or “Date modified”) without human intervention.

Concept: Semi-automatic metadata generation. Partial reliance on software to create metadata; a combination of fully-automatic and human processes to create metadata.
Example(s): (1) Fully-automatic techniques are used to generate metadata (e.g., “Keywords”) as a first pass, and software then presents the metadata to a person, who may manually edit the metadata. (2) Software may present a person (e.g., resource author or Web architect) with a “template” that guides the manual input of metadata, and then automatically converts the metadata to appropriate encoding (e.g., XML tags). The software may even automatically embed metadata in a resource.

Greenberg (2005), p. 25
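To make the harvesting example concrete, here is a minimal sketch of collecting META tags from the header of an HTML page, using only Python's standard library. The class name `MetaHarvester`, the function `harvest`, and the sample page are hypothetical and chosen for illustration; they do not come from any of the applications discussed in these slides.

```python
from html.parser import HTMLParser


class MetaHarvester(HTMLParser):
    """Hypothetical helper: collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content is not None:
            # Normalize the element name so "Keywords" and "keywords" match.
            self.metadata[name.lower()] = content


def harvest(html_text):
    """Return the metadata already embedded in the page's META tags."""
    parser = MetaHarvester()
    parser.feed(html_text)
    return parser.metadata


# Illustrative page only; the values are made up for this sketch.
SAMPLE_PAGE = """<html><head>
<title>Automatic Metadata Generation</title>
<meta name="Keywords" content="metadata, harvesting, extraction">
<meta name="Author" content="Example Author">
</head><body><p>Body text shown in the browser.</p></body></html>"""

harvested = harvest(SAMPLE_PAGE)
print(harvested)
```

Note that this only harvests metadata someone (or some software) already embedded. Extraction, by contrast, would have to mine the body text itself.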
The AMeGA (Automatic Metadata Generation Applications) Project
• Created to “identify and recommend functionalities for automatic metadata generation applications”
• Its final report discusses the current state of automatic metadata generation applications and their problem areas
• Conducted a survey of metadata experts
• Suggests functionalities that future applications should incorporate
Found at: http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf
Problems with current automatic metadata applications:
• They do not support standard bibliographic functions and element qualifications
• Sophisticated automatic indexing algorithms have not been incorporated into metadata applications
• Automatic metadata applications are developed separately from each other
• There are no standards for creating automatic metadata generation applications
The purpose of the survey conducted by AMeGA was to:
• Get an idea of what libraries are currently doing for metadata creation
• See if they are aware of current automatic metadata generation applications
• See what developments they would most like to see happen for metadata creation

Survey participants:
• 217 completed the survey
• 75.2% of participants had three or more years of cataloging and/or indexing experience

By role:
• 29.5% were administrators/executives
• 28.3% were catalogers/metadata librarians
• The remaining percentages were divided among 8 other categories

By organization type:
• 40.7% Academic library
• 13.4% Government agency/department
• 12.8% Academic community (not the library)
• 11.6% Government library
• 9.3% Non-profit organization
• 8.1% Corporation/company
• 1.2% Public library
• 0.1% Corporate library
• 2.3% Other
Top 4 metadata standards used in the libraries where participants worked: MARC, DC simple, DC qualified, and EAD.
Top 4 metadata standards used in the non-library organizations where participants worked: DC simple, DC qualified, MARC, and DC application profile.

Number of metadata systems in use:
• 94 organizations were using 1 metadata system
• 55 organizations were using 2 metadata systems
• 22 organizations were using 3 metadata systems
• 6 organizations were using 4 metadata systems
• 4 organizations were using 5 metadata systems
• 2 organizations were using 6 metadata systems
• 1 organization was using 7 metadata systems

The most common metadata generation systems in use (in order of most used):
• Custom/in-house
• ContentDM
• Endeavor/Voyager
• OCLC/Innovative Interfaces
• OCLC/Connexion
• Microsoft Access
• Xmetal
• NoteTab (or similar text editor)
• XML Spy
• Dspace (etc.)

Greenberg (2005) p. 24
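The counts of organizations by number of metadata systems can be summarized with a little arithmetic. The sketch below only restates the figures reported on this slide; the variable names are illustrative, and the summary statistics are derived here rather than taken from the AMeGA report.

```python
# Number of metadata systems in use (key) -> number of organizations (value),
# copied from the counts reported on the slide.
orgs_by_system_count = {1: 94, 2: 55, 3: 22, 4: 6, 5: 4, 6: 2, 7: 1}

# Total organizations that reported a count.
total_orgs = sum(orgs_by_system_count.values())

# Total systems in use across all of those organizations.
total_systems = sum(n * count for n, count in orgs_by_system_count.items())

mean_systems = total_systems / total_orgs
share_single = orgs_by_system_count[1] / total_orgs

print(f"Organizations counted: {total_orgs}")
print(f"Mean systems per organization: {mean_systems:.2f}")
print(f"Share using exactly one system: {share_single:.1%}")
```

On these figures, 184 organizations reported a count, about half of them use a single system, and the average is a little under two systems per organization.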
Survey participants were asked a series of experience or opinion questions about the automatic generation of metadata for digital document-like objects using the Dublin Core Metadata Element Set.
• Participants experienced or predicted the greatest accuracy for technical metadata (ID, language, format).
• Less accuracy was predicted for subject and description, since these require intellectual judgment.
• Asked whether they would devote a “moderate” amount of resources to research on intellectual metadata (subject, description) or to complete automation of physical metadata (ID, format, language), participants were divided.
• A majority of participants believed that research on generating metadata for nontextual and foreign-language material is important and valuable.
• 70% of participants would like applications to run automatic algorithms first and allow human evaluation and editing afterwards.
• Most participants would also want to be able to incorporate subject schemes, content creation guidelines, and cataloging and metadata examples into metadata generation applications.
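The workflow most participants preferred (an automatic first pass followed by human evaluation and editing) can be sketched as follows. This is an illustration under stated assumptions, not the method of any surveyed application: the word-frequency keyword suggester is a deliberately naive stand-in for a real indexing algorithm, and the function names, stopword list, and sample text are all hypothetical. The only external fact relied on is the Dublin Core element namespace URI.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "for", "that", "with", "from", "this"}


def suggest_keywords(text, limit=3):
    """Automatic first pass: propose the most frequent longer words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(limit)]


def review(suggested, remove=(), add=()):
    """Human step: a cataloger drops weak suggestions and adds missing terms."""
    kept = [k for k in suggested if k not in remove]
    return kept + [k for k in add if k not in kept]


def to_dublin_core_xml(title, subjects):
    """Encode the reviewed metadata as simple Dublin Core XML elements."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    for subject in subjects:
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = subject
    return ET.tostring(record, encoding="unicode")


# Made-up sample text for the sketch.
text = ("Metadata harvesting collects metadata. Harvesting tools collect "
        "embedded metadata.")

suggested = suggest_keywords(text, limit=2)
reviewed = review(suggested, remove=["harvesting"], add=["Dublin Core"])
xml_record = to_dublin_core_xml("Sample resource", reviewed)
print(xml_record)
```

The point of the split is that the software does the repetitive work (counting terms, producing well-formed XML) while the judgment about which subjects actually fit stays with a person.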
Based on the results of the survey, AMeGA created a list of functionalities needed in automatic metadata generation applications:
• The system should allow profiles to be configured before metadata generation
• The system should automatically identify and collect any metadata associated with a resource
• The system should enhance and refine both manually generated and automatically generated metadata
• The system should automatically evaluate the quality of metadata and provide a rating score
• The system should be usable to create metadata for nontextual resources
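As one possible reading of the "rating score" functionality, the sketch below rates a record by how many of the fifteen Dublin Core elements it fills in. This is an assumption made for illustration: the slide does not say how quality would be measured, completeness is only one rough proxy for it, and the function name and sample record are hypothetical.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def completeness_score(record, elements=DC_ELEMENTS):
    """Return (score, missing): the share of elements with a non-empty value,
    and the list of elements that are absent or blank."""
    missing = []
    for element in elements:
        value = record.get(element)
        if value is None or not str(value).strip():
            missing.append(element)
    score = (len(elements) - len(missing)) / len(elements)
    return score, missing


# Made-up record for the sketch; "description" is present but blank.
sample_record = {
    "title": "Sample resource",
    "creator": "Example Author",
    "date": "2012-04-26",
    "format": "text/html",
    "identifier": "example-001",
    "language": "en",
    "description": "   ",
}

score, missing = completeness_score(sample_record)
print(f"Completeness: {score:.0%}")
print("Missing elements:", ", ".join(missing))
```

A score like this says nothing about whether the values are accurate, which is where human evaluation would still be needed.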
Conclusion
• Experimental researchers and metadata experts need to work together on developing applications.
• Application standards need to be created.
• Much more funding and research need to be devoted to automatic metadata generation.
• The priority now is to develop metadata generation applications that automatically identify and collect metadata, aid human metadata generation, enhance previously created metadata, and evaluate the quality of metadata.
References
DCMI. (2008). Dublin Core Metadata Initiative: Scorpion. Retrieved from http://www.dublincore.org/tools/tools/tool-11.shtml
Greenberg, J. (2003). Metadata Generation: Processes, People and Tools. Bulletin of the American Society for Information Science and Technology, 29(2). Retrieved from http://www.asis.org/Bulletin/Dec-02/greenberg.html
Greenberg, J., Spurgin, K., & Crystal, A. (2005). Final Report for the AMeGA (Automatic Metadata Generation Applications) Project. Retrieved from http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf
Greenberg, J., Spurgin, K., & Crystal, A. (2006). Functionalities for automatic metadata generation applications: A survey of metadata experts’ opinions. International Journal of Metadata, Semantics and Ontologies, 1(1), 3-20.
Ojokoh, B., Adewale, O., & Falaki, S. (2009). Automated document metadata extraction. Journal of Information Science, 35(5), 563-570.
Park, J., & Lu, C. (2009). Application of semi-automatic metadata generation in libraries: Types, tools, and techniques. Library & Information Science Research, 31(4), 225-231.
Shafer, K. E. (2001). Automatic Subject Assignment via the Scorpion System. Journal of Library Administration, 34(1/2), 187.
Shafer, K. E. (2001). Evaluating Scorpion Results. Journal of Library Administration, 34(3/4), 237.
Su, S. T., Long, Y., & Cromwell, D. E. (2002). E2M: Automatic Generation of MARC-Formatted Metadata by Crawling E-Publications. Information Technology & Libraries, 21(4), 171-180.
Thank you!
For any questions or concerns, please contact me at: email@example.com
It’s been a wonderful class with everyone! Good luck in all of your future endeavors! I hope to see you all around!