Bibliographic metadata (including citation)

    Bibliographic metadata (including citation) - Presentation Transcript

    1. Bibliographic metadata (including citation). Tuesday 7th July 2009, AMG 2nd workshop, University of Leicester, Leicester. Alexey Strelnikov, Research Officer, UKOLN (www.bath.ac.uk). Contributions from Emma Tonkin.
    2. Agenda
      • Introduction
      • What and why
      • Use cases
      • Key points
      • Issues
      • Recommendations
    3. Introduction
      • Metadata extraction is the process of describing extrinsic and intrinsic qualities of a resource
    4. Bibliographic metadata
      • Extracting bibliographic metadata is a particular case of metadata extraction.
      • For example:
      • Title
      • Authors
      • Emails
      • Citations
    5. What and why
      • General metadata extraction – tends to involve machine learning
      • Citation and reference analysis – usually involves regular expressions
      • Might involve visual structure analysis and text mining
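The regular-expression approach mentioned for citation and reference analysis can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the pattern handles only one simple "Authors (Year). Title. Venue." layout, and the names `REFERENCE_PATTERN` and `parse_reference`, as well as the sample reference string, are made up for this example. Real reference lists vary far more and would need many patterns or a trained model.

```python
import re

# Hypothetical pattern for a single, simple reference layout:
#   Authors (Year). Title. Venue.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<venue>.+?)\.?$"
)


def parse_reference(line):
    """Return a dict of reference fields, or None if the line does not match."""
    match = REFERENCE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


# Made-up reference string, used only to exercise the pattern.
example = "Smith, J. and Jones, A. (2008). An example title. Journal of Examples."
parsed = parse_reference(example)
print(parsed)
```

A line that does not fit the layout yields `None`, which a surrounding workflow could treat as "no suggestion" rather than as an error.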
    6. What and why (2)
      • To improve long, tedious manual operations on metadata:
        • Generation of metadata on document deposit
        • Revision of metadata
        • Comparison and aggregation
        • <Put your own operation here>
    7. What and why (3)
      • Automatic extraction can make a system more robust (in addition to existing approaches)
      • It is not a drop-in replacement for manual creation, but semi-automated feature extraction can make for better metadata quality overall
    8. Use case (1)
      • Dominik is a researcher publishing his new paper
      • Instead of a fully manual deposit (typing in all values), he makes use of system suggestions, which make the process faster and simpler
    9. Use case (2)
      • Fiona is a researcher assessing the impact of her paper
      • How many citations does my work have?
      • Network of citations (existing systems: Google Scholar, citeseer.net...)
    10. Use case (3)
      • Bob is a repository manager checking for inconsistencies in the repository's metadata
      • Makes use of system recommendations and the confidence level generated for each value
      • Easier to find invalid or obsolete metadata values
    11. Use case (4)
      • Edward is an application profile/standard curator checking metadata across repositories
      • Has an application profile, but no feedback on how well it is followed
      • Consistent errors:
        • Not filled
        • Systematically wrong value (might be related to research field, environment)
      • Comparison & aggregation report
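A comparison and aggregation report of the kind Edward needs could start from something as simple as counting how often each required field is left unfilled. The sketch below assumes records are plain dictionaries and that the application profile reduces to a list of required field names; the function name `unfilled_field_report` and the sample records are invented for illustration.

```python
from collections import Counter


def unfilled_field_report(records, required_fields):
    """Return, per required field, the fraction of records where it is missing or empty."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat absent keys and blank strings alike as "not filled".
            if value is None or not str(value).strip():
                missing[field] += 1
    total = len(records)
    return {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }


# Made-up records standing in for metadata harvested from several repositories.
records = [
    {"title": "Paper A", "creator": "A. Author", "date": "2009-07-07"},
    {"title": "Paper B", "creator": "", "date": "2009-01-15"},
    {"title": "Paper C", "date": ""},
    {"title": "Paper D", "creator": "D. Author"},
]
report = unfilled_field_report(records, ["title", "creator", "date"])
print(report)
```

Running the same report per repository, or per research field, would be one way to surface the systematic errors listed above.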
    12. Summary for use cases
      • All approaches have a manual analogue
      • Automated metadata extraction would be an improvement, but not a replacement
      • The service is invisible; it just makes suggestions, for example: 'the metadata field “title” should be “Some name”'
    13. Key points
      • Standards – those involved in the workflow have a big impact
        • “The nice thing about standards is that there are so many of them to choose from” (Andrew S. Tanenbaum)
      • Tools – existing applications to extract metadata
    14. Standards
      • Should consider a number of standards for representation and format, as well as languages and locales:
        • Document encoding
        • Metadata encoding
        • Locale specifics
        • Citation formats
      • Document encoding
        • Important because this may affect whether a resource is read correctly
        • Document formats:
          • PDF, Doc, PPT, etc.
        • Font encoding:
          • UTF, locale specific
      • Metadata encoding
        • This has a direct impact on the result's usability in a given context
        • Examples of metadata standards:
          • OAI-DC
          • SWAP
          • LOM
          • OAI-ORE
          • MARC
      • Locale specifics
        • Country- and culture-specific formats of text elements
        • For example:
          • Right-to-left languages
          • Date format:
            • dd/mm/yyyy
            • mm/dd/yyyy
      • Citation and reference formats
        • Many citation/reference formats exist; most research fields have their own standards
        • For example:
          • APA – social sciences
          • MLA – literature and the arts
          • AMA – biology
          • Turabian – multi-field
          • Chicago standard – publications
          • Harvard, Numerical, MHRA – multi-field
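The date-format point under locale specifics is easy to demonstrate: the same string can be a valid date under both dd/mm/yyyy and mm/dd/yyyy, so an extractor cannot settle on one reading without knowing the locale. The sketch below uses only the standard library; the `FORMATS` table and the function `interpret_date` are names chosen for this illustration.

```python
from datetime import datetime

# The two date layouts named on the slide, mapped to strptime patterns.
FORMATS = {"dd/mm/yyyy": "%d/%m/%Y", "mm/dd/yyyy": "%m/%d/%Y"}


def interpret_date(text):
    """Return every valid reading of a slash-separated date, keyed by layout name."""
    readings = {}
    for name, fmt in FORMATS.items():
        try:
            readings[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            # The text is not a valid date under this layout.
            pass
    return readings


ambiguous = interpret_date("07/08/2009")    # valid under both layouts
unambiguous = interpret_date("25/12/2009")  # only valid as dd/mm/yyyy
print(ambiguous)
print(unambiguous)
```

When more than one reading comes back, a system could lower the confidence attached to the extracted date instead of guessing.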
    15. Tools
      • Automated metadata extraction is a workflow, which involves several interconnected software systems
      • Helps to overcome standards heterogeneity
    16. Examples of Tools
      • Examples of existing tools:
        • DC-dot (variety of document/web formats -> DC metadata)
        • DepositPlait (metadata in various formats -> metadata repository)
        • DataFountains (various formats -> metadata)
        • paperBase (prototype concentrating on eprint documents)
    17. Issues
      • Full-text resource availability
      • Readability of the text
      • Legal issues
      • Engineering constraints – machine suggestions might be imperfect
      • Language & localization – the system needs retraining for each new locale
    18. Recommendations
      • A robust system that is easy to retrain, with customizable input and output plugins
        • A potential gain:
          • Simplify (re)extraction of metadata, faster repository operations, validation
      • Make use of the confidence level assigned to each metadata field
        • A potential gain:
          • Identifying possibly incorrect metadata records
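One way the confidence level might be used to identify possibly incorrect records is sketched below: a field is flagged when the extractor's suggestion disagrees with the stored value and the suggestion carries high confidence. The data shapes, the 0.8 threshold, and the name `suspicious_fields` are assumptions made for this example, not part of any tool named in the talk.

```python
def suspicious_fields(stored, suggestions, min_confidence=0.8):
    """List fields whose stored value disagrees with a high-confidence suggestion.

    stored:      field name -> value currently held in the repository
    suggestions: field name -> (suggested value, confidence between 0 and 1)
    """
    flagged = []
    for field, (suggested, confidence) in suggestions.items():
        if confidence >= min_confidence and stored.get(field) != suggested:
            flagged.append(field)
    return sorted(flagged)


# Made-up record: the stored title contains a typing error.
stored = {"title": "Some nmae", "creator": "A. Author", "date": "2008"}
suggestions = {
    "title": ("Some name", 0.95),    # confident and different: flagged
    "creator": ("A. Author", 0.90),  # confident but identical: not flagged
    "date": ("2009", 0.40),          # different but low confidence: not flagged
}
flagged = suspicious_fields(stored, suggestions)
print(flagged)
```

The flagged fields are only candidates for review; in line with the earlier slides, the decision stays with the repository manager.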
    19. Recommendations (2)
      • Make the full-text document available to the system
        • A potential gain:
          • Periodic re-exploration of the resource and updating of the metadata
      • Investigate the problem of analysing citations
        • A potential gain:
          • Assess the level of similarity between papers
          • Classify the nature of a paper
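The slides do not say how similarity between papers would be assessed. As one possible starting point, the sketch below measures the overlap between two papers' reference lists with the Jaccard index (shared references divided by all distinct references). It assumes references have already been extracted and normalised to comparable identifiers; the function name `reference_overlap` and the identifiers are hypothetical.

```python
def reference_overlap(refs_a, refs_b):
    """Jaccard index of two reference collections: |A & B| / |A | B|."""
    a, b = set(refs_a), set(refs_b)
    if not a and not b:
        # Two papers without references share nothing measurable.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical normalised reference identifiers for two papers.
paper_a = ["ref1", "ref2", "ref3"]
paper_b = ["ref2", "ref3", "ref4"]
similarity = reference_overlap(paper_a, paper_b)
print(similarity)
```

A score near 1 means the two papers cite almost the same works; a score of 0 means they share no references.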
    20. Q&A
      • Thank you for your attention