Bibliographic metadata (including citation)

A talk given at the automatic metadata extraction workshop by Intrallect and Jisc. This talk covers bibliographic metadata extraction in the context of automated extraction.


Published in: Education, Technology

Transcript

  • 1. Bibliographic metadata (including citation)
      Tuesday 7th July 2009, AMG 2nd workshop, University of Leicester, Leicester
      Alexey Strelnikov, Research Officer, UKOLN (www.bath.ac.uk)
      Contributions from Emma Tonkin
  • 2. Agenda
      • Introduction
      • What and why
      • Use cases
      • Key points
      • Issues
      • Recommendations
  • 8. Introduction
      • Metadata extraction is the process of describing the extrinsic and intrinsic qualities of a resource
  • 9. Bibliographic metadata
      • Extracting bibliographic metadata is a particular case of metadata extraction
      • For example:
          • Title
          • Authors
          • Emails
          • Citations
  • 15. What and why
      • General metadata extraction tends to involve machine learning
      • Citation and reference analysis usually involves regular expressions
      • Both might also involve visual structure analysis and text mining
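To make the regular-expression point concrete, here is a minimal sketch in Python. The pattern, the function name `parse_reference`, and the sample reference are all invented for illustration; the pattern handles only one simple "Authors (Year). Title. Venue." layout, whereas a real extractor would need many patterns or a trained model.

```python
import re

# Hypothetical pattern for a single "Authors (Year). Title. Venue." layout.
REFERENCE_RE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"  # authors, then (year).
    r"(?P<title>[^.]+)\.\s+"                         # title up to the next full stop
    r"(?P<venue>.+?)\.?$"                            # everything else is the venue
)


def parse_reference(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = REFERENCE_RE.match(line.strip())
    return match.groupdict() if match else None


# A made-up reference, used only to exercise the pattern.
example = (
    "Smith, J. and Jones, A. (2008). Extracting metadata from eprints. "
    "Journal of Examples, 12(3), 45-67."
)

if __name__ == "__main__":
    print(parse_reference(example))
```

A title that itself contains a full stop would break this pattern, which is one reason regular expressions tend to be combined with other techniques.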
  • 18. What and why (2)
      • To improve long and tedious manual operations on metadata:
          • Generating metadata on document deposit
          • Revising metadata
          • Comparison and aggregation
          • <Put your own operation here>
  • 22. What and why (3)
      • Automatic extraction can make a system more robust (in addition to existing approaches)
      • It is not a drop-in replacement for manual creation, but semi-automated feature extraction can make for better metadata quality overall
  • 24. Use case (1)
      • Dominik is a researcher publishing his new paper
      • Instead of a fully manual deposit (typing in all values), he makes use of system suggestions, which make the process faster and simpler
  • 26. Use case (2)
      • Fiona is a researcher assessing the impact made by her paper
      • How many citations does my work have?
      • Network of citations (existing systems: Google Scholar, citeseer.net...)
  • 29. Use case (3)
      • Bob is a repository manager checking for inconsistencies in the repository's metadata
      • He makes use of system recommendations and a confidence level for each generated value
      • This makes it easier to find invalid or obsolete metadata values
  • 32. Use case (4)
      • Edward is an application profile/standard curator checking inter-repository metadata
      • He has an application profile, but no feedback on how it is followed
      • Consistent errors:
          • Field not filled
          • Systematically wrong value (might be related to research field or environment)
      • Comparison & aggregation report
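The "not filled" check in this use case can be sketched as a small report over harvested records. This is an illustrative assumption rather than part of the talk: records are modelled as plain dictionaries, and `missing_field_report` is a hypothetical helper name.

```python
from collections import Counter


def missing_field_report(records, required_fields):
    """Return, per required field, the fraction of records where it is empty or absent."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat absent keys, None, and whitespace-only strings as "not filled".
            if value is None or not str(value).strip():
                missing[field] += 1
    total = len(records)
    if total == 0:
        return {}
    return {field: missing[field] / total for field in required_fields}


if __name__ == "__main__":
    # Invented sample records standing in for harvested repository metadata.
    sample = [
        {"title": "Paper A", "creator": ""},
        {"title": "Paper B"},
    ]
    print(missing_field_report(sample, ("title", "creator", "date")))
```

A field that is missing in most records from one repository but rarely elsewhere would be a candidate for the "consistent error" category.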
  • 36. Summary for use cases
      • All approaches have a manual analogue
      • Automated metadata extraction would be an improvement, but not a replacement
      • The service is invisible; it just makes suggestions, for example: 'the metadata field “title” should be “Some name”'
  • 39. Key points
      • Standards: those involved in the workflow make a big impact
          • “The nice thing about standards is that there are so many of them to choose from” (Andrew S. Tanenbaum)
      • Tools: existing applications to extract metadata
  • 40. Standards
      • Should consider a number of standards for representation and format, as well as languages and locales:
          • Document encoding
          • Metadata encoding
          • Locale specifics
          • Citation formats
  • 44. Document encoding
      • Important because it may affect whether a resource is read correctly
      • Document formats:
          • PDF, Doc, PPT, etc.
      • Font encoding:
          • UTF, locale specific
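As a small illustration of why encoding matters, the sketch below tries a list of candidate encodings in order when reading extracted text. The function name and the candidate list are assumptions chosen for the example, not a recommendation from the talk.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1251", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


if __name__ == "__main__":
    word = "Метаданные"  # "metadata" in Russian, used as sample text
    print(decode_with_fallback(word.encode("utf-8")))
    print(decode_with_fallback(word.encode("cp1251")))
```

A successful decode does not prove the guess was right: `latin-1` accepts any byte sequence, so it only works as a last resort, and a locale-specific document decoded with the wrong table yields readable-looking but incorrect text.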
  • 46. Metadata encoding
      • This has a direct impact on the result's usability in a given context
      • Examples of metadata standards:
          • OAI-DC
          • SWAP
          • LOM
          • OAI-ORE
          • MARC
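To show what choosing a metadata encoding means in practice, here is a minimal sketch that serialises extracted values as an `oai_dc` record using Python's standard `xml.etree.ElementTree`. The helper name `to_oai_dc` and the input shape (a dict of element name to list of values) are assumptions for illustration; schema-location attributes and validation are omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_oai_dc(fields):
    """Serialise {dc_element_name: [values]} as an oai_dc XML string."""
    # Register prefixes so the output uses oai_dc: and dc: rather than ns0:, ns1:.
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({
        "title": ["Bibliographic metadata (including citation)"],
        "creator": ["Alexey Strelnikov", "Emma Tonkin"],
    }))
```

The same extracted values would need a different serialiser for LOM or MARC, which is the usability point the slide makes.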
  • 52. Locale specifics
      • Country- and culture-specific formats of text elements
      • For example:
          • Right-to-left languages
          • Date format:
              • dd/mm/yyyy
              • mm/dd/yyyy
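The date-format ambiguity can be shown with a short sketch that reads the same string under both conventions. The names `FORMATS` and `interpret_date` are hypothetical, introduced only for this example.

```python
from datetime import datetime

# The two conventions named on the slide, mapped to strptime patterns.
FORMATS = {"dd/mm/yyyy": "%d/%m/%Y", "mm/dd/yyyy": "%m/%d/%Y"}


def interpret_date(text):
    """Return every valid reading of `text`, keyed by the convention used."""
    readings = {}
    for label, pattern in FORMATS.items():
        try:
            readings[label] = datetime.strptime(text, pattern).date()
        except ValueError:
            # Not a valid date under this convention (for example, month 25).
            pass
    return readings


if __name__ == "__main__":
    print(interpret_date("03/04/2009"))  # two different valid readings
    print(interpret_date("25/12/2009"))  # only dd/mm/yyyy is valid
```

When both readings are valid and differ, the string alone cannot settle the question; the extractor needs locale information about the source.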
  • 56. Citation and reference formats
      • Many citation/reference formats exist, and most research fields have their own standards
      • For example:
          • APA – social sciences
          • MLA – literature and the arts
          • AMA – biology
          • Turabian – multi-field
          • Chicago standard – publications
          • Harvard, Numerical, MHRA – multi-field
  • 63. Tools
      • Automated metadata extraction is a workflow that involves several interconnected software systems
      • Tools help to overcome the heterogeneity of standards
  • 65. Examples of Tools
      • Examples of existing tools:
          • DC-dot (variety of doc/web formats -> DC metadata)
          • DepositPlait (various metadata formats -> metadata repository)
          • DataFountains (various formats -> metadata)
          • paperBase (prototype concentrating on eprint documents)
  • 69. Issues
      • Full-text resource availability
      • Readability of the text
      • Legal issues
      • Engineering constraints: machine suggestions might be imperfect
      • Language & localization: the system needs retraining for each new locale
  • 74. Recommendations
      • A robust system that is easy to retrain, with customizable input & output plugins
          • A potential gain:
              • Simpler (re)extraction of metadata, faster repository operations, validation
      • Make use of the confidence level assigned to each metadata field
          • A potential gain:
              • Identifying possibly incorrect metadata records
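One way to use a per-field confidence level is to list the fields that fall below a threshold so a person can review them. The sketch below assumes suggestions arrive as `{field: (value, confidence)}` with confidence between 0 and 1; that shape, the 0.8 threshold, and the name `fields_to_review` are illustrative assumptions.

```python
def fields_to_review(suggestions, threshold=0.8):
    """Return field names whose confidence is below `threshold`, least confident first."""
    low = [
        field
        for field, (_value, confidence) in suggestions.items()
        if confidence < threshold
    ]
    return sorted(low, key=lambda field: suggestions[field][1])


if __name__ == "__main__":
    # Invented extractor output for a single record.
    suggested = {
        "title": ("Some name", 0.95),
        "creator": ("A. Author", 0.55),
        "date": ("2009", 0.70),
    }
    print(fields_to_review(suggested))
```

The threshold is a policy choice: a lower value means fewer manual checks but more undetected errors.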
  • 75. Recommendations (2)
      • Make the full-text document available to the system
          • A potential gain:
              • Periodic re-exploration of the resource and updating of the metadata
      • Investigate the problem of analysing citations
          • A potential gain:
              • Assess the level of similarity between papers
              • Classify the nature of a paper
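As one possible reading of "level of similarity between papers", the sketch below compares two papers by the overlap of their reference lists, using the Jaccard index. This measure is an assumption chosen for illustration and is not named in the talk. References are assumed to be already normalised to comparable identifiers.

```python
def reference_overlap(refs_a, refs_b):
    """Jaccard index of two reference lists: shared references / all references."""
    a, b = set(refs_a), set(refs_b)
    if not a and not b:
        return 0.0  # no references on either side, so no evidence of similarity
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Placeholder identifiers standing in for normalised references.
    paper_one = ["ref-1", "ref-2", "ref-3"]
    paper_two = ["ref-2", "ref-3", "ref-4"]
    print(reference_overlap(paper_one, paper_two))
```

In practice the hard part is the normalisation step, since the same work can be cited in any of the formats listed earlier.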
  • 77. Q&A
      • Thank you for your attention
