
  1. 1. Metadata Issues for e-Prints: experiences from setting up an Institutional Repository Jessie Hey Research Fellow TARDis Project University of Southampton ePrints UK Workshop Ashmolean Museum Oxford 22 Mar 2004
  2. 2. e-Prints <ul><li>A simple illustration of diversity in metadata! </li></ul><ul><li>EPrints (software) </li></ul><ul><li>e-Prints (Soton) </li></ul><ul><li>ePrints (UK project) </li></ul><ul><li>eprints (in URLs, emails) </li></ul><ul><li>E-print (Network – US gateway) </li></ul>
  3. 3. Searching for e-Prints in Google e-Prints 1,200,000; eprints 225,000
  4. 4. Plam pilot? <ul><li>Looking for a PDA? </li></ul><ul><li>Just try searching for plam pilot on eBay </li></ul><ul><li>Even a sale is not incentive enough </li></ul>
  5. 5. Metadata <ul><li>The modern word for ‘Data about data’ </li></ul><ul><li>Generally structured data describing an e-Print in this context </li></ul><ul><li>Describing an object such as a journal article or book chapter or thesis </li></ul>
  6. 6. Metadata issues for today <ul><li>Who needs the quality? </li></ul><ul><li>What kind of quality? </li></ul><ul><li>How we approached it in TARDis </li></ul><ul><ul><li>the depositor </li></ul></ul><ul><ul><li>the process </li></ul></ul><ul><ul><li>classification </li></ul></ul><ul><ul><li>mediation </li></ul></ul><ul><li>Balancing demands the pragmatic way </li></ul>
  7. 7. Who needs the quality? <ul><li>Service providers (i.e. search services ) </li></ul><ul><li>Analysis in both e-learning and e-prints communities showed concern about quality of metadata in individual databases to give good search results when combined in cross-domain search services </li></ul><ul><li>Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice . In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications , DCMI, 39-48. </li></ul><ul><li>http://eprints.soton.ac.uk/archive/00000020/ </li></ul>
  8. 8. As I am in Oxford… <ul><li>a tribute in Elvish to JRR Tolkien from the Lord of the Rings </li></ul>
  9. 9. Gandalf on Dublin Core metadata <ul><li>‘I cannot read the fiery letters,’ said Frodo in a quavering voice. </li></ul><ul><li>‘No’ said Gandalf ‘but I can. ……this in the Common Tongue is what is said, close enough: </li></ul><ul><li>One Ring to rule them all, One Ring to find them, </li></ul><ul><li>One Ring to bring them all and in the darkness bind them .’ </li></ul>
  10. 10. Standards for e-Prints: Dublin Core Metadata Sets <ul><li>Define minimal metadata elements for simple resource discovery </li></ul><ul><li>e.g. title, creator, subject and keywords, publisher, date, rights management </li></ul><ul><li>Fundamental building blocks for Open Archive Initiative compliant repositories </li></ul><ul><li>Software such as GNU EPrints is OAI compliant (in DSpace may need ‘switching on’) </li></ul><ul><li>Full text searching (in latest version) will give additional help to compensate for weaknesses </li></ul>
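As a rough illustration of the minimal element set listed on this slide, the sketch below serialises one record as simple Dublin Core in the `oai_dc` container used by OAI-compliant repositories. The function name `build_dc_record` and the choice of fields are assumptions made for this example; the sample values come from the Barton, Currier and Hey reference cited elsewhere in this talk. It is not the way EPrints or DSpace generate their records.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple Dublin Core and the OAI-PMH oai_dc container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(fields):
    """Serialise a dict of {element name: [values]} as an oai_dc XML string.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


record = build_dc_record({
    "title": ["Building quality assurance into metadata creation: an analysis "
              "based on the learning objects and e-Prints communities of practice"],
    "creator": ["Barton, Jane", "Currier, Sarah", "Hey, Jessie M.N."],
    "publisher": ["DCMI"],
    "date": ["2003"],
    "identifier": ["http://eprints.soton.ac.uk/archive/00000020/"],
})
print(record)
```

Each creator gets its own repeated `dc:creator` element, which is what lets a harvesting search service index authors individually.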
  11. 11. Who needs the quality? <ul><li>Academics (the depositors) need reasonable quality for their publication record whether full text is available or not </li></ul><ul><ul><li>Tendency to think a good citation matters less if access leads straight to the full text </li></ul></ul><ul><ul><li>An institutional repository needs </li></ul></ul><ul><li>To represent their own work well </li></ul><ul><li>To represent their faculty and university well </li></ul><ul><li>For publicity and communication </li></ul><ul><li>For research assessment and proposals </li></ul><ul><li>For promotion </li></ul>
  12. 12. What kind of quality? <ul><li>Fit for purpose – visibility and citability </li></ul><ul><li>Rolls-Royce or Volkswagen Golf or a Skoda? </li></ul><ul><li>The Rolls-Royce may not produce a sustainable repository </li></ul><ul><li>Library of Congress had to think again with a backlog of millions </li></ul><ul><li>A departmental archive had to scrap its editors (too slow) </li></ul><ul><li>Need a model with a light touch </li></ul>
  13. 13. Examples to correct <ul><li>From an academic’s current departmental publication record: </li></ul><ul><li>Co-author given as Fadden on older references </li></ul><ul><li>Given as McFadden on newer ones </li></ul><ul><li>McFadden would not find all his papers! </li></ul>
  14. 14. Examples to correct <ul><li>Authors are not perfect but neither are information specialists or other sources </li></ul><ul><li>Recent examples: </li></ul><ul><li>Author’s assistant put a conference in year 2400 </li></ul><ul><li>‘Web of Knowledge’ put a conference in 2010 </li></ul><ul><li>NB Amazon proved useful for checking book information from the title page (new Amazon ‘search inside’ service) but main entries may be less accurate </li></ul>
  15. 15. Quality Assurance Procedures <ul><li>Would like to pick up these and obvious examples of metadata in the wrong field, e.g. book title used for title of chapter </li></ul><ul><li>Options include regular checking (e.g. at or close to time of deposit or for annual reporting) or random checking </li></ul><ul><li>Visualisation techniques promising but still expensive </li></ul>
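A light check run at or near deposit time could catch the kinds of slip mentioned in this talk, such as a conference dated 2400 or 2010, or a book title pasted into the chapter title field. The sketch below is only one possible shape for such a check: the field names (`year`, `title`, `book_title`), the `EARLIEST_YEAR` bound, and the "one year ahead" tolerance are all assumptions for illustration, not TARDis settings.

```python
EARLIEST_YEAR = 1800  # assumed lower bound; adjust for the collection


def check_record(record, current_year):
    """Return a list of human-readable warnings for one metadata record.

    Hypothetical helper: `record` is a plain dict with assumed field names.
    """
    warnings = []

    year = record.get("year")
    if year is None:
        warnings.append("year is missing")
    elif not (EARLIEST_YEAR <= year <= current_year + 1):
        # Allow one year ahead for items that are in press.
        warnings.append(f"year {year} is implausible")

    title = record.get("title", "").strip().lower()
    book_title = record.get("book_title", "").strip().lower()
    if title and title == book_title:
        warnings.append("chapter title is identical to book title")

    return warnings


print(check_record({"title": "A conference paper", "year": 2400}, current_year=2004))
print(check_record({"title": "A conference paper", "year": 2003}, current_year=2004))
```

Returning warnings instead of rejecting the record keeps the touch light: an editor can review flagged items in the submission buffer without blocking the depositor.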
  16. 16. How we approached it in TARDis <ul><li>Looked at process from point of view of depositor </li></ul><ul><ul><li>to decrease the barriers to deposit </li></ul></ul><ul><ul><li>to improve quality by design or example </li></ul></ul><ul><li>Looked at metadata required for a good citation </li></ul><ul><ul><li>academics using e-print records for many purposes not just visibility </li></ul></ul><ul><li>Some information may be easier to strip out if required but harder to add later e.g. </li></ul><ul><ul><li>first name or initials – although cultural variations too </li></ul></ul><ul><ul><li>journal title or abbreviation </li></ul></ul>
  17. 17. Simple things deter <ul><li>Questions you can’t answer </li></ul><ul><li>No place to put it </li></ul><ul><li>Errors which force you to enter it again </li></ul><ul><li>On a credit card payment </li></ul><ul><ul><li>Date on the card: 06/05 </li></ul></ul><ul><ul><li>Date to enter: 06/2005 </li></ul></ul><ul><ul><li>How many times do I do this incorrectly! </li></ul></ul>
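The credit card example suggests a general remedy for deposit forms: accept the value as the user sees it and normalise it afterwards, so a mismatch of format does not force re-entry. A minimal sketch, assuming two-digit years always mean 20xx (reasonable for card expiry dates, not for publication dates); `parse_expiry` is a hypothetical name.

```python
import re


def parse_expiry(text):
    """Accept '06/05' or '06/2005' and return (month, four_digit_year)."""
    match = re.fullmatch(r"\s*(\d{1,2})\s*/\s*(\d{2}|\d{4})\s*", text)
    if match is None:
        raise ValueError(f"unrecognised expiry date: {text!r}")
    month = int(match.group(1))
    year = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if year < 100:
        year += 2000  # assumption: two-digit years are in the 2000s
    return month, year


print(parse_expiry("06/05"))
print(parse_expiry("06/2005"))
```

Both forms of the date yield the same result, so the user never has to guess which one the form expects.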
  18. 18. To help the depositor <ul><li>Aimed to enter information as the depositor sees it on the full text </li></ul><ul><li>Arranged input in the order the information is seen </li></ul><ul><li>With relevant information grouped together </li></ul><ul><li>With ‘pages’ that are not of daunting size </li></ul><ul><li>Fields of a size to view as much of the text as possible </li></ul>
  19. 19. TARDis - Aiding deposit – relevant fields – relevant help
  20. 20. The Process <ul><li>Added help where examples are useful </li></ul><ul><li>Added extra buttons at top to ease navigation </li></ul><ul><li>Made mandatory fields where essential </li></ul><ul><li>Tension between full details and deterrent </li></ul><ul><ul><li>commentary field currently not included although some might find useful </li></ul></ul>
  21. 21. Some ‘quality’ traditions may be less practical <ul><li>Search service recommendations: capitals only for first word of title except proper nouns </li></ul><ul><li>Process is generally ‘cut and paste’ so result is variable and advice ignored </li></ul><ul><li>Get Caps, non-caps, rarely ALL CAPS </li></ul><ul><li>Found in practice likely to be too time consuming to insist </li></ul><ul><li>Think retrieval first rather than consistency </li></ul>
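"Think retrieval first" can be read as: leave the pasted title as it is and normalise only the key used for matching. The sketch below shows that idea under the assumption that case and spacing are the only differences to ignore; `search_key` is a hypothetical helper, not part of any repository software.

```python
def search_key(title):
    """Build a case-insensitive, whitespace-insensitive key for matching titles.

    The stored title is left untouched; only the lookup key is normalised.
    """
    return " ".join(title.casefold().split())


variants = [
    "Metadata issues for e-Prints",
    "Metadata Issues For E-Prints",
    "METADATA ISSUES  FOR E-PRINTS",
]
print({search_key(v) for v in variants})
```

All three capitalisation styles collapse to a single key, so retrieval works without asking depositors to retype titles.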
  22. 22. Classification – a specific area of debate <ul><li>ePrints UK exploring automatic classification with Dewey </li></ul><ul><li>TARDis looked at current practice: </li></ul><ul><li>Reviewed subject classification in discipline based and early institutional archives </li></ul><ul><li>Found whole variety of choices and levels of complexity </li></ul>
  23. 23. TARDis on subject classification <ul><li>Discussion of issues and snapshot chart http://tardis.eprints.org </li></ul><ul><li>Using basic Library of Congress classification with a view to harvesting, e.g. papers in Oceanography </li></ul><ul><li>Added search box to find subject </li></ul><ul><li>Departments could use an additional scheme if they wish (software option) </li></ul><ul><li>Keywords can be added (cut and paste) if available (sometimes papers also have classification categories added for a journal) </li></ul><ul><li>Computer classification generally expensive and requires learning examples but accuracy is improving </li></ul>
  24. 24. Towards the future – subject classification – on the fly
  25. 25. Mediation <ul><li>TARDis is experimenting with deposit choices </li></ul><ul><li>Branch to: </li></ul><ul><ul><li>Self archiving (author or local assistant) with light review as pass through submission buffer </li></ul></ul><ul><ul><li>Assisted archiving – give us the file with essential details not evident from the full text </li></ul></ul>
  26. 26. Mediation in practice <ul><li>Current experience: </li></ul><ul><ul><li>Assisted archiving often time consuming – meeting the difficult ones - but can add value (e.g. fuller publisher location details such as DOI) </li></ul></ul><ul><ul><li>Self archiving less accurate but author may know details which may be missing from full text </li></ul></ul><ul><ul><li>Balance likely to change as authors become either more familiar with early deposit or perhaps happy to delegate to save time </li></ul></ul><ul><ul><li>Learning curve for us – later may devolve some quality responsibility (use editorial options) </li></ul></ul><ul><ul><li>Give additional feedback into software </li></ul></ul>
  27. 27. The challenge of cutting and pasting from PDFs <ul><li>Sometimes rather like the Hyperbookworms (Jasper Fforde, The Eyre Affair ) </li></ul><ul><li>Who produce spurious capitals, apostrophes, hyphens </li></ul><ul><li>Problems with hyphens, accents and words starting with f! </li></ul><ul><li>LaTeX usually the culprit so Humanities have an advantage here </li></ul>
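Some of this damage can be repaired automatically after pasting. The sketch below assumes, and it is only an assumption, that the trouble with words starting with f comes from ligature glyphs such as "ﬁ" and "ﬂ", and that detached accents arrive as combining characters. `clean_pasted_text` is a hypothetical helper built on Python's standard `unicodedata` module.

```python
import re
import unicodedata


def clean_pasted_text(text):
    """Tidy text pasted from a PDF (a sketch, not a complete solution)."""
    # NFKC folds ligatures (U+FB01 -> "fi") and composes base letter + accent.
    text = unicodedata.normalize("NFKC", text)
    # Drop soft hyphens, which are invisible hyphenation hints.
    text = text.replace("\u00ad", "")
    # Rejoin words split by a hyphen at a line break.
    # Caveat: this also removes a genuine hyphen that happens to end a line.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_pasted_text("\ufb01nding meta-\ndata   records"))
```

Because the line-break rule cannot tell a real hyphen from a typesetting one, a light human review of the result is still advisable.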
  28. 28. Balancing demands the pragmatic way <ul><li>Author deposit changes the equation </li></ul><ul><li>Incentives can increase accuracy </li></ul><ul><ul><li>Deposit support </li></ul></ul><ul><ul><li>Requests by department or university or funding council for up to date records </li></ul></ul><ul><li>Collaboration between author, department and information specialist may be best way forward </li></ul><ul><li>Aim: light quality control to achieve visibility and citability </li></ul>
  29. 29. The New World of e-Prints <ul><li>Not so elegant to work in as an Oxford College Library such as Brasenose </li></ul><ul><li>But should be just as satisfying to use as it meets new needs </li></ul>
  30. 30. Thank you <ul><li>For further information: </li></ul><ul><li>TARDis http://tardis.eprints.org/ </li></ul><ul><li>e-Prints Soton (Research Soton) http://eprints.soton.ac.uk/ </li></ul><ul><li>FAIR Focus on Access to Institutional Resources Programme </li></ul><ul><li>“Improving the Quality of Metadata in Eprint Archives” Marieke Guy and Andy Powell Ariadne Issue 38 30-January-2004 </li></ul><ul><li>Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice . In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications , DCMI, 39-48. </li></ul><ul><li>http://eprints.soton.ac.uk/archive/00000020/ </li></ul>