Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Enhancing social tagging with a knowledge organization system


Published on

Presentation slides associated with the paper "Enhancing Social Tagging with a Knowledge Organization System" written by Koraljka Golub, Jim Moon, Douglas Tudhope and Marianne Lykke Nielsen, accepted for the IFLA Satellite Meeting, Emerging Trends in Technology: Libraries Between Web 2.0, Semantic Web and Search, Florence, 19-20 August 2009. Much of the content of the slides is taken from previous presentations given by Koraljka Golub of UKOLN and Brian Matthews of STFC

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

Enhancing social tagging with a knowledge organization system

  1. 1. UKOLN is supported by: Enhancing Social Tagging with a Knowledge Organization System Koraljka Golub, Jim Moon, Douglas Tudhope and Marianne Lykke Nielsen Emerging Trends in Technology: Libraries Between Web 2.0, Semantic Web and Search, IFLA Satellite Meeting, Florence, 19-20 August 2009
  2. 2. Enhancing Social Tagging with a Knowledge Organization System Presentation given by: Michael Day Research & Development Team Leader UKOLN, University of Bath Bath BA1 4BD, UK [email_address]
  3. 3. Presentation outline <ul><li>EnTag project context </li></ul><ul><li>Intute case study </li></ul><ul><ul><li>Methods used </li></ul></ul><ul><ul><li>The interface </li></ul></ul><ul><ul><li>Some observations </li></ul></ul><ul><li>STFC EPubs repository case study </li></ul><ul><li>Conclusions and further work </li></ul>
  4. 4. EnTag project <ul><li>Enhanced tagging for discovery project </li></ul><ul><ul><li> </li></ul></ul><ul><li>Funded by the Joint Information Systems Committee (JISC) </li></ul><ul><li>Partners: </li></ul><ul><ul><li>Funded: UKOLN (University of Bath), University of Glamorgan, Science and Technology Facilities Council (STFC), Intute </li></ul></ul><ul><ul><li>Non-funded: OCLC Office of Research, Danish Royal School of Library and Information Science </li></ul></ul>
  5. 5. Controlled vocabularies (1) <ul><li>Traditional way of providing subject classification </li></ul><ul><ul><li>For location (shelf-marks, subject browsing) </li></ul></ul><ul><ul><li>For searching </li></ul></ul><ul><ul><li>For association of resources </li></ul></ul><ul><li>Different types used, such as </li></ul><ul><ul><li>Subject classification schemes </li></ul></ul><ul><ul><li>Controlled keyword lists </li></ul></ul><ul><ul><li>Thesauri </li></ul></ul>
  6. 6. Controlled vocabularies (2) <ul><li>General observations: </li></ul><ul><ul><li>Enables the precise classification of resources </li></ul></ul><ul><ul><ul><li>Good for precision and recall </li></ul></ul></ul><ul><ul><li>Hierarchical schemes can exploit the structure to modify search queries </li></ul></ul><ul><ul><ul><li>Broader/narrower/related terms </li></ul></ul></ul><ul><ul><li>Expensive </li></ul></ul><ul><ul><ul><li>Requires investment in specialist expertise to devise the vocabulary </li></ul></ul></ul><ul><ul><ul><li>Requires investment in specialist expertise to classify resources. </li></ul></ul></ul><ul><ul><li>Difficult to maintain their currency </li></ul></ul>
  7. 7. Social tagging (1) <ul><li>The Web 2.0 way of providing search terms </li></ul><ul><li>People “tag” resources with free-text terms of their own choosing </li></ul><ul><li>Tags used to associate resources together </li></ul><ul><li>Examples: </li></ul><ul><ul><li>, Flickr, Connotea, LibraryThing </li></ul></ul><ul><li>“ Folksonomy” </li></ul><ul><ul><li>The terms that a community chooses to tag its resources </li></ul></ul>
  8. 8. Social tagging (2) <ul><li>People often use the same tags or keywords </li></ul><ul><ul><li>Supports retrieval - makes things that mean the same thing to people easier to find </li></ul></ul><ul><li>A potentially cheap way of getting a very large number of resources classified </li></ul><ul><ul><li>Represents the “community consensus” in some sense </li></ul></ul><ul><ul><li>“ The Wisdom Of Crowds” </li></ul></ul><ul><ul><li>Has currency as people continue update </li></ul></ul><ul><ul><li>Tag clouds of popular tags (many examples) </li></ul></ul>
  9. 9. Social tagging (3) <ul><li>However, in uncontrolled contexts: </li></ul><ul><ul><li>Individuals often use similar but not identical tags: </li></ul></ul><ul><ul><ul><li>e.g. Semantic Web, SemanticWeb, SemWeb, SWeb </li></ul></ul></ul><ul><ul><li>Individuals make mistakes in tags </li></ul></ul><ul><ul><ul><li>Spelling errors, using spaces or punctuation incorrectly </li></ul></ul></ul><ul><ul><li>Mixture of subject terms, genre terms, etc. </li></ul></ul><ul><ul><li>Some tags are more specific than others – difficult to get consistency </li></ul></ul><ul><ul><li>Tags often have personal meaning, but no (immediate) wider significance, e.g. “favourite” </li></ul></ul>
  10. 10. EnTag objectives <ul><li>Project aims: </li></ul><ul><ul><li>To investigate the combination of controlled and social tagging approaches to support resource discovery in repositories and digital collections </li></ul></ul><ul><ul><li>To investigate whether the use of an established controlled vocabulary can help move social tagging beyond personal bookmarking to aid resource discovery </li></ul></ul><ul><ul><li>To Improve tagging </li></ul></ul><ul><ul><ul><li>Relevance of tags, Consistency, Efficiency </li></ul></ul></ul><ul><ul><li>To Improve retrieval </li></ul></ul><ul><ul><ul><li>Effectiveness (degree of match between user and system terminologies) </li></ul></ul></ul>
  11. 11. EnTag main approach <ul><li>Main focus: </li></ul><ul><ul><li>To compare free tagging with no instructions with tagging using a combined system with guidance for users </li></ul></ul><ul><li>Two demonstrators: </li></ul><ul><ul><li>Intute digital collection </li></ul></ul><ul><ul><ul><li>The main focus of EnTag </li></ul></ul></ul><ul><ul><ul><li>Tagging by reader </li></ul></ul></ul><ul><ul><ul><li>Using a cohort of students to evaluate tools </li></ul></ul></ul><ul><ul><li>STFC repository </li></ul></ul><ul><ul><ul><li>Tagging by author </li></ul></ul></ul><ul><ul><ul><li>A more qualitative approach </li></ul></ul></ul>
  12. 12. The Intute study
  13. 13. Intute metadata
  14. 14. Intute study: demonstrator <ul><li>11,042 stripped records </li></ul><ul><li>Politics </li></ul><ul><li>Interfaces </li></ul><ul><ul><li>Searching </li></ul></ul><ul><ul><li>Simple: free tagging </li></ul></ul><ul><ul><li>Enhanced: DDC / LCSH / Relative Index </li></ul></ul>
  15. 15. Intute demonstrator: searching Searching
  16. 16. Intute demonstrator: Enhanced Tagging interfaces
  17. 17. Enhanced interface
  18. 18. Intute study: user study 1 <ul><li>Research questions </li></ul><ul><ul><li>Choice of tag </li></ul></ul><ul><ul><li>Retrieval implications </li></ul></ul><ul><li>Participants </li></ul><ul><ul><li>28 UK politics students </li></ul></ul><ul><ul><li>Little tagging experience </li></ul></ul>
  19. 19. Intute study: user study 2 <ul><li>Data collection </li></ul><ul><ul><li>Logging </li></ul></ul><ul><ul><li>Three questionnaires </li></ul></ul><ul><li>Four tagging tasks </li></ul><ul><ul><li>Two controlled, two free </li></ul></ul><ul><ul><li>Tag 15 documents in each task </li></ul></ul>
  20. 20. Intute study: user study 3 <ul><li>Hypothetical group project scenario </li></ul><ul><li>Instructions </li></ul><ul><ul><li>5 to 10 min per document </li></ul></ul><ul><ul><li>Open document but focus </li></ul></ul><ul><ul><li>Try consider enhanced suggestions where appropriate </li></ul></ul>Imagine that as part of one of your courses, you are asked to write a four-page essay on the topic of European integration, as a joint project in groups of four. The essay should critically discuss existing theories about the creation of the European Union and its institutions. Your lecturer has instructed you to look for resources in the EnTag system. Since you will be working together with three other students, you should tag the documents you retrieve with tags that would be useful to you but would also enable other students to find those documents in EnTag and understand from your tags what the documents are about.
  21. 21. Intute results: number of tags <ul><li>7,568 tags in total </li></ul><ul><li>278 tags per person </li></ul><ul><li>94 + 751 documents tagged (controlled + free task) </li></ul><ul><li>More in simple interface </li></ul><ul><li>More in free task </li></ul>
  22. 22. Intute results: tag selection <ul><li>Simple interface </li></ul><ul><ul><li>91% freely assigned </li></ul></ul><ul><li>Enhanced interface </li></ul><ul><ul><li>71% freely assigned </li></ul></ul><ul><ul><li>17% controlled tags </li></ul></ul><ul><li>Other features (both interfaces) </li></ul><ul><ul><li>8% other taggers’ tags </li></ul></ul><ul><ul><li>2% main tag cloud </li></ul></ul><ul><ul><li>< 1% own tag </li></ul></ul>
  23. 23. Intute results: browsing for tags <ul><li>Simple interface </li></ul><ul><ul><li>73% others’ tags </li></ul></ul><ul><ul><li>17% main tag cloud </li></ul></ul><ul><ul><li>10% own tag </li></ul></ul><ul><li>Enhanced interface </li></ul><ul><ul><li>74% controlled vocabulary </li></ul></ul><ul><ul><li>18% others’ tags </li></ul></ul>
  24. 24. Intute results: retrieval implications <ul><li>Versus metadata records </li></ul><ul><ul><li>All tags: new access points for 36% documents </li></ul></ul><ul><ul><li>Controlled tags: new access points for 69% documents </li></ul></ul><ul><li>Search terms </li></ul><ul><ul><li>More in tags than in (un)controlled keywords (2x / 3x) </li></ul></ul>
  25. 25. Intute results: post-questionnaires 1 <ul><li>Post-task </li></ul><ul><ul><li>Familiar / easy / satisfied / certain </li></ul></ul><ul><ul><li>Useful: own tags, DDC disambiguation pane, DDC suggestions </li></ul></ul><ul><ul><li>Not useful: main tag cloud, others’ names, hierarchical DDC pane </li></ul></ul>
  26. 26. Intute results: post-questionnaires 2 <ul><li>Post-study </li></ul><ul><ul><li>Easy to learn and useful in real life </li></ul></ul><ul><ul><li>Simple </li></ul></ul><ul><ul><ul><li>+ Simplicity, speed, freedom of choice </li></ul></ul></ul><ul><ul><ul><li> No suggestions </li></ul></ul></ul><ul><ul><li>Enhanced </li></ul></ul><ul><ul><ul><li>+ Suggestions </li></ul></ul></ul><ul><ul><ul><li> Inappropriate suggestions, cluttered interface, number of steps </li></ul></ul></ul>
  27. 27. Intute study: conclusions <ul><li>Controlled vocabulary suggestions are valued if appropriate </li></ul><ul><li>Potential for additional access points </li></ul><ul><li>Value of added consistency for information retrieval </li></ul>
  28. 28. The STFC ePubs study <ul><li>Institutional Repository </li></ul><ul><li>A study of the Authors of papers </li></ul><ul><ul><li>Smaller number - c.10-12. </li></ul></ul><ul><ul><li>Regular depositors ( > 10 papers each) </li></ul></ul><ul><ul><li>Subject experts </li></ul></ul><ul><li>Expectation that they would want their papers accurately tagged to support precision in recall </li></ul><ul><li>A more qualitative study </li></ul><ul><li>Used the ACM Computing Classification Scheme </li></ul><ul><ul><li>Widely used in the community </li></ul></ul>
  29. 29. STFC ePubs: study approach <ul><li>Questions </li></ul><ul><ul><li>Do authors appreciate the purpose and use of tags? </li></ul></ul><ul><ul><li>Value of using a controlled vocabulary – does it lead to the creation of better tags? </li></ul></ul><ul><ul><li>Evaluating the user interface </li></ul></ul><ul><li>Supervised sessions </li></ul><ul><ul><li>40 minute observed trial </li></ul></ul><ul><ul><li>Logging statistics </li></ul></ul><ul><ul><li>Task worksheet </li></ul></ul><ul><ul><ul><li>Tagging own papers – a number of their choice </li></ul></ul></ul><ul><ul><ul><li>Tag cloud, own tags, controlled vocabulary </li></ul></ul></ul>
  30. 30. STFC ePubs: study limitations <ul><li>A number of limitations of this approach: </li></ul><ul><ul><li>Small sample size </li></ul></ul><ul><ul><li>Small number of papers tagged </li></ul></ul><ul><ul><li>Inappropriate controlled vocabulary </li></ul></ul><ul><ul><li>Computing and IT specialists too familiar with the concept of semantic annotation. </li></ul></ul><ul><ul><li>Single, observed use of the tool – not real life </li></ul></ul><ul><li>Nevertheless, it was felt that the results of the study were illuminating and useful </li></ul>
  31. 31. STFC ePubs repository <ul><ul><li> </li></ul></ul>
  32. 32. ePubs – a single repository entry
  33. 33. The Tagger
  34. 34. Browsing the Thesaurus
  35. 35. Browsing the Thesaurus
  36. 36. Picking terms
  37. 37. Global vs. Personal Tag Cloud
  38. 38. Picking terms from the Tag Cloud
  39. 39. STFC study findings: term choice <ul><li>Chose terms from the bottom of the hierarchy if possible. </li></ul><ul><li>Often preferred an appropriate term from the thesaurus over their own </li></ul><ul><ul><li>Appreciated the better IR properties </li></ul></ul><ul><li>Would like definitions of terms to be available </li></ul><ul><li>Would like automatic suggestions </li></ul><ul><li>Very little use of the Tag Cloud </li></ul><ul><ul><li>Presentation of cloud? Unfamiliarity? Limited population? </li></ul></ul>
  40. 40. STFC study findings: user interface <ul><li>Tool generally (though not universally) thought to be easy to use </li></ul><ul><ul><li>Some wanted it to be simpler </li></ul></ul><ul><ul><li>More suited for a library professional? </li></ul></ul><ul><li>Wanted more automation </li></ul><ul><li>Tag cloud interface not right </li></ul><ul><li>Would be willing to use </li></ul><ul><ul><li>Especially if benefit in improved retrieval could be established. </li></ul></ul>
  41. 41. STFC study findings: preferred style <ul><li>Most depositors had a strong preference for the way they interact with the system. </li></ul><ul><ul><li>Free text taggers: </li></ul></ul><ul><ul><ul><li>Enter tags, don’t really use the vocabulary </li></ul></ul></ul><ul><ul><li>Thesaurus browsers: </li></ul></ul><ul><ul><ul><li>systematically browse controlled vocabulary, </li></ul></ul></ul><ul><ul><li>Thesaurus searchers: </li></ul></ul><ul><ul><ul><li>Use the vocabulary search tool for preference </li></ul></ul></ul><ul><ul><ul><li>only enter free-text term when they can’t find an appropriate term </li></ul></ul></ul>
  42. 42. STFC study findings: ACM scheme <ul><li>ACM Computing Classification Scheme </li></ul><ul><ul><li>General recognition of this scheme </li></ul></ul><ul><ul><li>Used in journals to classify papers </li></ul></ul><ul><ul><ul><li>Meant that there was acceptance of its authority </li></ul></ul></ul><ul><ul><ul><li>Willingness to use it </li></ul></ul></ul><ul><ul><li>Feeling that it was abstract and academic </li></ul></ul><ul><ul><li>Feeling that it was not up to date and had much missing </li></ul></ul>
  43. 43. Comparison of Intute and STFC EPubs results <ul><li>Different user groups and approach to studies </li></ul><ul><li>Similarities between the Intute and STFC users could be identified: </li></ul><ul><ul><li>Users appreciated the benefits of consistency and vocabulary control </li></ul></ul><ul><ul><ul><li>Willingness to engage with the tagging system </li></ul></ul></ul><ul><ul><li>Support for automated suggestions </li></ul></ul><ul><ul><li>Appropriateness of the controlled vocabulary is important </li></ul></ul><ul><ul><li>Tag cloud hard to use effectively </li></ul></ul><ul><ul><li>The user interface and interaction is important </li></ul></ul>
  44. 44. Observations (1) <ul><li>Users are willing to add tags using a controlled vocabulary in conjunction with free text </li></ul><ul><ul><li>By and large they understand why it is useful </li></ul></ul><ul><ul><ul><li>Good search terms = good retrieval </li></ul></ul></ul><ul><ul><li>But they need help </li></ul></ul><ul><ul><ul><li>Automation, suggestions, good interfaces </li></ul></ul></ul><ul><ul><ul><li>Support for different styles of interaction </li></ul></ul></ul><ul><ul><li>Produce “better” tags (?) </li></ul></ul><ul><li>Need for flexible and targeted controlled vocabularies </li></ul>
  45. 45. Observations (2) <ul><li>“ Web 2.0” features need to be thought through very carefully </li></ul><ul><ul><li>“ tag clouds” not a success </li></ul></ul><ul><ul><li>Need much better structuring and presentation </li></ul></ul><ul><ul><li>integrated </li></ul></ul><ul><li>Interaction between tag clouds and structured vocabularies needs further investigation </li></ul><ul><ul><li>Develop flexible user focussed vocabularies from tags </li></ul></ul><ul><ul><li>“ structured folksonomy” </li></ul></ul>
  46. 46. Conclusions <ul><li>Controlled vocabulary and tags complement each other </li></ul><ul><li>Controlled vocabulary suggestions are valued if appropriate </li></ul><ul><li>Future work: </li></ul><ul><ul><li>Qualitative analysis </li></ul></ul><ul><ul><li>Enhancements </li></ul></ul><ul><ul><ul><li>Controlled vocabulary </li></ul></ul></ul><ul><ul><ul><li>Auto suggestions </li></ul></ul></ul><ul><ul><ul><li>Interface </li></ul></ul></ul><ul><ul><li>Motivation for tagging </li></ul></ul><ul><li>Would users actually enhance tags in “real life” ? </li></ul>
  47. 47. Acknowledgements <ul><li>I would like the thank the EnTag project team for providing the main content of this presentation </li></ul><ul><ul><li>Koraljka Golub (UKOLN), Joint Conference on Digital Libraries (JCDL), Austin, TX, 15-19 June 2009 </li></ul></ul><ul><ul><li>Brian Matthews (STFC), presentation given at ISKO UK Conference, London, 22-23 June 2009 </li></ul></ul><ul><ul><li>Douglas Tudhope (University of Glamorgan) </li></ul></ul> /
  48. 48. Thank You!