Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?

The popularity of social tagging has sparked a great deal of debate on whether tags could replace or improve upon professional metadata as descriptors of books and other information objects. In this paper we present a large-scale empirical comparison of the contributions of individual information elements like core bibliographic data, controlled vocabulary terms, reviews, and tags to the retrieval performance. Our comparison is done using a test collection of over 2 million book records with information elements from Amazon, the British Library, the Library of Congress, and LibraryThing. We find that tags and controlled vocabulary terms do not actually outperform each other consistently, but seem to provide complementary contributions: some information needs are best addressed using controlled vocabulary terms whereas others are best addressed using tags.

(Paper presentation @ iConference 2015, Newport Beach)


  1. Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?
     Toine Bogers¹ & Vivien Petras²
     ¹ Aalborg University Copenhagen, Denmark
     ² Humboldt-Universität zu Berlin, Germany
     iConference 2015, Newport Beach, March 25, 2015
  2. Outline
     •  Introduction
     •  Methodology
     •  Results
     •  Follow-up analysis
     •  Conclusions & Future Work
  3. Tagging vs. Controlled Vocabulary Indexing
     Controlled Vocabulary (CV)
       + Semantic relationships
       - Large development costs
     Tagging
       + Use of the users’ vocabulary
       - No term normalization
     Previous studies:
       •  Mostly analyze the nature of the terms: overlap / complementary vocabulary
       •  Few and conflicting results for retrieval
       •  Small samples
  4. Study Objectives
     What do tags and controlled vocabularies really bring to the table in a realistic search environment?
     1.  Which (combination of) metadata elements can best contribute to retrieval success?
     2.  How does the retrieval performance of tags and CVs compare on a large-scale, realistic test collection under carefully controlled circumstances?
  5. Methodology
     How do we evaluate retrieval performance?
     •  Large collection of documents
     •  Realistic information needs (= topics)
     •  Relevance judgments (= relevant documents for topics)
  6. Methodology
     •  Book collection
        -  Controlled vocabularies in library catalogs
        -  Tags in social cataloging sites
  7. INEX Test Collection of Book Records
     User-Generated Content (UGC): Reviews, Tags
     Bibliographic Metadata (Core): Author, Title, Publication year, Publisher
     Controlled Vocabulary Content (CV):
        -  Amazon: DDC class labels, Amazon subjects, Amazon geographic names, Amazon category labels
        -  British Library and Library of Congress (each): DDC class labels, LCSH topical terms, Geographic names, Personal names, Chronological terms, Genre/form terms
  8. Methodology
     •  Book collection
        -  Controlled vocabularies in library catalogs
        -  Tags in social cataloging sites
     •  Book search information needs
        -  LibraryThing fora
  9. [Figure: annotated LibraryThing (LT) topic, labeling the group name, topic title, and narrative]
  10. Methodology
      •  Book collection
         -  Controlled vocabularies in library catalogs
         -  Tags in social cataloging sites
      •  Book search information needs
         -  LibraryThing fora
      •  Book search relevance judgements
         -  LibraryThing fora
  11. [Figure: annotated LibraryThing (LT) topic, labeling the group name, topic title, narrative, and recommended books]
  12. Experimental setup
      •  INEX test collection of book records
         -  Any-CV = 2 million records
         -  Each-CV = 350,000 records
      •  LibraryThing forum topics
         -  Query and Narrative representations
         -  640 different topics, split in half for training and testing the IR system
      •  Relevance judgements: recommendations from LT members
         -  Graded relevance scoring (highest relevance if the book is added by the searcher)
      •  Evaluation metric: Normalized Discounted Cumulated Gain (NDCG@10)
         -  Evaluated on the first 10 results of the search output
         -  Scores range between 0.0 and 1.0
  13. Comparing controlled vocabulary sources
      •  Question
         -  Which of the three sources of controlled vocabulary provides the best performance?
      •  Answer
         -  No significant differences in performance between the individual providers or their combination
         -  Amazon is neither better nor worse than the British Library or the Library of Congress
         -  Subsequent experiments therefore combine all CV sources
  14. Comparing metadata elements
      •  Questions
         -  Which of the metadata elements provides the best stand-alone performance?
         -  Which metadata element performs better: tags or CV?
         -  Which combination of metadata elements provides the best performance?
  15. [Figure: Results of (combinations of) element sets per topic representation. Scores from 0.00 to 0.13 for the Query and Narrative representations over the element sets Core, Controlled vocabulary, Reviews, Tags, User-generated content, Core + Controlled vocabulary, Core + Reviews, Core + Tags, Core + User-generated content, and All fields]
  16. Comparing metadata elements
      •  Answers
         -  Reviews are by far the best-performing metadata element
            ‣  Significantly better than each of the other individual metadata elements
         -  Combining metadata elements nearly always outperforms individual elements
            ‣  All metadata elements combined provide the best overall performance
         -  Tags have a slight advantage over CV (not statistically significant)
  17. Follow-up analysis (1)
      •  Question
         -  What is the nature of the difference between tags and CV: do they complement each other or overlap?
  18. [Figure: Per-topic differences (Tags vs. controlled vocabularies). ΔNDCG@10 per topic (y-axis, -1.0 to 1.0) for the Query and Narrative representations, with regions marked "Tags > CV" and "CV > Tags"]
      •  Answer
         -  Tags and CVs outperform each other on different sets of topics, offering complementary performance
  19. Follow-up analysis (2)
      •  Question
         -  Does the book type influence the relative performance of tags vs. CV?
            ‣  Fiction
            ‣  Non-fiction
  20. Fiction vs. non-fiction
      [Figure: scores (0.00 to 0.07) of controlled vocabulary and tags on fiction and non-fiction requests, for the Query and Narrative representations]
      •  Answer
         -  The advantage of tags over CV terms is most pronounced for fiction book requests (but never statistically significant)
         -  Retrieving relevant non-fiction books is easier than retrieving relevant fiction books
  21. Follow-up analysis (3)
      •  Question
         -  Does the type of information need influence the relative performance of tags vs. CV?
            ‣  Search
            ‣  Recommendation
            ‣  Search + Recommendation
            ‣  Known-item
  22. Follow-up analysis (3)
      •  Answers
         -  Tags are better at satisfying known-item needs and needs that mix search and recommendation aspects
         -  CV is better for pure recommendation needs
         -  Differences are indicative, but not significant
      [Figure: scores (0.00 to 0.10) of controlled vocabulary and tags per information need type (Search, Search + Recommendation, Recommendation, Known-item), for the Query and Narrative representations]
  23. Conclusions
      •  Tags have a slight (but not significant) advantage over CV
      •  Tags and CV provide largely complementary performance
      •  Future work
         -  Detailed analysis of the precision/recall effect of tags vs. CV
            ‣  CV contains more unique terms, tags more repetition of terms
            ‣  Possible consequence: CVs boost recall, tags boost precision
         -  Detailed analysis of which types of tags/CV terms match relevant documents
         -  More detailed analysis of request types and their relation to tag/CV performance
  24. Questions? Comments? Suggestions?
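
Slide 12 names NDCG@10 with graded relevance as the evaluation metric but does not spell out the gain or discount function. The following is a minimal Python sketch of one common formulation, assuming an exponential gain (2^grade - 1), a log2(rank + 1) discount, and an ideal ranking built from all judged books for the topic. The function names and the relevance grades in the example are hypothetical; this is not the INEX track's official evaluation code.

```python
import math
from typing import Dict, List


def dcg_at_k(grades: List[int], k: int = 10) -> float:
    """Discounted cumulated gain over the first k positions (rank 1 first)."""
    # Assumed gain 2^grade - 1 and assumed discount log2(rank + 1).
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one topic.

    ranking: book IDs in the order the system returned them.
    qrels:   graded relevance per judged book ID (unjudged books count as 0).
    """
    grades = [qrels.get(book_id, 0) for book_id in ranking]
    # Ideal ranking: all judged grades sorted from most to least relevant.
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(grades, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy topic: "b1" has the highest (hypothetical) grade, "b2" a lower one.
    print(round(ndcg_at_k(["b2", "b1", "b3"], {"b1": 4, "b2": 1}), 3))
```

In the toy example the most relevant book is ranked second instead of first, so the score (about 0.669) falls below the maximum of 1.0 that a perfect ranking would reach.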
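
Slides 13 to 23 report which differences are (not) significant, but the slides do not name the test used. Purely as an illustration, here is a sketch of a two-sided paired randomization (sign-flip) test over per-topic NDCG@10 scores, a test often used in IR evaluation. The choice of test, the function name, the number of trials, and the seed are all assumptions.

```python
import random
from typing import Sequence


def paired_randomization_test(a: Sequence[float], b: Sequence[float],
                              trials: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for the mean per-topic difference between runs a and b.

    a, b: per-topic scores (e.g. NDCG@10) for the same topics in the same order.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)  # fixed seed keeps the p-value reproducible
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two systems' labels are exchangeable
        # per topic, so each difference keeps or flips its sign at random.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Identical runs yield a p-value of 1.0, while a run that beats the other on every one of 20 topics by the same margin yields a p-value far below 0.05.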
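
Slide 18 plots per-topic ΔNDCG@10 between tags and CV. A small sketch of how such per-topic differences could be computed and tallied, assuming Δ = tags score minus CV score (so positive values favour tags) and per-topic scores stored in dicts keyed by topic ID. The function names, the tie tolerance, and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def per_topic_deltas(tags: Dict[str, float],
                     cv: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-topic delta = tags score - CV score, largest first.

    Only topics scored in both runs are compared.
    """
    shared = tags.keys() & cv.keys()
    deltas = [(topic, tags[topic] - cv[topic]) for topic in shared]
    # Sort by delta (descending), then topic ID, so the order is deterministic.
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


def tally(deltas: List[Tuple[str, float]], eps: float = 1e-9) -> Dict[str, int]:
    """Count topics where tags win, CV wins, or the scores tie (within eps)."""
    counts = {"tags > cv": 0, "cv > tags": 0, "tie": 0}
    for _, delta in deltas:
        if delta > eps:
            counts["tags > cv"] += 1
        elif delta < -eps:
            counts["cv > tags"] += 1
        else:
            counts["tie"] += 1
    return counts
```

Sorting the deltas from largest to smallest reproduces the shape of a per-topic difference plot: if both the "tags > cv" and "cv > tags" counts are substantial, each representation wins on a different set of topics, which is the complementarity the slide describes.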
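
Slide 23 observes that CV contains more unique terms while tags repeat terms more often. One simple way to quantify this, offered only as an assumption about how such an analysis might start, is a type/token ratio per metadata field: the number of distinct terms divided by the total number of term occurrences. Terms are compared verbatim (no case folding or stemming); the function name and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def term_stats(records: Iterable[List[str]]) -> Dict[str, float]:
    """Vocabulary statistics for one metadata field across book records.

    records: one list of terms (tags or CV terms) per book record.
    Terms are compared verbatim: no case folding, stemming, or normalization.
    """
    counts = Counter(term for terms in records for term in terms)
    tokens = sum(counts.values())  # all term occurrences
    types = len(counts)            # distinct terms
    return {
        "tokens": tokens,
        "types": types,
        # Close to 1.0 means little repetition; lower means terms recur often.
        "type_token_ratio": types / tokens if tokens else 0.0,
        # Distinct terms that occur more than once across the records.
        "repeated_types": sum(1 for c in counts.values() if c > 1),
    }
```

For example, the tag lists `[["fantasy", "dragons"], ["fantasy", "magic"], ["fantasy"]]` give 3 distinct terms over 5 occurrences (ratio 0.6), whereas three records with one distinct subject heading each give a ratio of 1.0.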
