Tags vs Shelves: From Social Tagging to Social Classification
Transcript

  • 1. Tags vs Shelves: From Social Tagging to Social Classification. Hypertext 2011. Arkaitz Zubiaga, Christian Körner, Markus Strohmaier. UNED (Madrid, Spain) & Graz University of Technology (Graz, Austria). June 8th, 2011
  • 2. Motivation: Index. 1 Motivation; 2 User Behavior Measures; 3 Experiments; 4 Results; 5 Conclusions & Outlook
  • 3. Motivation: Book Cataloging
  • 4. Motivation: Book Cataloging
  • 5. Motivation: Book Cataloging
  • 6. Motivation: Book Cataloging. Librarians have been cataloging books for centuries. Manually cataloging books becomes very expensive and labor-intensive for large collections. For instance, the Library of Congress reported an average cost of $94.58 for cataloging each book in 2002 (291,749 books, total: $27.5 million). Given the cost and effort the task requires, research is moving towards automatic classification.
  • 7. Motivation: Automatic Classification of Books. Problem: it is not easy to get data representing the aboutness of the books. In addition, the content of books is not always available digitally. Solution: social tags provided by users have been shown to be helpful (Zubiaga et al., 2009) [1]. Social tagging sites like LibraryThing and GoodReads are gathering vast amounts of tag annotations on books. [1] A. Zubiaga, R. Martínez, V. Fresno. Getting the Most Out of Social Annotations for Web Page Classification. DocEng 2009.
  • 8. Motivation: Tagging
  • 9. Motivation: Social Tagging
  • 10. Motivation: Social Tagging
  • 11. Motivation: Problem Statement. Can we find a type of user whose tags more closely resemble the categorization by experts? Can we characterize those users?
  • 12. Motivation: User Behavior. Körner et al. [2] suggested and described the existence of two kinds of user behavior: Categorizers and Describers.
      Aspect                     Categorizer       Describer
      Goal of tagging            later browsing    later retrieval
      Change of tag vocabulary   costly            cheap
      Size of tag vocabulary     limited           open
      Tags                       subjective        objective
    Previous work suggests that Describers are the more helpful group for inferring semantic relations among tags. Our goal is to discover whether this kind of tagging behavior affects the usefulness of tags for the social classification of books. [2] C. Körner. Understanding the Motivation behind Tagging. Hypertext 2009.
  • 13. Motivation: User Behavior
  • 14. Motivation: User Behavior
  • 15. User Behavior Measures: Index. 1 Motivation; 2 User Behavior Measures; 3 Experiments; 4 Results; 5 Conclusions & Outlook
  • 16. User Behavior Measures: User Behavior Measures.
      Tags per Post (TPP), verbosity: $\mathrm{TPP}(u) = \frac{\sum_r |T_{ur}|}{|R_u|}$ (1)
      Orphan Ratio (ORPHAN), diversity: $n = \frac{|R(t_{max})|}{100}$ (2), $\mathrm{ORPHAN}(u) = \frac{|T_u^o|}{|T_u|}$ with $T_u^o = \{t \mid |R(t)| \le n\}$ (3)
      Tag Resource Ratio (TRR), verbosity + diversity: $\mathrm{TRR}(u) = \frac{|T_u|}{|R_u|}$ (4)
    Source: C. Körner, R. Kern, H.-P. Grahsl, and M. Strohmaier. Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation. Hypertext 2010.
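The three measures can be computed directly from one user's posts. The following is a minimal Python sketch, not the authors' implementation. It assumes each post is a (resource, tags) pair and reads $T_{ur}$ as the tags user $u$ assigned to resource $r$, $R_u$ as the user's resources, $T_u$ as the user's tag vocabulary, and $R(t)$ as the user's resources annotated with tag $t$. The function name `user_measures` and the input layout are hypothetical, and the threshold $n$ is used exactly as written in Eq. (2), without rounding.

```python
from collections import defaultdict


def user_measures(posts):
    """Compute TPP, ORPHAN and TRR for a single user.

    posts: list of (resource_id, set_of_tags) pairs (hypothetical layout).
    """
    resources_per_tag = defaultdict(set)  # tag -> resources it was applied to
    resources = set()
    total_assignments = 0
    for resource, tags in posts:
        resources.add(resource)
        total_assignments += len(tags)
        for tag in tags:
            resources_per_tag[tag].add(resource)

    if not resources or not resources_per_tag:
        raise ValueError("user needs at least one post with at least one tag")

    n_resources = len(resources)       # |R_u|
    n_tags = len(resources_per_tag)    # |T_u|

    # Eq. (1): average number of tags per annotated resource.
    tpp = total_assignments / n_resources

    # Eq. (2): threshold derived from the user's most frequently used tag.
    n = max(len(rs) for rs in resources_per_tag.values()) / 100

    # Eq. (3): share of rarely used ("orphan") tags in the vocabulary.
    orphan_tags = [t for t, rs in resources_per_tag.items() if len(rs) <= n]
    orphan = len(orphan_tags) / n_tags

    # Eq. (4): vocabulary size relative to the number of resources.
    trr = n_tags / n_resources

    return {"TPP": tpp, "ORPHAN": orphan, "TRR": trr}
```

Under this reading, a user who files every book under one of a few recurring tags gets low values on all three measures, while a user who adds many one-off tags per book gets high values.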
  • 17. User Behavior Measures: Computing measures. These 3 measures provide a weight for each user. These weights make it possible to rank users according to each measure. From these rankings, we choose subsets of users as extreme Categorizers (highest-ranked) and extreme Describers (lowest-ranked). Subsets range from 10% to 100%, with a step size of 10%.
  • 18. User Behavior Measures: Book Cataloging. We select subsets of users according to their number of tag assignments. Selecting by percentage of users would be unfair, since subsets with the same number of users would provide different amounts of data.
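The subset selection described on slides 17 and 18 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: users are given as (user_id, measure_value, n_tag_assignments) tuples, the function name `select_by_assignments` is hypothetical, and which end of the ranking corresponds to Categorizers is left to the caller through the `descending` flag. Users are taken in ranked order until the requested share of all tag assignments is covered, so the last user added may overshoot the budget slightly.

```python
def select_by_assignments(users, fraction, descending=False):
    """Pick users in ranked order until `fraction` of all tag assignments is covered.

    users: list of (user_id, measure_value, n_tag_assignments) tuples
           (hypothetical layout).
    fraction: share of the total tag assignments to cover, in (0, 1].
    descending: rank by measure value from high to low instead of low to high.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ranked = sorted(users, key=lambda u: u[1], reverse=descending)
    budget = fraction * sum(u[2] for u in users)

    selected, used = [], 0
    for user_id, _, n_assignments in ranked:
        if used >= budget:
            break  # the requested amount of tagging data is already covered
        selected.append(user_id)
        used += n_assignments
    return selected
```

Calling the function with fractions 0.1, 0.2, up to 1.0 yields the ten nested subsets, each comparable in the amount of tagging data rather than in the number of users.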
  • 19. User Behavior Measures: Objective. We aim to analyze whether: Categorizers provide tags that better help infer the categorization performed by experts; Describers provide tags that more closely resemble book descriptions.
  • 20. Experiments: Index. 1 Motivation; 2 User Behavior Measures; 3 Experiments; 4 Results; 5 Conclusions & Outlook
  • 21. Experiments: Datasets. Set of 38,149 popular books, with categorization data made by experts: 27,299 categorized according to DDC (10 categories); 24,861 categorized according to LCC (20 categories). Tagging data from 153k+ users on LibraryThing and 110k+ users on GoodReads (100+ users annotated each book). Additional descriptive data: book synopses (Barnes&Noble); user reviews (LibraryThing, GoodReads, and Amazon.com); editorial reviews (Amazon.com).
  • 22. Experiments: Tag-based Book Classification. Software: multiclass Support Vector Machines (svm-multiclass [3]). Vectorial representation of books, using tag frequency values. We perform 6 different training set selections of 18,000 books, and show the average accuracy. Accuracy: $\frac{\#\text{correct guesses}}{\#\text{test set}}$. [3] http://svmlight.joachims.org/svm_multiclass.html
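The representation and the evaluation metric on slide 22 can be illustrated with a short Python sketch. It does not reproduce the svm-multiclass setup: it only shows, under stated assumptions, how a book's tag assignments might be turned into a tag frequency vector and how accuracy is averaged over several runs. The function names and the fixed `vocabulary` argument are hypothetical.

```python
from collections import Counter


def tag_frequency_vector(tag_assignments, vocabulary):
    """Count how often each vocabulary tag was assigned to one book."""
    counts = Counter(tag_assignments)
    return [counts[tag] for tag in vocabulary]


def accuracy(predicted, expected):
    """Share of test books whose predicted category matches the expert category."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def mean_accuracy(runs):
    """Average accuracy over several (predicted, expected) runs."""
    scores = [accuracy(predicted, expected) for predicted, expected in runs]
    return sum(scores) / len(scores)
```

With six training set selections, `mean_accuracy` would receive six (predicted, expected) pairs, one per run.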
  • 23. Experiments: Descriptiveness of Tags. Vectorial representation of books ($T_r$), using tag frequency values. Vectorial representation of books ($R_r$), using term frequency values on descriptive data (synopses, reviews). Cosine similarity between $T_r$ and $R_r$:
      $\mathrm{similarity}_r = \cos(\theta_r) = \frac{T_r \cdot R_r}{\|T_r\| \, \|R_r\|} = \frac{\sum_{i=1}^{n} T_{ri} \times R_{ri}}{\sqrt{\sum_{i=1}^{n} (T_{ri})^2} \times \sqrt{\sum_{i=1}^{n} (R_{ri})^2}}$ (5)
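Eq. (5) can be computed on sparse frequency vectors as in the following Python sketch. It assumes both vectors are stored as mappings from a tag or term to its frequency, so that only shared keys contribute to the dot product. The function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(tag_freqs, term_freqs):
    """Cosine similarity between two sparse frequency vectors (dict-like)."""
    # Dot product: only keys present in both vectors contribute.
    dot = sum(value * term_freqs.get(key, 0) for key, value in tag_freqs.items())
    norm_t = math.sqrt(sum(value * value for value in tag_freqs.values()))
    norm_r = math.sqrt(sum(value * value for value in term_freqs.values()))
    if norm_t == 0 or norm_r == 0:
        return 0.0  # assumption: an empty vector is treated as dissimilar
    return dot / (norm_t * norm_r)


# Toy data, invented for illustration only.
tags = Counter({"fantasy": 3, "magic": 4})
review_terms = Counter({"fantasy": 3, "magic": 4})
score = cosine_similarity(tags, review_terms)
```

Because frequencies are never negative, the score lies between 0 (no shared vocabulary) and 1 (identical direction).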
  • 24. Results: Index. 1 Motivation; 2 User Behavior Measures; 3 Experiments; 4 Results; 5 Conclusions & Outlook
  • 25. Results: Results. [Plots: Classification and Descriptiveness on GoodReads and LibraryThing, with panels TPP (verb.), TRR (div.), ORP. (verb. + div.)] 1. TPP measure: Categorizers outperform Describers for classification. 2. All the measures (though especially TRR): tags from Describers more closely resemble descriptive data.
  • 26. Results: Results. 3. Verbosity helps find extreme Categorizers. Users who think of a specific shelf to place the book on tend to assign a tag identifying that shelf.
  • 27. Results: Results. 4. Diversity does not work to find Categorizers on GoodReads. GoodReads suggests previously used tags to the user, which affects the diversity of tags.
  • 28. Results: Results. 5. Users providing non-descriptive tags (i.e., users unlike Describers) produce more accurate classification.
  • 29. Conclusions & Outlook: Index. 1 Motivation; 2 User Behavior Measures; 3 Experiments; 4 Results; 5 Conclusions & Outlook
  • 30. Conclusions & Outlook: Conclusions & Outlook. Social classification of books with tagging data, discriminating extreme Categorizers and Describers. It complements previous research by showing that the users known as Categorizers produce more accurate classification. Non-verbose, non-descriptive, shelf-driven tagging produces more accurate classification of books. Outlook: further analyzing tagging behavior to find generalists (users who provide general tags) and specialists (users who provide more specific tags focused on the subject).
  • 31. Conclusions & Outlook: Thank You. Achiu, Arigato, Danke, Dhannvaad, Dua Netjer en ek, Efcharisto, Gracias, Gràcies, Gratia, Grazie, Guishepeli, Hvala, Kiitos, Köszönöm, Mercé, Merci, Mila esker, Obrigado, Shukran, Tack, Tak, Takk, Shukriya, Tänan, Tapadh leat, Tesekkür ederim, Thank you, Toda. E-mail: azubiaga@lsi.uned.es, @arkaitz