Pragmatic Evaluation of Concept Hierarchies

Slides of our Best Paper talk at i-KNOW 2012.

  1. Graz University of Technology
     Pragmatic Evaluation of Concept Hierarchies
     Christoph Trattner, Philipp Singer, Denis Helic, Markus Strohmaier
     Graz University of Technology, Austria
     Trattner C., Singer P., Helic D., Strohmaier M. I-Know 2012
  2. What is this talk about?
     Part 1
     • We will introduce a framework for evaluating concept hierarchies that does not rely on a gold standard.
     • The framework determines the pragmatic usefulness of concept hierarchies using Kleinberg's idea of hierarchical decentralized search.
     Part 2
     • We will show evidence that the framework works not only in theory but also in practice.
  3. What was the motivation for our research?
  4. Directories: categorization by experts
  5. Research questions
     • Can a crowd of users contribute to the creation of such categorizations?
     • How can we generate such hierarchical structures automatically?
  6. Annotation by users: tagging
     • Folksonomy: a tuple (U, R, T, Y)
       • Users (U)
       • Resources (R)
       • Tags (T)
       • Relation (Y)
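To make the tuple concrete, here is a minimal Python sketch. It assumes that Y is stored as a set of (user, tag, resource) triples, since the slide does not fix a representation; the class and method names are hypothetical. The tag-to-resource projection it computes is the kind of raw material from which tag-to-tag networks can be built.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Folksonomy:
    """Folksonomy F = (U, R, T, Y); Y is stored as (user, tag, resource) triples."""
    users: set = field(default_factory=set)        # U
    resources: set = field(default_factory=set)    # R
    tags: set = field(default_factory=set)         # T
    assignments: set = field(default_factory=set)  # Y

    def add(self, user, tag, resource):
        """Record one tag assignment and register its user, tag and resource."""
        self.users.add(user)
        self.tags.add(tag)
        self.resources.add(resource)
        self.assignments.add((user, tag, resource))

    def resources_by_tag(self):
        """Project Y onto tags: map each tag to the set of resources it annotates."""
        index = defaultdict(set)
        for _user, tag, resource in self.assignments:
            index[tag].add(resource)
        return dict(index)


f = Folksonomy()
f.add("alice", "python", "r1")
f.add("bob", "python", "r2")
f.add("bob", "programming", "r1")
print(f.resources_by_tag())
```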
  7. Folksonomies
     • Emerge from the process of collaborative tagging
     • Contain latent hierarchical structures
     • Taxonomy induction algorithms turn the flat structure into a hierarchy:
       • Generality-based algorithms (centrality in tag-to-tag networks)
       • Other algorithms are possible: k-means, affinity propagation, ...
       • E.g., [Heymann and Garcia-Molina 2006] or [Benz et al. 2010]
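The slide only names the family of generality-based algorithms. The Python sketch below follows that general pattern; it is illustrative and not a reproduction of any of the cited algorithms. Tags are visited from most to least central, and each tag is attached below the most similar tag already placed, or below the root if nothing placed is similar enough. It assumes a symmetric tag similarity graph as input and uses weighted degree as a simple stand-in for the centrality measure. The `threshold` and `root` parameters are hypothetical.

```python
def induce_hierarchy(similarity, threshold=0.0, root="ROOT"):
    """Generality-based taxonomy induction (illustrative sketch).

    similarity: dict tag -> {neighbour_tag: weight}, assumed symmetric.
    Returns a dict mapping every tag to its parent (root for top-level tags).
    """
    # Generality proxy: weighted degree centrality in the tag-to-tag graph.
    centrality = {tag: sum(nbrs.values()) for tag, nbrs in similarity.items()}
    # Visit the most general (most central) tags first; break ties by name.
    order = sorted(similarity, key=lambda tag: (-centrality[tag], tag))
    parent = {}
    for tag in order:
        # Only tags already placed in the hierarchy may become parents.
        placed = [(w, other) for other, w in similarity[tag].items() if other in parent]
        best = max(placed) if placed else None
        if best is not None and best[0] > threshold:
            parent[tag] = best[1]  # attach below the most similar placed tag
        else:
            parent[tag] = root     # nothing similar enough yet: start a new branch
    return parent


similarity = {
    "programming": {"python": 0.8, "java": 0.7, "django": 0.3},
    "python": {"programming": 0.8, "django": 0.9},
    "django": {"python": 0.9, "programming": 0.3},
    "java": {"programming": 0.7},
}
print(induce_hierarchy(similarity))
# → {'programming': 'ROOT', 'python': 'programming', 'django': 'python', 'java': 'programming'}
```

Processing general tags first ensures that every potential parent is already in the tree when a more specific tag is placed.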
  8. Problem: How can we evaluate the usefulness of these hierarchies?
     • Idea: gold-standard-based methods
     • Problem: lack of a gold standard [Strohmaier et al. 2012]; the taxonomic overlap is very small, so the results are not trustworthy
     M. Strohmaier, D. Helic, D. Benz, C. Körner and R. Kern, Evaluation of Folksonomy Induction Algorithms, ACM Transactions on Intelligent Systems and Technology.
  9. Question: Can we find another evaluation method?
  10. Stanley Milgram (1933-1984)
     • A social psychologist
     • Yale and Harvard University
     • Study on the small-world problem, beyond well-defined communities and relations (such as actors, scientists, …)
     • "An Experimental Study of the Small World Problem"
  11. "The simplest way of formulating the small-world problem is: Starting with any two people in the world, what is the likelihood that they will know each other? A somewhat more sophisticated formulation, however, takes account of the fact that while person X and Z may not know each other directly, they may share a mutual acquaintance - that is, a person who knows both of them. One can then think of an acquaintance chain with X knowing Y and Y knowing Z. Moreover, one can imagine circumstances in which X is linked to Z not by a single link, but by a series of links, X-A-B-C-D…Y-Z. That is to say, person X knows person A who in turn knows person B, who knows C… who knows Y, who knows Z." [Milgram 1967, according to http://www.ils.unc.edu/dpr/port/socialnetworking/theory_paper.html#2]
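The chain formulation in the quote maps directly onto shortest paths in a graph. As a small illustration, the Python sketch below counts the links in the shortest acquaintance chain between two people using breadth-first search. The function name and the example graph are made up. BFS relies on global knowledge of the whole network, which Milgram's participants did not have. That global optimum is the baseline the stretch metric compares against.

```python
from collections import deque


def chain_length(acquaintances, source, target):
    """Number of links in the shortest acquaintance chain from source to target.

    acquaintances: dict person -> iterable of people they know.
    Uses breadth-first search; returns None if no chain exists.
    """
    if source == target:
        return 0
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in acquaintances.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                if friend == target:
                    return distance[friend]
                queue.append(friend)
    return None


# X knows A, A knows B, B knows Z: the chain X-A-B-Z has three links.
acquaintances = {"X": ["A"], "A": ["X", "B"], "B": ["A", "Z"], "Z": ["B"]}
print(chain_length(acquaintances, "X", "Z"))  # → 3
```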
  12. An Experimental Study of the Small World Problem [Travers and Milgram 1969]
     • A social network experiment tailored towards demonstrating, defining and measuring inter-connectedness in a large society (USA)
     • A test of the modern idea of "six degrees of separation", which states that every person on earth is connected to any other person through a chain of acquaintances no longer than 6
  13. Set-up
     • Target person: a Boston stockbroker
     • Three starting populations:
       • 100 "Nebraska stockholders"
       • 96 "Nebraska random"
       • 100 "Boston random"
     [Map: starting populations in Nebraska and Boston, target in Boston]
  14. Results
     • How many of the starters would be able to establish contact with the target?
       • 64 out of 296 reached the target
     • How many intermediaries would be required to link starters with the target?
       • That depends: the overall mean was 5.2 links
       • Through hometown: 6.1 links
       • Through business: 4.6 links
       • The Boston group was faster than the Nebraska groups
       • Nebraska stockholders were not faster than Nebraska random
     • What form would the distribution of chain lengths take?
  15. Decentralized search
     • Search in (social) networks
       • People have only local knowledge of the network
       • People have background knowledge of the network, e.g. geography
       • Background knowledge defines the notion of distance between nodes
       • People are greedy: at each step they select the node with the smallest distance to the target
     • Kleinberg explained how people can navigate a network and find others with only local knowledge
     • Decentralized search with hierarchical background knowledge [Kleinberg 2000]
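The following is a minimal Python sketch of a hierarchical decentralized searcher. It assumes the hierarchy is given as a child-to-parent map and the network as an adjacency dict. It also assumes that background-knowledge distance is the number of edges between two nodes in the hierarchy (through their lowest common ancestor). At each step the searcher greedily moves to the unvisited neighbour closest to the target in the hierarchy. The no-revisit rule and the `max_steps` cut-off are assumptions of this sketch, not details taken from the slides. All names are hypothetical.

```python
def ancestors(parent, node):
    """Path from node up to the root of the hierarchy, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """Edges between a and b in the hierarchy, via their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(parent, a))}
    for j, n in enumerate(ancestors(parent, b)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    return float("inf")  # a and b are not in the same tree


def decentralized_search(network, parent, start, target, max_steps=100):
    """Greedy search using only local links plus hierarchical background knowledge.

    Returns the list of visited nodes if the target is reached, otherwise None.
    """
    path, visited, current = [start], {start}, start
    for _ in range(max_steps):
        if current == target:
            return path
        candidates = [n for n in network.get(current, ()) if n not in visited]
        if not candidates:
            return None  # stuck: every neighbour has been visited already
        # Move to the neighbour that the hierarchy says is closest to the target.
        current = min(candidates, key=lambda n: (tree_distance(parent, n, target), str(n)))
        visited.add(current)
        path.append(current)
    return path if current == target else None


parent = {"physics": "science", "biology": "science",
          "quantum": "physics", "optics": "physics", "genetics": "biology"}
network = {"genetics": ["biology", "optics"], "biology": ["science", "genetics"],
           "optics": ["physics", "quantum"], "science": ["physics", "biology"],
           "physics": ["quantum"], "quantum": []}
print(decentralized_search(network, parent, "genetics", "quantum"))
# → ['genetics', 'optics', 'quantum']
```

The hierarchy never restricts which links can be followed. It only ranks the locally visible neighbours, which is what makes the search decentralized.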
  16. Hierarchical decentralized searcher [figure: information network and hierarchy]
  17. Idea! Use Kleinberg's model of decentralized search in social networks and apply it to information networks.
  18. Framework
     • We implemented a framework that takes a given hierarchy and network as input and determines the usefulness of this hierarchy for navigating the network [Helic et al. 2011].
     [Diagram: hierarchy + network → framework (hierarchical decentralized searcher) → useful? yes/no]
     D. Helic, M. Strohmaier, C. Trattner, M. Muhr, K. Lerman, Pragmatic Evaluation of Folksonomies, 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28 - April 1, ACM, 2011.
  19. Question: To what extent are current tag hierarchy induction algorithms useful for navigation?
  20. Evaluating tag hierarchy induction algorithms
     • In [Helic et al. 2011] we used this framework to evaluate 5 hierarchy induction algorithms on 5 datasets (25 combinations):
       • BibSonomy
       • Delicious
       • CiteULike
       • Flickr
       • LastFM
     • Simulations were based on a random sample of 100,000 search pairs
     • Success rate and stretch were measured for evaluation
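The following Python sketch shows the bookkeeping behind such a simulation. It assumes each run yields the searcher's hop count (None on failure) and the global shortest-path length for its pair. The function names are hypothetical. In this sketch stretch is averaged per search pair; the slides do not say whether the evaluation averages per pair or divides totals.

```python
import random
from statistics import mean


def sample_search_pairs(nodes, k, seed=0):
    """Draw k random (start, target) pairs with start != target."""
    rng = random.Random(seed)
    nodes = list(nodes)
    return [tuple(rng.sample(nodes, 2)) for _ in range(k)]


def summarize(runs):
    """Success rate and mean stretch over simulated searches.

    runs: list of (hops, shortest) per search pair, where hops is None when the
    searcher failed and shortest is the global shortest-path length.
    """
    if not runs:
        return 0.0, float("nan")
    successes = [(hops, shortest) for hops, shortest in runs if hops is not None]
    success_rate = len(successes) / len(runs)
    # Stretch of one search: hops taken divided by the shortest possible path.
    stretches = [hops / shortest for hops, shortest in successes if shortest > 0]
    return success_rate, (mean(stretches) if stretches else float("nan"))


runs = [(4, 2), (3, 3), (None, 2), (6, 3)]
print(summarize(runs))  # success rate 0.75, mean stretch ≈ 1.67
```

A stretch of 1 means the searcher found a globally shortest path. Failed searches only lower the success rate and do not enter the stretch average.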
  21. Evaluating tag hierarchy induction algorithms
     [Result plots for BibSonomy, CiteULike, Delicious, Flickr and LastFM]
     Results: centrality-based hierarchy induction algorithms outperform more complex methods such as k-means or affinity propagation
  22. Questions
     • What are the differences and similarities between hierarchies based on different types of annotations?
     • To what extent are hierarchies based on tags more useful for navigation than hierarchies based on keywords?
  23. Tags vs. keywords
  24. Results: tag-based hierarchies are more useful for navigation than keyword-based hierarchies
  25. Question: To what extent is it justified to model human navigation in information networks with hierarchical decentralized search?
  26. Idea: Compare simulations with real-world data!
     "Exploring the Differences and Similarities between Hierarchical Decentralized Search and Human Navigation in Information Networks"
  27. Evaluation
     • We compared simulations with human click trails from the online game The Wiki Game (http://thewikigame.com/)
     • The data contain 1,500,000 click trails of more than 500,000 users with (start, target) information
  28. Hierarchy creation
     Two types of hierarchies were evaluated.
     1.) The first type is based on our previous work: categorical concepts → similarity graph → latent hierarchical taxonomy
       • Categorical concepts: tags from Delicious and category labels from Wikipedia
       • Wikipedia category label dataset: 2,300,000 category labels, 4,500,000 articles, 30,000,000 category label assignments
       • Delicious tag dataset: 440,000 tags, 580,000 articles, 3,400,000 tag assignments
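The slide does not say which similarity measure produces the similarity graph. As an assumption for illustration only, the Python sketch below uses cosine similarity between labels. Each label (a tag or a category label) is treated as a binary vector over the articles it annotates. The function name and the `min_sim` cut-off are hypothetical.

```python
from itertools import combinations
from math import sqrt


def similarity_graph(resources_by_label, min_sim=0.0):
    """Cosine similarity between labels (tags or category labels).

    resources_by_label: dict label -> set of resources (articles) it annotates.
    Each label is a binary vector over resources, so
    cos(a, b) = |R_a ∩ R_b| / sqrt(|R_a| * |R_b|).
    Returns a symmetric adjacency dict containing edges with similarity > min_sim.
    """
    graph = {label: {} for label in resources_by_label}
    for a, b in combinations(sorted(resources_by_label), 2):
        ra, rb = resources_by_label[a], resources_by_label[b]
        if not ra or not rb:
            continue
        sim = len(ra & rb) / sqrt(len(ra) * len(rb))
        if sim > min_sim:
            graph[a][b] = graph[b][a] = sim
    return graph


labels = {
    "python": {"r1", "r2"},
    "programming": {"r1", "r2", "r3", "r4"},
    "cooking": {"r5"},
}
g = similarity_graph(labels)
# python-programming similarity is 2 / sqrt(2 * 4) ≈ 0.707; "cooking" stays isolated.
print(g["python"], g["cooking"])
```

The all-pairs loop is quadratic in the number of labels. At the scale of the datasets above, one would instead enumerate only pairs that co-occur on at least one article, using an article-to-labels index.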
  29. Hierarchy creation
     2.) The second type is based on the work of [Muchnik et al. 2007]
       • Simple idea: the algorithm iterates through all links in the network and decides whether each link is hierarchical; if it is, the link remains in the network, otherwise it is removed
       • Dataset: the directed link network of the English Wikipedia from February 2012, with around 10,000,000 articles and around 250,000,000 links
     Muchnik, L., Itzhack, R., Solomon, S. and Louzoun, Y.: Self-emergence of knowledge trees: Extraction of the Wikipedia hierarchies, Physical Review E 76, 016106 (2007)
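The slide describes the extraction as a single filtering pass over all links but does not state the criterion that classifies a link as hierarchical. The Python sketch below therefore takes the criterion as a parameter. The in-degree rule in the example is a hypothetical placeholder chosen only for illustration; it is not the criterion used by [Muchnik et al. 2007]. The links are passed as a list because they are traversed twice.

```python
from collections import Counter


def filter_hierarchical_links(links, is_hierarchical):
    """Single pass over all links: keep a link only if it is classified as hierarchical."""
    return [(source, target) for source, target in links if is_hierarchical(source, target)]


def more_general_target(links):
    """HYPOTHETICAL stand-in criterion, not the one from Muchnik et al.:
    a link counts as hierarchical if its target has more in-links than its source."""
    in_degree = Counter(target for _source, target in links)
    return lambda source, target: in_degree[target] > in_degree[source]


links = [
    ("Quantum", "Physics"), ("Optics", "Physics"), ("Physics", "Science"),
    ("Biology", "Science"), ("Chemistry", "Science"), ("Physics", "Quantum"),
]
print(filter_hierarchical_links(links, more_general_target(links)))
# drops only ("Physics", "Quantum")
```

Keeping the criterion pluggable separates the filtering pass from the classification rule, so a different rule can be swapped in without touching the pass.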
  30. Evaluation metrics
     • Success rate: percentage of target nodes found
     • Number of hops: number of hops needed to reach the target node
     • Stretch: ratio of the number of steps taken to the length of the global shortest path
     • Path similarity: |h_clicks ∩ s_clicks| / |s_clicks|, where h_clicks are the human's clicks and s_clicks the simulator's clicks
     • Degree: median in- and out-degree of the nodes visited by the simulator and by the human navigator
     • Transition similarity
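The following Python sketch covers two of the trail-level metrics. It assumes each click trail is a list of visited article IDs and that the intersection in the path-similarity formula is taken over sets of nodes; the slide does not say how repeated visits are treated. The function names and example data are hypothetical.

```python
from collections import Counter
from statistics import median


def path_similarity(human_clicks, sim_clicks):
    """|h_clicks ∩ s_clicks| / |s_clicks|, computed over the sets of visited nodes."""
    s = set(sim_clicks)
    return len(set(human_clicks) & s) / len(s) if s else 0.0


def median_degrees(path, out_links):
    """Median in- and out-degree of the nodes on a path.

    out_links: dict node -> list of link targets (assumed free of duplicate links).
    """
    in_degree = Counter(t for targets in out_links.values() for t in targets)
    ins = [in_degree[node] for node in path]
    outs = [len(out_links.get(node, ())) for node in path]
    return median(ins), median(outs)


human = ["A", "B", "C", "D"]
simulator = ["A", "E", "C", "D"]
out_links = {"A": ["B", "E"], "B": ["C"], "E": ["C"], "C": ["D"], "D": []}
print(path_similarity(human, simulator))  # → 0.75
print(median_degrees(human, out_links))   # → (1.0, 1.0)
```

Dividing by |s_clicks| makes the measure asymmetric. It answers what fraction of the simulator's nodes the human also visited, not the reverse.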
  31. What are the results?
  32. Results: hops, stretch, success rate
     • Humans: success rate 100%, stretch 2.5
     • Searcher with Wikipedia category hierarchy: success rate 31.6%, stretch 1.7
  33. Results: hops, stretch, success rate
     • Humans: success rate 100%, stretch 2.5
     • Searcher with Wikipedia Delicious hierarchy: success rate 69%, stretch 8.8
  34. Results: hops, stretch, success rate
     • Humans: success rate 100%, stretch 2.5
     • Searcher with Wikipedia network hierarchy: success rate 93%, stretch 1.5
  35. Results: path similarity
     Question: How similar are the paths taken by our searcher to those taken by humans?
     [Plots: humans vs. humans; humans vs. simulators]
  36. Results: degree [plots: in-degree, out-degree]
  37. Results: transition similarity [plots: humans, searcher]
  38. Conclusions
     • We have shown that our approach of hierarchical decentralized search models human navigation in information networks fairly well
     • Furthermore, we have shown that hierarchies created directly from the link network are better suited for navigation than hierarchies created from external knowledge
  39. What do we plan for the future?
     • Extend the framework to consider not only navigation but also search (i.e., the search box)
     • Evaluate alternative navigational structures
     • ... and many more things
  40. Take-home message: network hierarchies are better suited for navigation than hierarchies created from external knowledge.
     Thank you!
     Christoph Trattner: ctrattner@iicm.edu, www.christophtrattner.info, @ctrattner
     Philipp Singer: philipp.singer@tugraz.at, www.philippsinger.info, @ph_singer
     Denis Helic: dhelic@tugraz.at, http://coronet.iicm.edu/denis/homepage/, @dhelic
     Markus Strohmaier: markus.strohmaier@tugraz.at, www.markusstrohmaier.info, @mstrohm
