Building Mountains Out of Molehills

Faceted navigation, an increasingly common feature of library OPACs, was initially developed to browse hierarchical data. MARC data, however, has relatively little hierarchy, and user-generated tags have even less. The flatness of this data makes navigating search result sets cumbersome and often ineffective. BiblioCommons has been tracking academic research and industry best practices in this area and experimenting with different methods of adding structure to these datasets. This session will share what has been learned to date.

Published in: Spiritual, Technology
  • Building Mountains Out of Molehills

    1. 1. Finding Hierarchy in Facets
    2. 2. The Great Chain of Being
    3. 3. Linnaeus chose a different facet
    4. 5. Why do we need facets in Search?
         • Search result sets are bigger
         • More metadata associated with each result
         • Our brains can’t efficiently manage large lists of data
    5. 6. Two search paradigms: choose your facets beforehand…
    6. 7. …or not
    7. 8. The simple keyword search box has become the tool of choice
    8. 9. Possible Facets
         • Format
         • Subject
         • Language
         • Author
         • Place
         • Era
         • Publication Date
         • Genre
         • Collection
    9. 10. The FAST Model
         • Several facets are peeled away from LCSH…
             • Form (Genre)
             • Chronological
             • Geographical tag
             • Personal Names
             • Corporate Names
         • … but a Hard Nut Remains:
             • Topical Subject Headings
    10. 11. Browsable Hierarchy on a Human Scale - HILCC
    11. 12. Flat Tag Sets
    12. 13. Building Structure in the UI to Make Tags More Focused
    13. 14. Structured Patron Tags
    14. 15. Clustering Tags 101
         • Inputs: {User, Tag, Bib}
         • Start with a similarity measure between tags.
         • First tag forms initial cluster.
         • For remaining tags, if similarity between tag and cluster exceeds threshold, add tag to cluster, else create new cluster.
         • Complications: similarity measures, cluster normalization, multiple cluster membership, etc.
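The single-pass procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `single_pass_cluster`, the choice of averaging similarity over a cluster's members, and the toy similarity function are all assumptions made for the example. It assigns each tag to exactly one cluster, so it sidesteps the "multiple cluster membership" complication.

```python
from typing import Callable, List, Sequence


def single_pass_cluster(
    tags: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Assign each tag to the most similar existing cluster, or start a new one."""
    clusters: List[List[str]] = []
    for tag in tags:
        best, best_score = None, threshold
        for cluster in clusters:
            # Assumption: tag-to-cluster similarity is the mean over members.
            score = sum(similarity(tag, member) for member in cluster) / len(cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([tag])  # nothing exceeded the threshold
        else:
            best.append(tag)
    return clusters


def toy_similarity(a: str, b: str) -> float:
    """Illustration only: tags are 'similar' if they share a first letter."""
    return 1.0 if a[0] == b[0] else 0.0


clusters = single_pass_cluster(
    ["atheism", "darwinism", "agnosticism", "design"], toy_similarity, 0.5
)
```

Because the first tag seeds the first cluster, the result depends on input order; sorting tags by frequency beforehand is one possible way to make it more stable.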
    15. 16. Vector Cosine Similarity
         • Model each tag as a vector V of weighted features.
         • Features are bib ids.
         • Weights are the number of times all users assigned the tag to the feature.
         • cos(V1, V2) = (V1 • V2) / (|V1| * |V2|), which yields a value in [0, 1] because the weights are non-negative counts; 0 is no similarity and 1 is maximal similarity.
         • Trigonometric interpretation: cosine of the angle between the vectors.
         • [Figure: example vectors V{1, 3} and V{3, 1}]
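As a rough sketch of the measure described on this slide, the following Python builds a sparse tag vector from {User, Tag, Bib} triples and computes the cosine. The function names `tag_vector` and `cosine`, and the bib ids `bib_a` and `bib_b`, are hypothetical names chosen for the example.

```python
import math
from collections import Counter
from typing import Iterable, Mapping, Tuple


def tag_vector(assignments: Iterable[Tuple[str, str, str]], tag: str) -> Counter:
    """Count how many times users applied `tag` to each bib id."""
    return Counter(bib for _user, t, bib in assignments if t == tag)


def cosine(v1: Mapping[str, float], v2: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse vectors keyed by bib id."""
    dot = sum(weight * v2.get(bib, 0) for bib, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a tag with no assignments is similar to nothing
    return dot / (norm1 * norm2)


# The two vectors from the slide, V{1, 3} and V{3, 1}:
v1 = {"bib_a": 1, "bib_b": 3}
v2 = {"bib_a": 3, "bib_b": 1}
slide_example = cosine(v1, v2)  # (1*3 + 3*1) / (sqrt(10) * sqrt(10)), about 0.6
```

Storing only non-zero weights keeps the vectors small, since any one tag is applied to a tiny fraction of the catalogue.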
    16. 17. An Example of a Cluster
         • (leonardo da vinci, bible stories, intelligent design, christianity, darwinism, opus dei, atheism, family tree of jesus christ, christian ethics, esoteric religion, morality tales, knights templar)
    17. 18. What Clusters Together?
         • Unifications: different user vocabularies (a.k.a. synonyms, misspellings, abbreviations).
         • Abstraction: different levels of generality (a.k.a. vertical relationships, IS-A, subsumption, hypernym).
             • Abstraction navigation.
             • Hierarchical roll-up for faceting.
         • Semantic relationships: various associations that link terms semantically (a.k.a. horizontal relationships, HAS-A, semantic co-occurrences).
             • ‘See also’ navigation.
         • And yes, spurious associations (a.k.a. noise, crap).
    18. 19. Structuring Clusters (Intrinsic Methods)
         • Lexical subsumption: book -> picture book -> children’s picture book.
         • Operational subsumption: T1 subsumes T2 if the set of bibs tagged by T1 is a superset of those tagged by T2 (~80%).
         • Use association rules to characterize association strength (with support and confidence metrics) between tags and infer relationships.
         • Social network theory to analyze similarity graph.
             • Compute closeness centrality for tags in similarity graph.
             • Order tags by maximal centrality.
             • Add to taxonomy tree at most similar node or at root if similarity threshold is not met.
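Operational subsumption and the association-rule metrics from this slide can be sketched with plain set arithmetic. The reading of "~80%" as "T1's bibs cover at least 80% of T2's bibs" is an assumption, as is the extra requirement that T1 tag more bibs than T2 (added so that two tags cannot subsume each other). The function names and the bib-id sets are hypothetical.

```python
from typing import Set, Tuple


def subsumes(bibs_t1: Set[int], bibs_t2: Set[int], min_coverage: float = 0.8) -> bool:
    """True if T1 covers at least `min_coverage` of T2's bibs and is broader."""
    if not bibs_t2:
        return False
    coverage = len(bibs_t1 & bibs_t2) / len(bibs_t2)
    # Assumption: the broader tag must also be applied to strictly more bibs.
    return coverage >= min_coverage and len(bibs_t1) > len(bibs_t2)


def support_confidence(
    bibs_a: Set[int], bibs_b: Set[int], total_bibs: int
) -> Tuple[float, float]:
    """Support and confidence of the rule 'tagged A => tagged B'."""
    both = len(bibs_a & bibs_b)
    support = both / total_bibs  # share of all bibs carrying both tags
    confidence = both / len(bibs_a) if bibs_a else 0.0  # share of A's bibs also tagged B
    return support, confidence


# Made-up bib ids, for illustration only.
book = {1, 2, 3, 4, 5, 6}
picture_book = {2, 3, 4, 5, 9}
```

Under this reading, the confidence of the rule "picture book => book" is the same quantity as the coverage used in `subsumes`, so the two bullets describe closely related tests.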
    19. 20. Using [Heymann and Garcia-Molina, 2006]
         • christianity
         • family tree of jesus christ
         • opus dei
         • leonardo da vinci
         • esoteric religion
         • knights templar
         • atheism
         • intelligent design
         • darwinism
         • christian ethics
         • bible stories
         • morality tales
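The three steps listed on the previous slide (compute closeness centrality, order tags by centrality, attach each tag to its most similar placed node or to the root) can be sketched as below. This is a simplified reading of the slide, not a faithful reproduction of the published algorithm: the two thresholds, the unnormalized closeness formula, the `ROOT` label, and the similarity scores are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def closeness(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Closeness per node of an unweighted graph: reachable nodes / total distance."""
    scores = {}
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


def build_taxonomy(
    tags: List[str],
    similarity: Callable[[str, str], float],
    graph_threshold: float,
    attach_threshold: float,
    root: str = "ROOT",
) -> Dict[str, str]:
    """Return a {tag: parent} map built greedily from most to least central tag."""
    graph: Dict[str, Set[str]] = {t: set() for t in tags}
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if similarity(a, b) >= graph_threshold:
                graph[a].add(b)
                graph[b].add(a)
    central = closeness(graph)
    parent: Dict[str, str] = {}
    for tag in sorted(tags, key=lambda t: central[t], reverse=True):
        best, best_sim = root, attach_threshold
        for placed in parent:  # tags already in the tree
            score = similarity(tag, placed)
            if score > best_sim:
                best, best_sim = placed, score
        parent[tag] = best
    return parent


# Made-up similarity scores, for illustration only.
SCORES = {
    frozenset(["christianity", "bible stories"]): 0.6,
    frozenset(["christianity", "christian ethics"]): 0.7,
    frozenset(["bible stories", "christian ethics"]): 0.2,
}


def toy_similarity(a: str, b: str) -> float:
    return SCORES.get(frozenset([a, b]), 0.05)


taxonomy = build_taxonomy(
    ["christianity", "bible stories", "christian ethics", "darwinism"],
    toy_similarity, graph_threshold=0.5, attach_threshold=0.3,
)
```

With these toy scores, "christianity" is the most central tag and lands under the root, the two related tags attach beneath it, and "darwinism" falls back to the root because nothing clears the attachment threshold.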
    20. 21. Structuring Clusters (Extrinsic Methods)
         • WordNet ([Stoica, Hearst, Richardson, 2007])
             • Synsets to recognize synonyms and polysemy
             • IS-A links (hypernyms) to recognize abstraction; can also provide labels for hierarchical facets.
         • LC Classifications / Subject Headings
         • Specialized ontologies
             • Gazetteers for geospatial tags (e.g., GNS, GNIS, Alexandria Digital Library, Getty Thesaurus of Geographic Names).
             • Affect taxonomies (Sentiment AI).
         • Introduces classification task to map into ontologies.
         • Danger! Ontology structure may introduce noisy structure, causing more problems than benefits.
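To illustrate how IS-A links from an external source could drive hierarchical roll-up for faceting, here is a small sketch. The `IS_A` table is a hypothetical hand-written stand-in for links that would really come from WordNet or another ontology, and the counts are invented; no WordNet API is used.

```python
from collections import Counter
from typing import Dict, List, Mapping

# Hypothetical IS-A links (child -> parent), standing in for an external ontology.
IS_A: Dict[str, str] = {
    "children's picture book": "picture book",
    "picture book": "book",
    "atlas": "book",
}


def ancestors(term: str, is_a: Mapping[str, str]) -> List[str]:
    """Follow IS-A links upward, nearest ancestor first."""
    chain, seen = [], {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against cycles in a noisy ontology
            break
        seen.add(term)
        chain.append(term)
    return chain


def roll_up(tag_counts: Mapping[str, int], is_a: Mapping[str, str]) -> Counter:
    """Add each tag's count to the tag itself and to every ancestor."""
    totals: Counter = Counter()
    for tag, count in tag_counts.items():
        totals[tag] += count
        for ancestor in ancestors(tag, is_a):
            totals[ancestor] += count
    return totals


# Invented tag-assignment counts for one result set.
facet_counts = roll_up(
    {"children's picture book": 3, "picture book": 2, "atlas": 1}, IS_A
)
```

The totals count tag assignments, not distinct bibs, so a bib carrying two tags from the same branch would be counted twice; a production facet would need to deduplicate by bib id.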
    21. 22. Widening the Similarity Net
         • User / community modeling
             • Tag profiles for users
             • Tag taxonomies for specific user communities.
         • Bib modeling
             • Similar titles based on tag features
             • Best of lists for user communities.
         • Folding in other metadata during clustering
             • Pseudotag generation: automated tag creation from metadata (e.g., LCSH), ontologies, or free text analysis (mining significant terms).
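One simple way pseudotag generation from LCSH metadata might look is to split each heading string on its "--" subdivision separator and treat every part as a tag. This is only a sketch of the idea: the function name `pseudotags`, the lowercasing, and the example headings are assumptions, and the slide does not say how the pseudotags were actually produced.

```python
from typing import Iterable, List


def pseudotags(headings: Iterable[str]) -> List[str]:
    """Split subject heading strings on '--' into lowercase, de-duplicated tags."""
    tags: List[str] = []
    seen = set()
    for heading in headings:
        for part in heading.split("--"):
            tag = part.strip().lower()
            if tag and tag not in seen:
                seen.add(tag)
                tags.append(tag)
    return tags


# Example heading strings, for illustration only.
example_tags = pseudotags([
    "Evolution (Biology)--Religious aspects--Christianity",
    "Christianity--History",
])
```

The resulting pseudotags can then be fed into the same similarity and clustering steps as patron tags, which gives sparsely tagged bibs some features to cluster on.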
    22. 23. Full General-Purpose Automation?
         • Techniques are exquisitely sensitive to features that are computationally accessible.
             • People use background knowledge and context.
         • Absolutely useful for solving particular tasks.
         • Human curation probably a necessary component.
             • Bootstrap structure through automated techniques.
             • Incentivize curation.
             • Manage human time via active learning techniques.
    23. 24. Bibliography
         • http://del.icio.us/ronbraun/code4libhierarchy
