FaSet: A Set Theory Model for Faceted Search

Presentation of the paper "FaSet: A Set Theory Model for Faceted Search" by D. Bonino, F. Corno, L. Farinetti at the 2009 IEEE/ACM/WIC International Conference on Web Intelligence


FaSet: A Set Theory Model for Faceted Search

  1. FaSet: A Set Theory Model for Faceted Search
       Dario Bonino, Fulvio Corno, Laura Farinetti
       Politecnico di Torino, Dipartimento di Automatica e Informatica
       http://elite.polito.it
  2. Outline
       • Faceted Search
       • Goal
       • The FaSet Set-Theoretical Model
       • FaSet Relational Implementation
       WI/IAT 2009, Milano, Italy
  3. Faceted Classification
       • Originated in Library Science
           • Ranganathan, 1962
       • Content-based classification scheme
       • Multi-dimensional
           • Facet = classification dimension
       • Multi-valued
           • Focus = allowed value in one of the facets
  4. Example
       • Facets: Color, Shape, Taste
       • Allowed foci for each facet
           • Color: Yellow, Red, Orange, Green, Blue, White, Black
           • Shape: Cube, Sphere, Cone, Cylinder
           • Taste: Sweet, Bitter, Neutral, Acid
       • Choice of the foci describing the item
  5. Faceted Search Systems
       • Faceted Classification
           • Simple, intuitive, versatile, powerful
       • Adopted by more and more web sites
           • As a classification system for their products/items/documents/resources/…
           • As a model for the user interface in search, filtering, refinement
  6. Examples
  7. Examples
  8. Examples
  9. Facets in the real world
       • Multi-valued classification
           • During classification
           • During search
           • AND vs OR semantics?
       • Hierarchical (nested) facets
           • Parents selectable?
       • Incomplete classification
       • Numerical ranges
       [Example facets: Color (Yellow, Red, Orange, Green, Blue, White, Black, Other); Shape (Squared ▼: Cube, Parallelepiped; Rounded ▼: Sphere, Cylinder); Weight (0-50 g, 50-100 g, 100+ g)]
  10. Facets in the Literature
       • User Interfaces
           • Active research field since ~2000
           • Usability studies
           • Mainly for search interfaces
           • Application case studies
           • Web vs desktop environment
           • Mainly for multimedia data
       • Data and logic model
           • Methodologies from Library science (Broughton, Vickery)
           • Formal models
               • Dynamic Taxonomies (Sacco)
               • Uniformities, Lattices (Priss)
               • Granular computing
           • Less applicable results
  11. Goal of the paper
       • Propose a formal model: FaSet
       • For representing
           • Faceted Classification of resources
           • Faceted Search Interfaces for such resource sets
           • Searching, Filtering, Ranking operations
       • Compatible with modern web applications
           • Mathematically simple
           • Easy mapping to Relational Algebra
           • Decouple classification and resources
       • Versatile and flexible
           • Supports all "real-world" variations on Facets
  12. Facets and Foci
       • Facets: disjoint sets
           • Fa, Fb, Fc, …
       • Facet space:
           • U = Fa × Fb × Fc × …
       • Focus L: subset
           • La ⊆ Fa
           • Many foci for each facet
       • Focus name: index list
           • La<i,j,k,…>
       [Diagram: facet space U with facets Fa and Fb; foci La<1>, La<1,1>, La<1,2>, La<2> inside Fa]
  13. Hierarchy
       • Hierarchical nesting of foci is represented by subset containment
           • La<narrower> ⊆ La<broader>
       • Focus names are chosen to represent hierarchical containment
           • La<i,j,k> ⊆ La<i,j>
           • Reminiscent of the Dewey Decimal Classification
       • Incomplete taxonomy
       • No overlap allowed
       • A focus may be larger than the union of its sub-foci
       [Diagram: foci La<1>, La<1,1>, La<1,2>, La<2> inside Fa]
  14. Classification (Facet)
       • Resources r are classified w.r.t. the facet space
           • "Projection": the projection of r onto Fa
       • We may only represent projections built by combining foci
           • (projection of r onto Fa) = ∪p La<p>
       • Just the focus names are needed
           • {<1,1>,<2>}
       [Diagram: projection of r onto Fa, covering foci among La<1>, La<1,1>, La<1,2>, La<2>]
  15. Classification (Multidimensional)
       • On the multi-dimensional space, the Cartesian product is taken
           • (projection of r onto U) = (projection of r onto Fa) × (projection of r onto Fb) × ...
       • Just the focus names are needed
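The classification described on slides 14 and 15 can be sketched as a small data structure. This is only an illustration under stated assumptions: focus names are stored as tuples of integers, a classification is a dict from facet name to a set of focus names, and the facet names, the resource, and the helper `focus_combinations` are all hypothetical rather than taken from the paper.

```python
from itertools import product

# Hypothetical representation (not from the paper): a focus name <i,j,...>
# is a tuple of ints; a classification maps each facet to a set of focus names.
resource = {
    "Fa": {(1, 1), (2,)},  # projection onto Fa: La<1,1> ∪ La<2>
    "Fb": {(3,)},          # projection onto Fb: Lb<3>
}

def focus_combinations(classification):
    """Enumerate the Cartesian product of the per-facet focus names."""
    facets = sorted(classification)
    per_facet = [sorted(classification[f]) for f in facets]
    return [dict(zip(facets, combo)) for combo in product(*per_facet)]

for combo in focus_combinations(resource):
    print(combo)
```

Only the focus names are stored; the product is computed on demand, which mirrors the remark that just the focus names are needed.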
  16. Searching in FaSet
       • Resources r
           • Classified by their projection onto U
       • Query q
           • Expressed uniformly as its projection onto U
       • Search = Filtering + Ranking
           • Filtering: r is relevant to q iff (projection of r onto U) ∩ (projection of q onto U) ≠ ∅
           • Ranking: estimate the similarity S(q, r) of r to q
       [Diagram: resources r1, r2 and query q in the space of Fa and Fb]
  17. Filtering
       • All resources that match the query, even partially
           • (projection of r onto U) ∩ (projection of q onto U) ≠ ∅
       • May be easily computed by checking focus names
       • Prefix-compatibility: La<p1> ≍ La<p2> iff
           • p1 = p2, or
           • p1 is a prefix of p2, or
           • p2 is a prefix of p1
       • For each facet, at least one pair of foci must be prefix-compatible
           • ∀Fa : ∃ La<p1> ∈ q, La<p2> ∈ r : La<p1> ≍ La<p2>
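The prefix-compatibility test above translates almost directly into code. The following is a minimal sketch, assuming focus names are tuples of integers and that classifications and queries are dicts from facet name to a set of focus names; the function names and sample data are hypothetical. It also assumes that a facet absent from the query is unconstrained and that a resource with no foci in a queried facet does not match, which the slides do not state.

```python
def prefix_compatible(p1, p2):
    """True if p1 == p2, or one focus name is a prefix of the other."""
    n = min(len(p1), len(p2))
    return p1[:n] == p2[:n]

def matches(query, resource):
    """For every facet in the query, at least one (query focus,
    resource focus) pair must be prefix-compatible."""
    return all(
        any(
            prefix_compatible(p1, p2)
            for p1 in q_foci
            for p2 in resource.get(facet, ())  # assumption: missing facet = no foci
        )
        for facet, q_foci in query.items()
    )

# Hypothetical sample data
query = {"Fa": {(1, 3), (2,)}}
broad = {"Fa": {(1,)}}            # <1> is a prefix of <1,3>
siblings = {"Fa": {(1, 1), (1, 2)}}  # no focus compatible with <1,3> or <2>

print(matches(query, broad))
print(matches(query, siblings))
```

Note that the empty tuple, standing for a root focus such as L<>, is prefix-compatible with every focus name in this sketch.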
  18. Example
       • Focus hierarchy: L<> contains L<1> and L<2>; L<1> contains L<1,1>, L<1,2>, L<1,3>; L<2> contains L<2,1>, L<2,2>
       • Figure labels for the query and resources: <1,3> <2> q <1> r1 <2,2> r2 <1,2> <1,3> r3 <1,1> <1,2> r4
  19. Ranking
       • Compute similarity between resource and query
           • Often neglected by Faceted Search Interfaces
       • Define a Similarity Measure S(q, r) ∈ [0,1]
           • Compute similarity between matching foci (deeper matches give higher scores)
           • Aggregate focus-based similarity measures in the same facet (fuzzy sum)
           • Normalize facet-level results
           • Aggregate facet-based similarity measures across all facets (fuzzy product)
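The slide names the ranking steps but not their formulas, so the sketch below fills them in with assumptions that should not be read as the paper's definitions: focus similarity is taken as the depth of the shared prefix divided by a hypothetical `max_depth` parameter, the fuzzy sum is the probabilistic sum a + b - ab, and the fuzzy product is the ordinary product. Because the probabilistic sum already stays in [0,1], the separate normalization step is omitted here.

```python
from functools import reduce

def focus_similarity(p1, p2, max_depth):
    """Assumed measure: depth of the match divided by max_depth,
    or 0.0 if the focus names are not prefix-compatible."""
    n = min(len(p1), len(p2))
    if p1[:n] != p2[:n]:
        return 0.0
    return n / max_depth  # deeper matches give higher scores

def fuzzy_sum(values):
    """Probabilistic sum (an assumed choice of fuzzy sum)."""
    return reduce(lambda a, b: a + b - a * b, values, 0.0)

def similarity(query, resource, max_depth):
    """Aggregate within each facet by fuzzy sum, across facets by product."""
    score = 1.0
    for facet, q_foci in query.items():
        pair_scores = [
            focus_similarity(p1, p2, max_depth)
            for p1 in q_foci
            for p2 in resource.get(facet, ())
        ]
        score *= fuzzy_sum(pair_scores)
    return score

# Hypothetical sample data; max_depth must be at least the longest focus name
query = {"Fa": {(1, 3)}}
print(similarity(query, {"Fa": {(1, 3)}}, max_depth=2))
print(similarity(query, {"Fa": {(1,)}}, max_depth=2))
```

With these choices a resource that fails the filter in any queried facet scores 0, so ranking and filtering stay consistent.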
  20. FaSet Relational Implementation
       • The FaSet classification requires
           • A constant set of Facets
           • A constant set of Foci
           • An "index" table storing the list of focus names for each resource
       [Diagram labels: constant, Resource Database]
  21. FaSet Relational Implementation
       • The FaSet search algorithm uses
           • Set operations
           • Universal and existential quantification
           • Aggregate operations for computing ranking measures
       • Directly supported by Relational DBMS primitives
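One way the filtering step might map onto a relational engine is sketched below with Python's built-in sqlite3 module. The schema is an assumption, not the paper's: an `idx` table with one row per (resource, facet, focus name), a `query` table of the same shape, and focus names encoded as dot-terminated strings such as "1.3." so that a prefix test becomes a LIKE comparison. Existential quantification is expressed by the join, and universal quantification over facets by counting matched facets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: focus names are dot-terminated paths, e.g. <1,3> -> "1.3."
conn.executescript("""
CREATE TABLE idx (resource TEXT, facet TEXT, focus TEXT);
CREATE TABLE query (facet TEXT, focus TEXT);
""")
conn.executemany("INSERT INTO idx VALUES (?, ?, ?)", [
    ("r1", "Fa", "1."),   ("r1", "Fb", "2.1."),
    ("r2", "Fa", "1.1."), ("r2", "Fb", "2.1."),
    ("r3", "Fa", "1.3."), ("r3", "Fb", "1."),
])
conn.executemany("INSERT INTO query VALUES (?, ?)",
                 [("Fa", "1.3."), ("Fb", "2.")])

rows = conn.execute("""
SELECT i.resource
FROM idx AS i JOIN query AS q ON i.facet = q.facet
-- prefix-compatibility: either path is a prefix of the other
WHERE i.focus LIKE (q.focus || '%') OR q.focus LIKE (i.focus || '%')
GROUP BY i.resource
-- every queried facet must have at least one compatible pair
HAVING COUNT(DISTINCT i.facet) = (SELECT COUNT(DISTINCT facet) FROM query)
ORDER BY i.resource
""").fetchall()

matching = [row[0] for row in rows]
print(matching)
```

The trailing dot keeps "1." from being treated as a prefix of "11.", which a plain string prefix test on "1" and "11" would get wrong.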
  22. Future work
       • Experimentation of FaSet on sample data sets
           • Performance evaluation
       • Integration with front-end AJAX interfaces
           • CMS module
           • MIT Exhibit
       • Evaluation of the ranking algorithm from the Information Retrieval point of view
  23. Conclusions - FaSet
       • Formally defined faceted Representation & Search model
       • Light formalism
       • Supports hierarchies, nesting, multiple classification, incomplete specifications, …
       • Compatible with modern web development technologies
       Thank you!
