RawSugar Faceted Search


Talk given at the SIGIR 2006 workshop on faceted search

Published in: Technology, News & Politics
  • Human knowledge is essential for better search. Today it is expensive and does not scale.

    1. 1. Faceted Search for Tagged Content Better Navigation for Your Web
    2. 2. What’s Missing from Tag Search and Overall Search? <ul><li>Text search lacks exploration and navigation </li></ul><ul><ul><li>Asthma, Digital Camera , biking, … </li></ul></ul><ul><li>Human-built taxonomies and classifications with faceted search provide that, but they do not scale </li></ul><ul><ul><li>E.g., Amazon , shopping.com , Yahoo Shopping , Froogle </li></ul></ul><ul><ul><li>Very expensive to scale outside the most valuable horizontals </li></ul></ul><ul><li>Tagging might provide the answer, but not as it is done today </li></ul><ul><ul><li>Exploration and navigation in tagged accounts is poor </li></ul></ul><ul><ul><ul><li>DailyKos – blog tag cloud is pretty but useless </li></ul></ul></ul><ul><ul><ul><li>juniorbonner – del.icio.us account </li></ul></ul></ul><ul><ul><li>Overall navigation and search in tagged content is poor </li></ul></ul><ul><ul><ul><li>Restaurants in del.icio.us </li></ul></ul></ul><ul><ul><ul><li>Technorati tag search </li></ul></ul></ul>
    3. 3. RawSugar Solution Overview <ul><li>RawSugar Key Technology: </li></ul><ul><ul><li>Faceted search with hierarchical tags </li></ul></ul><ul><ul><li>Providing the best facets dynamically </li></ul></ul><ul><ul><li>Automatic and seamless merging of multiple user, expert, and automatic hierarchies </li></ul></ul><ul><li>User Benefits : </li></ul><ul><ul><li>Improved navigation and guided search for tagged content </li></ul></ul><ul><ul><ul><li>In individual user accounts and blogs </li></ul></ul></ul><ul><ul><ul><li>Overall </li></ul></ul></ul><ul><ul><li>Embedded in user blogs and websites (dynamic, using AJAX) </li></ul></ul><ul><ul><li>Improved contextual ads for blog search </li></ul></ul>
    4. 4. RawSugar: Tag Search for Tagged Documents <ul><ul><li>Simple Tagging by User </li></ul></ul><ul><ul><ul><li>Flat regular tagging for del.icio.us , Flickr , YouTube , Technorati , WordPress … </li></ul></ul></ul><ul><ul><ul><li>Enhance search with optional tag hierarchy – Light structure </li></ul></ul></ul><ul><ul><ul><li>Add tag search component to blog/website </li></ul></ul></ul><ul><ul><li>RawSugar Algorithms: </li></ul></ul><ul><ul><ul><li>Crawls tags and content via syndication feeds </li></ul></ul></ul><ul><ul><ul><li>Applies algorithmic tag hierarchies </li></ul></ul></ul><ul><ul><ul><li>Provides tag-based faceted search </li></ul></ul></ul>
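The crawling step on this slide can be illustrated with a minimal sketch. It assumes the feed is RSS 2.0 and that tags arrive as `<category>` elements on each `<item>`. The sample feed, the URLs, and the function name `extract_tagged_entries` are hypothetical, and this is not RawSugar's actual crawler.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would fetch this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Best sushi in town</title>
      <link>http://example.com/sushi</link>
      <category>food</category>
      <category>sushi</category>
    </item>
    <item>
      <title>Weekend in Edinburgh</title>
      <link>http://example.com/edinburgh</link>
      <category>travel</category>
      <category>Scotland</category>
    </item>
  </channel>
</rss>"""


def extract_tagged_entries(feed_xml):
    """Return a list of (link, tags) pairs, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Lowercase the tags so later statistics are case-insensitive.
        tags = [c.text.strip().lower() for c in item.findall("category") if c.text]
        entries.append((link, tags))
    return entries


entries = extract_tagged_entries(SAMPLE_FEED)
for link, tags in entries:
    print(link, tags)
```

Each `(link, tags)` pair is the kind of flat tagged entry that the hierarchy and facet steps would then work on.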
    5. 5. RawSugar Technology In Action <ul><li>Single Collections </li></ul><ul><ul><li>Del.icio.us: Juniorbonner delicious or RawSugar </li></ul></ul><ul><ul><li>Blogs: Philipp Keller , engadget , techcrunch , 3pots , Sprol , etc. </li></ul></ul><ul><li>RawSugar Multiple Sources: </li></ul><ul><ul><li>Web2.0 (search of multiple accounts) </li></ul></ul><ul><li>Searches: </li></ul><ul><ul><li>Bush , Health , etc. </li></ul></ul><ul><li>Watch list: </li></ul><ul><ul><li>Gadgets blogs – gadgetsguy </li></ul></ul><ul><ul><li>Sports blogs - sportsguy </li></ul></ul><ul><ul><li>Politics blogs - modernpolitics </li></ul></ul>
    6. 6. Key technology points <ul><li>Hierarchical tagging: </li></ul><ul><ul><li>Users can define local hierarchies. </li></ul></ul><ul><li>Faceted search for tagged content </li></ul><ul><ul><li>Providing good facets for tags is hard </li></ul></ul><ul><ul><li>Algorithmic discovery of facets </li></ul></ul><ul><li>Merging Expert, User and Automatic classifications </li></ul><ul><ul><li>Multiple level hierarchies </li></ul></ul>
    7. 7. Hierarchical Tagging <ul><li>Expert users can specify soft tag relationships </li></ul><ul><ul><li>This creates a forest, not a tree </li></ul></ul><ul><ul><li>Ambiguity is encouraged </li></ul></ul><ul><ul><li>Many local hierarchies, nothing global </li></ul></ul><ul><li>Necessary for Faceted Search: </li></ul><ul><ul><li>Tags act as both facets and values </li></ul></ul><ul><ul><li>A subtag acts as a value </li></ul></ul>[Figure: user-defined forest; search-time tag groupings]
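The idea that a tag can be a facet while its subtags are the values can be sketched as follows. The relations, the `"other"` bucket, and the function names are hypothetical choices for illustration, not RawSugar's data model.

```python
from collections import defaultdict

# Hypothetical user-defined relations as (parent, child) pairs.
# A tag may have several parents, so the structure is a forest, not a tree.
RELATIONS = [
    ("food", "sushi"),
    ("food", "vegetarian"),
    ("asian", "sushi"),   # "sushi" has two parents: ambiguity is allowed
    ("europe", "spain"),
    ("europe", "italy"),
]


def build_parents(relations):
    """Map each child tag to the set of its direct parent tags."""
    parents = defaultdict(set)
    for parent, child in relations:
        parents[child].add(parent)
    return parents


def group_into_facets(result_tags, parents):
    """Group tags found in search results under their parent tags.

    Each parent acts as a facet and each matching subtag as a value.
    Tags with no known parent are collected under "other".
    """
    facets = defaultdict(set)
    for tag in result_tags:
        tag_parents = parents.get(tag)
        if tag_parents:
            for parent in tag_parents:
                facets[parent].add(tag)
        else:
            facets["other"].add(tag)
    return {facet: sorted(values) for facet, values in facets.items()}


parents = build_parents(RELATIONS)
facets = group_into_facets(["sushi", "spain", "vegetarian", "wifi"], parents)
print(facets)
```

Because `sushi` has two parents, it appears as a value under both the `food` and `asian` facets, which is what a forest with encouraged ambiguity implies.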
    8. 8. Faceted Search for Tagged Content <ul><li>Providing good facets for tags is hard </li></ul><ul><li>Unlike database categories, tag space is messy: </li></ul><ul><ul><li>Noisy </li></ul></ul><ul><ul><ul><li>search, search engine, google, seach_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, searchengines, news, information, portal, engines, searching, test, tech, buscadores, tool, etc. </li></ul></ul></ul><ul><ul><li>Large: one new tag per 10 entries, compared with at most a thousand categories supported by other faceted search systems </li></ul></ul><ul><ul><li>Dynamic: dependent on search context and user </li></ul></ul>
    9. 9. Faceted Search for Tagged Content [Screenshot: "Refine your search" panel with Food, Locations, and Origins tag groups]
    10. 10. Automatic Discovery of Facets <ul><li>Providing the best Tag Hierarchies </li></ul><ul><li>Core concepts: </li></ul><ul><ul><li>Only some users (4%) define tag hierarchies; for example, food>sushi, european>spanish, and so on </li></ul></ul><ul><ul><li>Analysis of tag co-occurrence patterns </li></ul></ul><ul><ul><li>Analysis of search patterns </li></ul></ul><ul><li>We mine this tag space to learn simple tag relations ( IS-A relations and RELATED ) using statistics. </li></ul><ul><li>At search time we apply this learned knowledge to group tags from results </li></ul>
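One way to learn IS-A relations from co-occurrence statistics is a subsumption-style heuristic: tag B is treated as a parent of tag A when nearly every entry tagged A is also tagged B, but not the other way round. The slides do not say which statistics RawSugar uses, so the rule below, the `threshold` and `min_count` parameters, and the sample data are all assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged entries (one set of tags per bookmarked item).
ENTRIES = [
    {"food", "sushi"},
    {"food", "sushi", "japanese"},
    {"food", "recipes"},
    {"food", "vegetarian", "recipes"},
    {"food", "sushi"},
    {"travel", "spain"},
]


def learn_is_a(entries, threshold=0.8, min_count=2):
    """Return sorted (parent, child) pairs inferred from tag co-occurrence.

    parent > child is inferred when P(parent | child) >= threshold and
    P(child | parent) < threshold. Tags seen fewer than min_count times
    are never used as children, since their statistics are unreliable.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in entries:
        tag_count.update(tags)
        for a, b in combinations(sorted(tags), 2):
            pair_count[(a, b)] += 1

    relations = []
    for (a, b), both in pair_count.items():
        # Test both directions of each co-occurring pair.
        for child, parent in ((a, b), (b, a)):
            if tag_count[child] < min_count:
                continue
            p_parent_given_child = both / tag_count[child]
            p_child_given_parent = both / tag_count[parent]
            if p_parent_given_child >= threshold and p_child_given_parent < threshold:
                relations.append((parent, child))
    return sorted(relations)


print(learn_is_a(ENTRIES))
```

On this toy data the heuristic recovers `food > sushi` and `food > recipes`. Pairs that co-occur often in both directions fail the asymmetry test; they could be treated as RELATED rather than IS-A.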
    11. 11. Discovered Tag Hierarchies <ul><li>Health : acne, aging, alternative, beautiful, body, brain, breast, breastcancer, cancer, care, cause, depression, diabetes, diet, disease, exercise, fats, fitness, healthcare, heart, herbs, insurance, life, loss, medical, meditation, mental, mind, nutrition, obesity, planning, pregnancy, prevention, protective, quotes, risk, running, sleep, smoke, stress, supplements, terms, treatments, vitamins, weight, wellness, women, workout, yoga </li></ul><ul><li>Fitness : body, nutrition, running, walking, workout, yoga </li></ul><ul><li>Fun : adults, birthday, characters, contests, crazy, dance, jokes, optical, pics, plays, puzzles, quizes, sexy, silly,sudoku, toy, trivia, comedy, commercial, crazy, cute, humour, jokes, parody, pics, satire, silly, simpson, snl, spoof, strange, stupid, t-shirts, wtf </li></ul>
    12. 12. More Complex Discovered Tag Hierarchies <ul><li>internet > search engines > google >: </li></ul><ul><ul><li>ads, adsense, adwords, affiliate, analysis, analytics, api, base, censorship, chat, checkout, cookies, craigslist, e-mail, earth, geo, geotagging, gmail, gmaps, goog, google adsense, google calendar, google earth, google maps, google news, google video, google: adwords, googlebase, googleearth, googlemaps, googletalk, googlevideo, im, instant messaging, map, modules, msn, pagerank, payment, paypal, ping, players, ranking, reader, referrals, revenue, searchengines, selling, seo, sitemap, sms, stats, sync, talk, traffic, web services </li></ul></ul><ul><li>internet > search engines > yahoo > </li></ul><ul><ul><li>answers, buzz, home & living, information management, msn, personal finance, sitemap </li></ul></ul><ul><li>internet > web technologies > ajax > </li></ul><ul><ul><li>2.0, aggregation, atlas, calendars, chat, debug, dom, domain, examples, flex, forms, framework, groupware, homepage, im, jsp, libraries, messenger, patterns, portals, prototype, ria, rubyonrails, slideshow, toolkits, web services, webapps, webdev, webmail, whois, wysiwyg, xmlhttprequest, xpath </li></ul></ul><ul><li>internet > web technologies > css > </li></ul><ul><ul><li>accessibility, borders, boxing, bugs, cheatsheet, columns, dev, dom, dreamweaver, examples, floats, forms, gallery, ie, inspire, layout, markup, menu, navigation, optimizers, positioning, showcase, slideshow, standard, tables, tabs, template, tricks, typography, usability, w3c, web dev, webdesign, webdev, webmaster, webstandards, </li></ul></ul>
    13. 13. Merging Expert, User and Automatic Hierarchies <ul><li>Hierarchies can come from multiple sources: </li></ul><ul><ul><li>Experts or external reliable sources (e.g., Dmoz, WordNet, etc.) </li></ul></ul><ul><ul><li>Users of various levels of expertise </li></ul></ul><ul><ul><li>Automatically discovered hierarchies </li></ul></ul><ul><li>Since hierarchies are local and imperfect, we need to deal with: </li></ul><ul><ul><li>Conflicts </li></ul></ul><ul><ul><li>Missing levels </li></ul></ul><ul><ul><li>Ambiguities </li></ul></ul>
    14. 14. Merging Expert, User and Automatic Hierarchies [Diagram: example tag trees from five sources (Expert 1, User 2, User 3, Automatic 22, Automatic 5). Node labels as extracted: europe, UK, Scotland, Edinburgh, Spain, Italy, food, vegetarian, Sushi, food, cooking, recipes, Asian, Chinese, Thai, Southwest, California, Bay Area, San Francisco, Texas]
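A minimal sketch of how hierarchies from several sources might be merged, assuming each parent > child edge gets a trust weight from its source and that conflicts are settled by weighted vote. The weights, the edges, and the function name are hypothetical. The slides do not describe RawSugar's merging algorithm.

```python
from collections import defaultdict

# Hypothetical trust weight per source type.
SOURCE_WEIGHT = {"expert": 3.0, "user": 1.0, "automatic": 0.5}

# (source type, parent, child) edges, loosely based on the tags on this slide.
EDGES = [
    ("expert", "europe", "uk"),
    ("expert", "uk", "scotland"),
    ("user", "europe", "scotland"),    # missing level: skips "uk"
    ("user", "scotland", "uk"),        # conflict: reverses the expert edge
    ("automatic", "food", "sushi"),
    ("automatic", "asian", "sushi"),   # ambiguity: two parents for "sushi"
]


def merge_hierarchies(edges, weights):
    """Merge weighted parent > child edges into one forest.

    Conflicts (A > B versus B > A) are resolved by keeping the direction
    with the higher total weight; exact ties drop both directions.
    Ambiguous tags keep all of their parents.
    """
    score = defaultdict(float)
    for source, parent, child in edges:
        score[(parent, child)] += weights[source]

    merged = {}
    for (parent, child), total in score.items():
        reverse = score.get((child, parent), 0.0)
        if total > reverse:
            merged[(parent, child)] = total
    return merged


merged = merge_hierarchies(EDGES, SOURCE_WEIGHT)
for (parent, child), total in sorted(merged.items()):
    print(f"{parent} > {child} (weight {total})")
```

In this sketch the user edge `scotland > uk` loses to the expert edge `uk > scotland`. The shortcut edge `europe > scotland` survives alongside `europe > uk > scotland`, so a missing level in one source does not hide the tag.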
    15. 15. Some other Technology points <ul><li>Multi-word tags : social search, socialsearch, social_search, etc. We automatically learn these and treat them as equal. Also includes morphology. </li></ul><ul><li>Synonyms : Based on clustering techniques we discover similar tags. For example: “programming, development, software.” Uses: </li></ul><ul><ul><li>“See also” </li></ul></ul><ul><ul><li>Query expansion to increase recall </li></ul></ul><ul><li>Relevance scoring: based on popularity, freshness, standard relevance </li></ul><ul><li>Tag auto-completion : while searching or tagging </li></ul>
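The multi-word tag and auto-completion points can be sketched with a simple rule-based approach. The slide says the variants are learned automatically; the version below instead assumes a fixed normalization rule (lowercase, then strip spaces, underscores, and hyphens) and does not handle morphology. All function names are hypothetical.

```python
import re
from bisect import bisect_left


def canonical_tag(tag):
    """Collapse case, whitespace, underscores, and hyphens into one key."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


def group_variants(tags):
    """Map each canonical key to the raw tag spellings that share it."""
    groups = {}
    for tag in tags:
        groups.setdefault(canonical_tag(tag), []).append(tag)
    return groups


def autocomplete(prefix, sorted_tags, limit=5):
    """Return up to `limit` tags from a sorted list that start with the prefix."""
    key = canonical_tag(prefix)
    start = bisect_left(sorted_tags, key)  # first tag >= key
    matches = []
    for tag in sorted_tags[start:]:
        if not tag.startswith(key) or len(matches) == limit:
            break
        matches.append(tag)
    return matches


groups = group_variants(
    ["social search", "socialsearch", "social_search", "Search Engine", "search_engine"]
)
vocabulary = sorted(groups)
suggestions = autocomplete("soc", vocabulary)
print(groups)
print(suggestions)
```

Keeping the vocabulary sorted lets a binary search find the first candidate, so auto-completion stays cheap even when the tag space is large.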
    16. 16. Future Plans <ul><li>Disambiguation of hierarchies: Java, Salsa, Free, Italian </li></ul><ul><li>Use of large expert taxonomies from other sources: </li></ul><ul><ul><ul><li>Locations </li></ul></ul></ul><ul><ul><ul><li>Wikipedia, Dmoz, WordNet, Google Co-op, or other directories </li></ul></ul></ul><ul><ul><ul><li>More topic domains: Health </li></ul></ul></ul><ul><li>Sharing of tag hierarchies across users </li></ul><ul><li>Improve usability of tag hierarchy </li></ul><ul><li>Mashups </li></ul><ul><ul><li>Maps </li></ul></ul><ul><ul><li>Reviews </li></ul></ul><ul><ul><li>User content and classification </li></ul></ul>
    17. 17. Discussion <ul><li>Questions? </li></ul>
    18. 18. Backup slides
    19. 19. Why Faceted Search? Improved Exploration - morphology [Screenshot annotations: "Locations", "Restaurant Type", "Not a restaurant!"]
    20. 20. Why Faceted Search? Improved Exploration - Not usable !
    21. 21. References <ul><li>Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html </li></ul><ul><li>Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91 </li></ul><ul><li>Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html </li></ul><ul><li>Brooks, Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf </li></ul><ul><li>Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp </li></ul><ul><li>Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf </li></ul>