
RawSugar Faceted Search



Talk given at the SIGIR 2006 Workshop on Faceted Search.

Published in: Technology, News & Politics

  1. Faceted Search for Tagged Content: Better Navigation for Your Web
  2. What's Missing from Tag Search and Overall Search?
     • Text search lacks exploration and navigation
       • Asthma, digital camera, biking, …
     • Human-built taxonomies and classifications with faceted search provide this, but they do not scale
       • E.g., Amazon, Yahoo Shopping, Froogle
       • Very expensive to scale beyond the most valuable horizontals
     • Tagging might provide the answer, but not as it is done today
       • Exploration and navigation within tagged accounts is poor
         • DailyKos: the blog's tag cloud is pretty but useless
         • The juniorbonner account
       • Overall navigation and search across tagged content is poor
         • Restaurants in …
         • Technorati tag search
  3. RawSugar Solution Overview
     • RawSugar key technology:
       • Faceted search with hierarchical tags
       • Providing the best facets dynamically
       • Automatic and seamless merging of hierarchies from multiple users, experts, and automatic discovery
     • User benefits:
       • Improved navigation and guided search for tagged content
         • Within individual user accounts and blogs
         • Across all tagged content
       • Embedded in user blogs and websites (dynamic, using AJAX)
       • Improved contextual ads for blog search
  4. RawSugar: Tag Search for Tagged Documents
     • Simple tagging by the user
       • Flat, regular tagging on Flickr, YouTube, Technorati, WordPress, …
       • Enhance search with an optional tag hierarchy (light structure)
       • Add a tag search component to the blog or website
     • RawSugar algorithms:
       • Crawl tags and content via syndication feeds
       • Apply algorithmic tag hierarchies
       • Provide tag-based faceted search
  5. RawSugar Technology in Action
     • Single collections:
       • juniorbonner on delicious or on RawSugar
       • Blogs: Philipp Keller, engadget, techcrunch, 3pots, Sprol, etc.
     • RawSugar multiple sources:
       • Web2.0 (search across multiple accounts)
     • Searches:
       • Bush, Health, etc.
     • Watch list:
       • Gadget blogs: gadgetsguy
       • Sports blogs: sportsguy
       • Politics blogs: modernpolitics
  6. Key Technology Points
     • Hierarchical tagging:
       • Users can define local hierarchies
     • Faceted search for tagged content:
       • Providing good facets for tags is hard
       • Algorithmic discovery of facets
     • Merging expert, user, and automatic classifications:
       • Multi-level hierarchies
  7. Hierarchical Tagging
     • Expert users can specify soft tag relationships
       • This creates a forest, not a tree
       • Ambiguity is encouraged
       • Many local hierarchies, nothing global
     • Necessary for faceted search:
       • Tags act as both facets and values
       • A subtag acts as a value
     [Figure: user-defined forest; search-time tag groupings]
  8. Faceted Search for Tagged Content
     • Providing good facets for tags is hard
     • Unlike database categories, the tag space is messy:
       • Noisy
         • search, search engine, google, seach_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, searchengines, news, information, portal, engines, searching, test, tech, buscadores, tool, etc.
       • Large: one new tag per 10 entries, compared with at most a thousand categories supported by other faceted search systems
       • Dynamic: dependent on the search context and the user
  9. Faceted Search for Tagged Content
     [Screenshot: "Refine your search" panel with Food, Locations, and Origins groups]
  10. Automatic Discovery of Facets
      • Providing the best tag hierarchies
      • Core concepts:
        • Only some users (4%) define tag hierarchies, for example food > sushi, european > spanish, and so on
        • Analysis of tag co-occurrence patterns
        • Analysis of search patterns
      • We mine this tag space to learn simple tag relations (IS-A and RELATED) using statistics
      • At search time, we apply this learned knowledge to group the tags found in the results
  11. Discovered Tag Hierarchies
      • Health: acne, aging, alternative, beautiful, body, brain, breast, breastcancer, cancer, care, cause, depression, diabetes, diet, disease, exercise, fats, fitness, healthcare, heart, herbs, insurance, life, loss, medical, meditation, mental, mind, nutrition, obesity, planning, pregnancy, prevention, protective, quotes, risk, running, sleep, smoke, stress, supplements, terms, treatments, vitamins, weight, wellness, women, workout, yoga
      • Fitness: body, nutrition, running, walking, workout, yoga
      • Fun: adults, birthday, characters, contests, crazy, dance, jokes, optical, pics, plays, puzzles, quizes, sexy, silly, sudoku, toy, trivia, comedy, commercial, crazy, cute, humour, jokes, parody, pics, satire, silly, simpson, snl, spoof, strange, stupid, t-shirts, wtf
  12. More Complex Discovered Tag Hierarchies
      • internet > search engines > google:
        • ads, adsense, adwords, affiliate, analysis, analytics, api, base, censorship, chat, checkout, cookies, craigslist, e-mail, earth, geo, geotagging, gmail, gmaps, goog, google adsense, google calendar, google earth, google maps, google news, google video, google: adwords, googlebase, googleearth, googlemaps, googletalk, googlevideo, im, instant messaging, map, modules, msn, pagerank, payment, paypal, ping, players, ranking, reader, referrals, revenue, searchengines, selling, seo, sitemap, sms, stats, sync, talk, traffic, web services
      • internet > search engines > yahoo:
        • answers, buzz, home & living, information management, msn, personal finance, sitemap
      • internet > web technologies > ajax:
        • 2.0, aggregation, atlas, calendars, chat, debug, dom, domain, examples, flex, forms, framework, groupware, homepage, im, jsp, libraries, messenger, patterns, portals, prototype, ria, rubyonrails, slideshow, toolkits, web services, webapps, webdev, webmail, whois, wysiwyg, xmlhttprequest, xpath
      • internet > web technologies > css:
        • accessibility, borders, boxing, bugs, cheatsheet, columns, dev, dom, dreamweaver, examples, floats, forms, gallery, ie, inspire, layout, markup, menu, navigation, optimizers, positioning, showcase, slideshow, standard, tables, tabs, template, tricks, typography, usability, w3c, web dev, webdesign, webdev, webmaster, webstandards
  13. Merging Expert, User and Automatic Hierarchies
      • Hierarchies can come from multiple sources:
        • Experts or reliable external sources (e.g., Dmoz, WordNet)
        • Users with various levels of expertise
        • Automatically discovered hierarchies
      • Since hierarchies are local and imperfect, we need to deal with:
        • Conflicts
        • Missing levels
        • Ambiguities
  14. Merging Expert, User and Automatic Hierarchies
      [Diagram: example hierarchies from five sources (Expert 1, User 2, User 3, Automatic 22, Automatic 5) over tags such as europe, UK, Scotland, Edinburgh, Spain, Italy; food, vegetarian, Sushi, cooking, recipes; Asian, Chinese, Thai; Southwest, California, Bay Area, San Francisco, Texas]
  15. Some Other Technology Points
      • Multi-word tags: social search, socialsearch, social_search, etc. We learn these variants automatically and treat them as equal; this also covers morphological variants.
      • Synonyms: we discover similar tags using clustering techniques, for example "programming, development, software". Uses:
        • "See also"
        • Query expansion to increase recall
      • Relevance scoring: based on popularity, freshness, and standard relevance
      • Tag auto-completion: while searching or tagging
  16. Future Plans
      • Disambiguation of hierarchies: Java, Salsa, Free, Italian
      • Use of large expert taxonomies from other sources:
        • Locations
        • Wikipedia, Dmoz, WordNet, Google Co-op, or other directories
        • More topic domains: Health
      • Sharing of tag hierarchies across users
      • Improved usability of tag hierarchies
      • Mashups:
        • Maps
        • Reviews
        • User content and classification
  17. Discussion
      • Questions?
  18. Backup slides
  19. Why Faceted Search? Improved Exploration
      [Screenshot annotations: morphology; Locations; Restaurant Type; "Not a restaurant!"]
  20. Why Faceted Search? Improved Exploration
      [Screenshot annotation: "Not usable!"]
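Slide 7 says that in a user-defined forest a parent tag acts as a facet and its subtags act as values, and slide 10 says learned relations are applied at search time to group the tags in the results. The following is a minimal Python sketch of that grouping step, assuming a simple child-to-parents dictionary. The `PARENTS` data and the `group_result_tags` name are hypothetical illustrations, not RawSugar's actual data model.

```python
from collections import defaultdict

# Hypothetical soft hierarchy: a child tag may list several parents,
# so the structure is a forest with shared nodes rather than one tree.
PARENTS = {
    "sushi": ["food", "japanese"],
    "vegetarian": ["food"],
    "spanish": ["european"],
    "italian": ["european", "food"],
    "san francisco": ["bay area"],
}


def group_result_tags(result_tags, parents):
    """Group the tags of a result set into facets (parents) and values (subtags).

    Returns (facets, ungrouped): facets maps each parent tag to the sorted
    subtags seen in the results; ungrouped lists tags with no known parent.
    """
    facets = defaultdict(set)
    ungrouped = set()
    for tag in result_tags:
        tag_parents = parents.get(tag)
        if not tag_parents:
            ungrouped.add(tag)
            continue
        # A tag with several parents shows up under every matching facet.
        for parent in tag_parents:
            facets[parent].add(tag)
    return (
        {facet: sorted(values) for facet, values in sorted(facets.items())},
        sorted(ungrouped),
    )


if __name__ == "__main__":
    facets, other = group_result_tags(
        ["sushi", "italian", "spanish", "review", "vegetarian"], PARENTS
    )
    for facet, values in facets.items():
        print(f"{facet}: {', '.join(values)}")  # e.g. "food: italian, sushi, vegetarian"
    print("other:", ", ".join(other))
```

Because "italian" has two parents in this sample data, it appears under both the european and food facets, which is the kind of ambiguity slide 7 says is encouraged.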
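Slide 10 says IS-A relations are mined from tag co-occurrence statistics but does not name the statistic. The sketch below swaps in a common co-occurrence subsumption heuristic as an assumption: tag x is proposed as a parent of tag y when y almost always appears with x (P(x | y) at or above a threshold) while x often appears without y. The function name and the `min_support` and `threshold` defaults are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_is_a(tagged_items, min_support=2, threshold=0.8):
    """Propose (parent, child) IS-A pairs from tag co-occurrence counts.

    x is proposed as a parent of y when P(x | y) >= threshold and
    P(y | x) < threshold. Pairs seen together fewer than min_support
    times are ignored as noise.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))  # sorted so each pair is counted once as (a, b)
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    relations = []
    for (a, b), together in pair_count.items():
        if together < min_support:
            continue
        p_a_given_b = together / tag_count[b]
        p_b_given_a = together / tag_count[a]
        if p_a_given_b >= threshold and p_b_given_a < threshold:
            relations.append((a, b))  # a > b
        elif p_b_given_a >= threshold and p_a_given_b < threshold:
            relations.append((b, a))  # b > a
    return sorted(relations)


if __name__ == "__main__":
    items = [
        ["food", "sushi"],
        ["food", "sushi", "japanese"],
        ["food", "pizza"],
        ["food", "recipes"],
        ["food", "sushi"],
    ]
    print(mine_is_a(items))  # [('food', 'sushi')]
```

Tags that always occur together in both directions yield no pair here; they are better treated as RELATED than as IS-A.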
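Slide 13 lists conflicts, missing levels, and ambiguities as the problems that arise when merging hierarchies from experts, users, and automatic discovery. The sketch below addresses only conflicts. It assumes each source contributes (parent, child) edges and that a hypothetical per-source weight (expert > user > automatic) decides which parent wins when sources disagree. The weights and names are illustrative, not values from the talk.

```python
from collections import defaultdict

# Hypothetical trust weights per source type.
SOURCE_WEIGHT = {"expert": 3.0, "user": 2.0, "automatic": 1.0}


def merge_hierarchies(sources):
    """Merge (parent, child) edges from several sources into child -> parent.

    sources: list of (kind, edges); kind is a key of SOURCE_WEIGHT.
    Each parent candidate for a child accumulates the weight of every source
    asserting it; the heaviest candidate wins, ties broken alphabetically.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for kind, edges in sources:
        weight = SOURCE_WEIGHT[kind]
        for parent, child in edges:
            votes[child][parent] += weight

    return {
        child: min(candidates, key=lambda p: (-candidates[p], p))
        for child, candidates in votes.items()
    }


if __name__ == "__main__":
    merged = merge_hierarchies([
        ("expert", [("europe", "uk"), ("europe", "spain")]),
        ("user", [("food", "sushi"), ("food", "vegetarian")]),
        ("automatic", [("asian", "sushi"), ("asian", "thai")]),
    ])
    print(merged["sushi"])  # food (the user edge outweighs the automatic one)
```

Handling missing levels (for example, a user asserting uk > edinburgh where an expert has uk > scotland > edinburgh) would need an ancestor check on top of this vote, which the sketch leaves out.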
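Slide 15 says spelling variants of multi-word tags such as social search, socialsearch, and social_search are learned and treated as equal. The talk does not describe how. As a simple stand-in (an assumption, not the RawSugar method), the sketch below ignores case and separators, and merges a trailing-"s" plural only when the singular form is also observed, which avoids mangling a tag like "news". The function names are hypothetical.

```python
import re
from collections import defaultdict


def squash(tag):
    """Lowercase a tag and drop spaces, underscores, hyphens and dots."""
    return re.sub(r"[\s_\-.]+", "", tag.lower())


def group_variants(tags):
    """Group tag spellings that should be treated as the same tag.

    A trailing 's' is stripped only when the squashed singular form also
    occurs in the vocabulary (a crude, data-driven plural rule).
    """
    keys = {tag: squash(tag) for tag in tags}
    vocabulary = set(keys.values())
    groups = defaultdict(set)
    for tag, key in keys.items():
        if key.endswith("s") and key[:-1] in vocabulary:
            key = key[:-1]
        groups[key].add(tag)
    return {key: sorted(variants) for key, variants in groups.items()}


if __name__ == "__main__":
    tags = ["social search", "socialsearch", "social_search",
            "search engine", "searchengines", "search_engine", "news"]
    for key, variants in sorted(group_variants(tags).items()):
        print(key, "->", variants)
```

A misspelling such as seach_engines from slide 8 would still form its own group; catching it would need edit-distance matching, which this sketch does not attempt.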
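Slide 15 also says similar tags such as programming, development, and software are discovered with clustering techniques and used for "See also" links and query expansion. The sketch below substitutes a simple pairwise measure for full clustering, as an assumption: each tag is represented by its co-occurrence counts with other tags, tags whose vectors reach a hypothetical cosine-similarity cutoff `min_sim` are treated as similar, and a query is expanded with them.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tagged_items):
    """Map each tag to a Counter of how often it appears with every other tag."""
    vectors = defaultdict(Counter)
    for tags in tagged_items:
        unique = set(tags)
        for tag in unique:
            for other in unique:
                if other != tag:
                    vectors[tag][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similar_tags(tag, vectors, min_sim=0.8):
    """Tags whose co-occurrence profile is close to that of `tag`, sorted by name."""
    target = vectors.get(tag, Counter())
    return sorted(
        other for other, vec in vectors.items()
        if other != tag and cosine(target, vec) >= min_sim
    )


def expand_query(tags, vectors, min_sim=0.8):
    """Add similar tags to a query to increase recall."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(similar_tags(tag, vectors, min_sim))
    return sorted(expanded)
```

For instance, if programming, development, and software each co-occur only with python and code, their vectors are identical, so `expand_query(["programming"], vectors)` returns all three tags.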
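Slide 15 lists popularity, freshness, and standard relevance as scoring signals but not how they are combined. A minimal sketch, assuming a linear blend with hypothetical weights, log-scaled bookmark counts for popularity, and exponential decay with a hypothetical 30-day half-life for freshness:

```python
import math

# Hypothetical blend weights and decay rate; the talk gives no values.
W_TEXT, W_POPULARITY, W_FRESHNESS = 0.6, 0.3, 0.1
HALF_LIFE_DAYS = 30.0


def score(text_relevance, bookmarks, max_bookmarks, age_days):
    """Blend text relevance (0..1), popularity and freshness into one score."""
    # Log scaling keeps a few very popular items from dominating.
    popularity = (
        math.log1p(bookmarks) / math.log1p(max_bookmarks) if max_bookmarks > 0 else 0.0
    )
    # Freshness halves every HALF_LIFE_DAYS.
    freshness = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return (
        W_TEXT * text_relevance
        + W_POPULARITY * popularity
        + W_FRESHNESS * freshness
    )
```

A linear blend keeps each signal's contribution easy to inspect and tune; any real weighting would have to be fitted to click or bookmark data.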
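Slide 15 also mentions tag auto-completion while searching or tagging. One simple approach, assuming an in-memory vocabulary of lowercase tags with usage counts (the class name and sample tags are illustrative), is a binary search for the prefix in a sorted tag list, with matches ranked by popularity:

```python
import bisect


class TagCompleter:
    """Prefix completion over a tag vocabulary, most-used tags first."""

    def __init__(self, tag_counts):
        # tag_counts: mapping of lowercase tag -> number of times it was used.
        self.counts = dict(tag_counts)
        self.tags = sorted(self.counts)

    def complete(self, prefix, limit=5):
        prefix = prefix.lower()
        # All tags starting with `prefix` form a contiguous run in sorted order.
        start = bisect.bisect_left(self.tags, prefix)
        matches = []
        for i in range(start, len(self.tags)):
            if not self.tags[i].startswith(prefix):
                break
            matches.append(self.tags[i])
        matches.sort(key=lambda tag: (-self.counts[tag], tag))
        return matches[:limit]


if __name__ == "__main__":
    completer = TagCompleter(
        {"google": 50, "google maps": 20, "googlebase": 5, "gmail": 30, "goog": 2}
    )
    print(completer.complete("goog", limit=3))  # ['google', 'google maps', 'googlebase']
```

A sorted list keeps the sketch short; a trie or a precomputed top-k per prefix would scale better for a large, frequently updated tag space.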