RawSugar Faceted Search

    Notes on slide 1

    Human knowledge is essential for better search. Today it is expensive and does not scale.

    RawSugar Faceted Search - Presentation Transcript

    1. Faceted Search for Tagged Content: Better Navigation for Your Web
    2. What’s Missing from Tag Search and Overall Search?
      • Text search lacks support for exploration and navigation
        • Asthma, Digital Camera , biking, …
      • Human-built taxonomies and classifications with faceted search provide that, but they do not scale
        • E.g., Amazon , shopping.com , Yahoo Shopping , Froogle
        • Very expensive to scale outside most valuable horizontals
      • Tagging might provide the answer but not as done today
        • Exploration and navigation in tagged accounts is poor
          • DailyKos – blog tag cloud is pretty but useless
          • juniorbonner – del.icio.us account
        • Overall navigation and search in tagged content is poor
          • Restaurants in del.icio.us
          • Technorati tag search
    3. RawSugar Solution Overview
      • RawSugar Key Technology:
        • Faceted search with hierarchical tags
        • Providing the best facets dynamically
        • Automatic and seamless merging of multiple user, expert and automatic hierarchies
      • User Benefits :
        • Improved navigation, guided search for tagged content
          • In individual user accounts and blogs
          • Overall
        • Embedded in user blogs and websites (dynamic, using AJAX)
        • Improved contextual ads for blog search
    4. RawSugar: Tag Search for Tagged Documents
        • Simple Tagging by User
          • Flat regular tagging for del.icio.us , Flickr , YouTube , Technorati , WordPress …
          • Enhance search with optional tag hierarchy – Light structure
          • Add tag search component to blog/website
        • RawSugar Algorithms:
          • Crawls tags and content via syndication feeds
          • Applies algorithmic tag hierarchies
          • Provides tag-based faceted search
    5. RawSugar Technology In Action
      • Single Collections
        • Del.icio.us: Juniorbonner delicious or RawSugar
        • Blogs: Philipp Keller , engadget , techcrunch , 3pots , Sprol , etc.
      • RawSugar Multiple Sources:
        • Web2.0 (search of multiple accounts)
      • Searches:
        • Bush , Health , etc.
      • Watch list:
        • Gadgets blogs – gadgetsguy
        • Sports blogs - sportsguy
        • Politics blogs - modernpolitics
    6. Key technology points
      • Hierarchical tagging:
        • Users can define local hierarchies.
      • Faceted search for tagged content
        • Providing good facets for tags is hard
        • Algorithmic discovery of facets
      • Merging Expert, User and Automatic classifications
        • Multiple level hierarchies
    7. Hierarchical Tagging
      • Expert users can specify soft tag relationships
        • This creates a forest, not a tree
        • Ambiguity is encouraged
        • Many local hierarchies, nothing global
      • Necessary for Faceted Search:
        • Tags act as both facets and values
        • A subtag acts as a value
      • Figure labels: user-defined forest; search-time tag groupings
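
A minimal sketch of the structure described on this slide, assuming a simple in-memory representation. The class and method names are hypothetical and are not RawSugar's API; the tags "japanese" and "tokyo" are made up for the example. A tag may have several parents, so the result is a forest of local hierarchies rather than one tree, and any tag with subtags can act as a facet whose values are those subtags.

```python
from collections import defaultdict


class TagForest:
    """Hypothetical store of soft parent > child tag relations."""

    def __init__(self):
        self.children = defaultdict(set)  # parent tag -> set of subtags
        self.parents = defaultdict(set)   # subtag -> set of parent tags

    def add_relation(self, parent, child):
        # A tag may receive several parents: ambiguity is allowed.
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def facet_values(self, facet, result_tags):
        """Subtags of `facet` that occur among the tags of the search results."""
        return sorted(self.children.get(facet, set()) & set(result_tags))

    def roots(self):
        """Tags that have subtags but no parent: one root per local hierarchy."""
        return sorted(t for t in self.children if not self.parents.get(t))


forest = TagForest()
forest.add_relation("food", "sushi")
forest.add_relation("japanese", "sushi")   # second parent for the same tag
forest.add_relation("european", "spanish")

print(forest.roots())                                   # ['european', 'food', 'japanese']
print(forest.facet_values("food", ["sushi", "tokyo"]))  # ['sushi']
```

Storing both directions keeps lookups cheap in either role: `children` answers "which values does this facet offer?", while `parents` answers "under which facets can this tag appear?".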
    8. Faceted Search for Tagged Content
      • Providing good facets for tags is hard
      • Unlike database categories tag-space is messy:
        • Noisy
          • search, search engine, google, seach_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, searchengines, news, information, portal, engines, searching, test, tech, buscadores, tool, etc.
        • Large: one new tag per 10 entries. Compare with at most a thousand categories supported by other faceted search systems
        • Dynamic: and dependent on search context and user
    9. Faceted Search for Tagged Content
      • Screenshot labels: Refine your search; Food groups; Locations groups; Origins groups
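
A "refine your search" panel of this kind could be assembled roughly as follows. This is only a sketch under stated assumptions: the function name `refine_panel`, the `top_n` cutoff, and the sample tags are all hypothetical, and the talk does not say how RawSugar ranks or truncates facet values.

```python
from collections import Counter, defaultdict


def refine_panel(results, is_a, top_n=5):
    """Group the tags found in `results` under their parent tags.

    results: list of tag lists, one per matching document.
    is_a: iterable of (parent, child) pairs.
    Returns {parent: [(child, count), ...]}, most frequent children first.
    """
    # Count each tag once per document.
    counts = Counter(tag for tags in results for tag in set(tags))
    panel = defaultdict(list)
    for parent, child in is_a:
        if counts[child]:
            panel[parent].append((child, counts[child]))
    return {
        parent: sorted(values, key=lambda v: (-v[1], v[0]))[:top_n]
        for parent, values in panel.items()
    }


is_a = [("food", "sushi"), ("food", "thai"), ("locations", "san francisco")]
results = [
    ["restaurant", "sushi", "san francisco"],
    ["restaurant", "thai", "san francisco"],
    ["restaurant", "sushi"],
]
print(refine_panel(results, is_a))
```

Only parents with at least one child present in the results appear, so the panel changes with the search context.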
    10. Automatic Discovery of Facets
      • Providing the best Tag Hierarchies
      • Core concepts:
        • Only some users (4%) define tag hierarchies; for example, food>sushi, european>spanish, and so on
        • Analysis of tag co-occurrence patterns
        • Analysis of search patterns
      • We mine this tag space to learn simple tag relations ( IS-A relations and RELATED ) using statistics.
      • At search time we apply this learned knowledge to group tags from results
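
The slide does not say which statistics are used. As an illustration only, the sketch below assumes a common subsumption heuristic: tag A is treated as a parent of tag B when most entries tagged B also carry A, but not the other way round. The function name `learn_relations` and the `min_support` and `threshold` values are assumptions, not RawSugar's method.

```python
from collections import Counter
from itertools import combinations


def learn_relations(entries, min_support=2, threshold=0.8):
    """Return (is_a, related) sets mined from tag co-occurrence.

    entries: iterable of tag collections, one per tagged document.
    is_a holds (parent, child) pairs; related holds alphabetically ordered pairs.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in entries:
        tags = sorted(set(tags))
        tag_count.update(tags)
        pair_count.update(combinations(tags, 2))  # each pair in sorted order

    is_a, related = set(), set()
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue  # too rare to trust
        p_a_given_b = n_ab / tag_count[b]
        p_b_given_a = n_ab / tag_count[a]
        if p_a_given_b >= threshold and p_b_given_a < threshold:
            is_a.add((a, b))   # b nearly always comes with a: a is broader
        elif p_b_given_a >= threshold and p_a_given_b < threshold:
            is_a.add((b, a))
        else:
            related.add((a, b))  # no clear direction: keep as RELATED
    return is_a, related


entries = [
    {"food", "sushi"},
    {"food", "sushi", "tokyo"},
    {"food", "recipes"},
    {"food", "recipes"},
    {"food", "wine"},
]
is_a, related = learn_relations(entries)
print(sorted(is_a))  # [('food', 'recipes'), ('food', 'sushi')]
```

Pairs seen fewer than `min_support` times are ignored, which is one simple way to cope with the noisy tag space described earlier.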
    11. Discovered Tag Hierarchies
      • Health : acne, aging, alternative, beautiful, body, brain, breast, breastcancer, cancer, care, cause, depression, diabetes, diet, disease, exercise, fats, fitness, healthcare, heart, herbs, insurance, life, loss, medical, meditation, mental, mind, nutrition, obesity, planning, pregnancy, prevention, protective, quotes, risk, running, sleep, smoke, stress, supplements, terms, treatments, vitamins, weight, wellness, women, workout, yoga
      • Fitness : body, nutrition, running, walking, workout, yoga
      • Fun : adults, birthday, characters, contests, crazy, dance, jokes, optical, pics, plays, puzzles, quizes, sexy, silly, sudoku, toy, trivia, comedy, commercial, crazy, cute, humour, jokes, parody, pics, satire, silly, simpson, snl, spoof, strange, stupid, t-shirts, wtf
    12. More Complex Discovered Tag Hierarchies
      • internet > search engines > google >:
        • ads, adsense, adwords, affiliate, analysis, analytics, api, base, censorship, chat, checkout, cookies, craigslist, e-mail, earth, geo, geotagging, gmail, gmaps, goog, google adsense, google calendar, google earth, google maps, google news, google video, google: adwords, googlebase, googleearth, googlemaps, googletalk, googlevideo, im, instant messaging, map, modules, msn, pagerank, payment, paypal, ping, players, ranking, reader, referrals, revenue, searchengines, selling, seo, sitemap, sms, stats, sync, talk, traffic, web services
      • internet > search engines > yahoo >
        • answers, buzz, home & living, information management, msn, personal finance, sitemap
      • internet > web technologies > ajax >
        • 2.0, aggregation, atlas, calendars, chat, debug, dom, domain, examples, flex, forms, framework, groupware, homepage, im, jsp, libraries, messenger, patterns, portals, prototype, ria, rubyonrails, slideshow, toolkits, web services, webapps, webdev, webmail, whois, wysiwyg, xmlhttprequest, xpath
      • internet > web technologies > css >
        • accessibility, borders, boxing, bugs, cheatsheet, columns, dev, dom, dreamweaver, examples, floats, forms, gallery, ie, inspire, layout, markup, menu, navigation, optimizers, positioning, showcase, slideshow, standard, tables, tabs, template, tricks, typography, usability, w3c, web dev, webdesign, webdev, webmaster, webstandards
    13. Merging Expert, User and Automatic Hierarchies
      • Hierarchies can come from multiple sources:
      • Experts or External reliable sources ( e.g., Dmoz, Wordnet, etc.)
      • Users of various levels of expertise
      • Automatically discovered hierarchies
      • Since hierarchies are local and imperfect, we need to deal with:
      • Conflicts
      • Missing levels
      • Ambiguities
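
One way to picture the merge, offered as a sketch rather than as the method used in the talk: each source votes for its parent > child edges with a trust weight, contradictory directions are resolved in favour of the better-supported one, and a direct edge that skips a level is dropped when another source supplies the intermediate tag. The weights, the function name `merge_hierarchies`, and the sample edges are all assumptions.

```python
from collections import defaultdict

# Hypothetical trust weights per source type; these values are not from the talk.
SOURCE_WEIGHT = {"expert": 3.0, "user": 1.0, "automatic": 0.5}


def merge_hierarchies(sources):
    """Merge (source_type, [(parent, child), ...]) lists by weighted voting."""
    score = defaultdict(float)
    for source_type, edges in sources:
        for parent, child in set(edges):
            score[(parent, child)] += SOURCE_WEIGHT[source_type]

    # Conflicts: keep only the better-supported direction of A>B versus B>A.
    # An exact tie drops both directions, leaving the pair unresolved.
    merged = {
        (p, c) for (p, c), s in score.items() if s > score.get((c, p), 0.0)
    }

    # Missing levels: drop A>C when some B gives both A>B and B>C.
    # Only one skipped level is detected in this sketch.
    children = defaultdict(set)
    for p, c in merged:
        children[p].add(c)
    shortcuts = {
        (p, c)
        for p, c in merged
        if any(c in children.get(mid, ()) for mid in children[p] if mid != c)
    }
    return merged - shortcuts


sources = [
    ("expert", [("europe", "UK"), ("UK", "Scotland"), ("Scotland", "Edinburgh")]),
    ("user", [("europe", "Scotland")]),          # skips the "UK" level
    ("automatic", [("Edinburgh", "Scotland")]),  # conflicts with the expert
]
print(sorted(merge_hierarchies(sources)))
```

Ambiguity, where a tag legitimately has two parents, is left untouched here: both edges survive as long as neither contradicts the other.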
    14. Merging Expert, User and Automatic Hierarchies
      • Figure (diagram not preserved in the transcript)
        • Tag labels: europe, UK, Scotland, Edinburgh, Spain, Italy, food, vegetarian, Sushi, food, cooking, recipes, Asian, Chinese, Thai, Southwest, California, Bay Area, San Francisco, Texas
        • Source labels: Expert 1, User 2, User 3, Automatic 22, Automatic 5
    15. Some other Technology points
      • Multi-word tags: social search, socialsearch, social_search, etc. We learn these automatically and treat them as equivalent. This also covers morphology.
      • Synonyms: Based on clustering techniques, we discover similar tags. For example: “programming, development, software.” Uses:
        • “ See also”
        • Query expansion to increase recall
      • Relevance scoring: based on popularity, freshness, standard relevance
      • Tag Auto-Completion: While searching or tagging
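
The talk says multi-word variants are learned automatically. As a much simpler stand-in, the sketch below treats tags as equivalent when they match after lowercasing and removing separators, then uses those groups, plus an optional synonym table, for query expansion. The function names and the `synonyms` mapping are hypothetical, and morphology is not handled.

```python
import re
from collections import defaultdict


def canonical(tag):
    """Lowercase and drop spaces, underscores, hyphens and dots."""
    return re.sub(r"[\s_\-.]+", "", tag.lower())


def group_variants(tags):
    """Map each canonical form to the sorted spellings seen for it."""
    groups = defaultdict(set)
    for tag in tags:
        groups[canonical(tag)].add(tag)
    return {key: sorted(variants) for key, variants in groups.items()}


def expand_query(tag, groups, synonyms=None):
    """Return every spelling to search for, plus optional synonym tags.

    synonyms: optional {canonical tag: [synonym tags]} table (an assumption).
    """
    expanded = set(groups.get(canonical(tag), [tag]))
    for synonym in (synonyms or {}).get(canonical(tag), []):
        expanded.update(groups.get(canonical(synonym), [synonym]))
    return sorted(expanded)


groups = group_variants(
    ["social search", "socialsearch", "social_search", "programming", "development"]
)
print(expand_query("Social-Search", groups))
# ['social search', 'social_search', 'socialsearch']
print(expand_query("programming", groups, {"programming": ["development", "software"]}))
# ['development', 'programming', 'software']
```

Expanding a query to all known spellings raises recall, which is the purpose the slide gives for query expansion.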
    16. Future Plans
      • Disambiguation of hierarchies: Java, Salsa, Free, Italian
      • Use of large expert taxonomies from other sources:
          • Locations
          • Wikipedia, Dmoz, WordNet, Google Co-op, or other directories
          • More topic domains: Health
      • Sharing of tag hierarchies across users
      • Improve usability of tag hierarchy
      • Mashups
        • Maps
        • Reviews
        • User content and classification
    17. Discussion
      • Questions?
    18. Backup slides
    19. Why Faceted Search? Improved Exploration
      • Screenshot labels: morphology; Locations; Restaurant Type; Not a restaurant!
    20. Why Faceted Search? Improved Exploration
      • Screenshot label: Not usable!
    21. References
      • Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html
      • Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91
      • Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html
      • Brooks, Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf
      • Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp
      • Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf

    Frank Smadja, 4 years ago

    Talk given at the SIGIR 2006, workshop on faceted s…
