Faceted Navigation (LACASIS Fall Workshop 2005)



  1. Faceted Navigation. Presentation to the LACASIS 2005 Fall Workshop, “Search Forward: Emerging Internet Capabilities”, November 18th, 2005. Brad Allen, Founder and CTO, Siderean Software, Inc.
  2. Overview
     • Problem: Knowing what information is available
     • Solution: Faceted navigation
       • How navigation differs from search
       • Case studies and business applications
     • Lessons learned
     • Challenges
     • Demonstration
     • Discussion
  3. Problem: Knowing what information is available
  4. Faceted navigation: providing “a bird’s eye view” of available information vs.
  5. How faceted navigation differs from search
     • Faceted navigation is a new type of software application
     • It goes beyond search and browsing by providing:
       • Scope: an overview of all available information
       • Context: a frame of reference to orient oneself in a dynamic information space
       • Repeatability: using scope and context as cues to lead users back to relevant information
       • Universality: a unified means of accessing information that is independent of type or source
     • Faceted navigation provides the insight of analytics with the ease of search
  6. Faceted navigation: origins
     • Library science
       • Ranganathan and the invention of faceted classification
       • Digital library efforts
     • Information retrieval
       • Parametric search
       • Query by example
       • Retrieval by reformulation
         • Rabbit, Argon
     • Systems have been moving from academic prototypes into commercial use over the last four years
       • Marti Hearst as a pioneer in this area
       • Siderean, Endeca, Vivisimo, FAST driving technology into enterprises
  7. Facets: the basis of navigation
     • Facets are metadata properties whose ranges form a near-orthogonal set of controlled vocabularies
       • Creator: Dickens, Charles
       • Subject: Arsenic, Antimony
       • Location: World > U.S. > California > Venice
     • Facets form a frame of reference for information overview, access and discovery
       • Other properties serve as landmarks and cues
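As a rough illustration of how facets can provide both overview and access, the sketch below counts terms per facet and then narrows a result set by one term. The record layout, the second creator name, and the helper names `facet_counts` and `refine` are assumptions made for this example and are not taken from the presentation.

```python
from collections import Counter

# Hypothetical records: each maps a facet name to controlled-vocabulary terms.
records = [
    {"Creator": ["Dickens, Charles"], "Subject": ["Arsenic"],
     "Location": ["World > U.S. > California > Venice"]},
    {"Creator": ["Dickens, Charles"], "Subject": ["Antimony"],
     "Location": ["World > U.S. > California"]},
    {"Creator": ["Doe, Jane"], "Subject": ["Arsenic", "Antimony"],
     "Location": ["World > U.K."]},
]


def facet_counts(items):
    """Count, for every facet, how many items carry each term."""
    counts = {}
    for item in items:
        for facet, terms in item.items():
            # set() guards against counting a repeated term twice per item.
            counts.setdefault(facet, Counter()).update(set(terms))
    return counts


def refine(items, facet, term):
    """Keep only the items tagged with the given term under the given facet."""
    return [item for item in items if term in item.get(facet, [])]


overview = facet_counts(records)                 # the "bird's eye view"
narrowed = refine(records, "Subject", "Arsenic") # one navigation step
```

Recomputing `facet_counts(narrowed)` after each refinement is what keeps the remaining choices visible as the result set shrinks.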
  8. Building navigation applications
     • Metadata about data and content is aggregated…
     • Organized into a unified information architecture…
     • Analyzed to generate faceted views…
     • Providing faceted navigation across the data and content
     • Diagram labels: Term, Event, Person, Place, Text, View
  9. Case study: NASA JPL
     • Delivery to implementation in weeks using 3 internal resources
     • Brings together SharePoint, DocuShare, and structured trouble ticketing databases
     • Provides uniform access to all relevant information about previous projects in one place
     • Incorporates corporate vocabulary for concept-based search
     • Allows user community to contribute to organization of information
  10. Metadata in today’s enterprises
     • From thirty interviews conducted with Fortune 1000 organizations during Fall 2004
       • Use of metadata not yet widespread but emerging
       • Understanding varies widely across enterprises
       • Three basic approaches:
         • Top down
           • CEO says “We must be an information-driven company”
           • “Corporate controlled vocabulary that all divisions will use”
           • The effort is multi-year, ROI hard to track, and may not be implemented or adopted widely
         • Bottom up
           • Groups determine their vocabulary while describing their process
           • Light tagging of content when it is created or when the content is published to a portal
         • Give up
           • Assumption: too difficult to create metadata from existing content
           • But still feel that metadata would improve matters, particularly within business units
  11. Verticals for faceted navigation
     • Financial Services
       • Strong identified application fit: Search, analyze and monitor dynamic financial and market data
       • Existing metadata: News feeds, financial DBs, market data
       • Adopting semantic technologies: Adoption of RSS for market news emerging
       • Business users: Traders, industry analysts, investment bankers
     • E-Commerce
       • Strong identified application fit: Search and browse catalogs of products and services, consumer-generated information
       • Existing metadata: Product catalogs, customer reviews, customer service data, advertising
       • Adopting semantic technologies: Pervasive adoption of XML standards for moving product and customer data across value chains
       • Business users: Consumers, marketers
     • Federal Government
       • Strong identified application fit: Search, analyze and monitor complex, dynamic intelligence, project and problem information across organizations and projects (Columbia, Iraq, 9/11)
       • Existing metadata: Scads of all types, with unstructured information often preprocessed to boot
       • Adopting semantic technologies: Commitment to RDF/OWL as solution for cross-agency interoperability, actively using RSS
       • Business users: Intelligence analysts
  12. Navigation requires metadata
     • Ontologies
       • Specifications of how to represent classes, instances and their properties
       • Sometimes called “vocabularies”
     • Controlled vocabularies
       • Terms for saying what something is about
       • Also called “taxonomies” and “thesauri”
     • Instances
       • Descriptions of resources
     • Application profiles
       • Specifications of which classes and properties are useful and how they are to be used in an application
  13. Lessons learned
     • Balanced incremental approach
     • Leverage metadata and indices at hand
     • Exploit statistics where desirable
       • But layer a framework on top to structure the statistics
     • Significant mileage from very simple frameworks
  14. The utility of RDF for commercial metadata
     • RDF can make metadata use easier and less costly
       • An open standard for metadata reduces cost and avoids technology and vendor lock-in
       • A “universal solvent” for data and content
       • A platform for reuse and sharing
  15. Building navigation systems with RDF
     • Define/reuse ontologies expressed in RDF(S)
       • Classes for defining instances and controlled vocabularies
       • Properties for facets and additional attributes
     • Import/transform instances into an RDF representation
       • Resources referred to via URIs
       • Content and controlled vocabularies
     • Write application profiles in terms of RDF
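The three steps on this slide can be sketched with plain Python tuples standing in for RDF triples. This is only an illustration under stated assumptions: the `http://example.org/` namespace, the `Document` class, the sample rows, and the helper names `rows_to_triples` and `facet_values` are all hypothetical, and a real system would use an RDF store rather than lists of tuples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Step 1: a tiny ontology: one class plus the properties used as facets.
ontology = {
    "classes": [EX + "Document"],
    "facet_properties": [DC + "creator", DC + "subject"],
}

# Hypothetical source rows standing in for existing data or content metadata.
rows = [
    {"id": "doc1", "creator": "Dickens, Charles", "subject": "Arsenic"},
    {"id": "doc2", "creator": "Dickens, Charles", "subject": "Antimony"},
]


def rows_to_triples(source_rows):
    """Step 2: transform instances into (subject, predicate, object) triples."""
    triples = []
    for row in source_rows:
        subject = EX + row["id"]  # each resource is referred to via a URI
        triples.append((subject, RDF_TYPE, EX + "Document"))
        triples.append((subject, DC + "creator", row["creator"]))
        triples.append((subject, DC + "subject", row["subject"]))
    return triples


# Step 3: an application profile: which properties act as facets, and how
# they are labelled in the navigation interface.
profile = {DC + "creator": "Creator", DC + "subject": "Subject"}


def facet_values(triples, app_profile):
    """Collect the distinct values of every property the profile names."""
    values = {label: set() for label in app_profile.values()}
    for _subject, predicate, obj in triples:
        if predicate in app_profile:
            values[app_profile[predicate]].add(obj)
    return values


triples = rows_to_triples(rows)
facets = facet_values(triples, profile)
```

Keeping the profile separate from the data means the same triples can back different navigation views without being re-imported.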
  16. Lessons: ontologies
     • Don’t do: assume you have to build elaborate OWL ontologies
       • Don’t have to boil the ocean to get the benefits
       • OWL DL, OWL Full are overkill for this class of application
     • Side issue: description logic for navigation is not addressed adequately by OWL
       • Class/subclass versus arbitrary hierarchical relations
     • Do: Tiny Ontologies All Stitched Together (TOAST)
       • RDF Schema with a smattering of RDF/OWL properties (e.g., owl:inverseOf)
       • Start with DC + SKOS + FOAF
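To make the “stitched together” idea concrete, here is a small sketch in which one document is described with properties from Dublin Core, SKOS and FOAF at the same time. The namespace URIs and property names (`dc:creator`, `dc:subject`, `foaf:name`, `skos:prefLabel`, `skos:broader`) are the published ones; the `http://example.org/` resources, the “Poisons” concept, and the helpers `objects` and `label` are assumptions for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# One small graph that mixes three tiny vocabularies.
triples = {
    (EX + "doc1", DC + "creator", EX + "person/dickens"),
    (EX + "doc1", DC + "subject", EX + "concept/arsenic"),
    (EX + "person/dickens", FOAF + "name", "Charles Dickens"),
    (EX + "concept/arsenic", SKOS + "prefLabel", "Arsenic"),
    (EX + "concept/arsenic", SKOS + "broader", EX + "concept/poisons"),
    (EX + "concept/poisons", SKOS + "prefLabel", "Poisons"),
}


def objects(subject, predicate):
    """Return all objects of the given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def label(resource):
    """Display label via foaf:name or skos:prefLabel, else the URI itself."""
    for predicate in (FOAF + "name", SKOS + "prefLabel"):
        found = objects(resource, predicate)
        if found:
            return found[0]
    return resource


creator_labels = [label(o) for o in objects(EX + "doc1", DC + "creator")]
subject_labels = [label(o) for o in objects(EX + "doc1", DC + "subject")]
```

No single ontology had to be designed up front: each vocabulary contributes only the handful of properties it already defines.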
  17. Lessons: controlled vocabularies
     • Don’t do: huge monolithic taxonomies
       • Unless they are ready at hand and can be reused largely without modification
     • Do: bite-sized controlled vocabularies that exploit faceted approaches
       • 4 facets x 10 terms per facet versus 10^4 terms in a single taxonomy
       • Start with flat term lists
       • Add BT/NT/RT relationships over time
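The arithmetic behind the “4 facets x 10 terms” comparison can be checked directly. In this sketch the facet and term names are placeholders invented for the example; only the counts matter.

```python
from itertools import product

# Four hypothetical facets, each with a flat list of ten terms.
facets = {
    f"facet_{i}": [f"f{i}_term_{j}" for j in range(10)]
    for i in range(1, 5)
}

# Terms someone has to create and maintain under the faceted approach.
terms_to_maintain = sum(len(terms) for terms in facets.values())

# Distinct combinations those terms can express, which a single
# pre-coordinated taxonomy would need one term each to cover.
combinations = sum(1 for _ in product(*facets.values()))
```

Here `terms_to_maintain` is 40 while `combinations` is 10,000, which is the 10^4 figure on the slide.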
  18. Lessons: instances
     • Manual creation
       • Don’t do: exhaustive author creation of metadata
       • Do: community annotation and tagging
     • (Semi-)automated creation
       • Don’t do: assume elaborate information extraction based on NLP, subject tagging and categorization
       • Do: quick and dirty named entity extraction, or better yet, stick to readily available asset and relational metadata (date, creator, document type/genre)
         • Much of the benefit at a fraction of the effort
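As one possible reading of “readily available asset metadata”, the sketch below derives two facet values from attributes the file system already holds, with no content analysis at all. The function name `asset_facets` and the choice of a year-month date bucket are assumptions for this example; creator is left out because it is not available portably from the file system.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def asset_facets(path):
    """Derive facet values from attributes the file system already holds."""
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    # guess_type only inspects the file name; it never reads the content.
    doc_type, _encoding = mimetypes.guess_type(path)
    return {
        "Date": modified.strftime("%Y-%m"),      # coarse bucket for browsing
        "Document type": doc_type or "unknown",
    }


# Usage on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt") as handle:
    sample = asset_facets(handle.name)
```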
  19. Lessons: application profiles
     • Metadata is increasingly pervasive
       • The way to leverage existing information infrastructure
     • Exploit “on-demand” information integration feature of RDF
     • DB + XML → XSLT → RDF(S): a simple, sloppy framework
       • Part of Adam Bosworth’s “Web of data”
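A minimal sketch of the “on-demand” integration idea follows. Note the substitution: a plain Python function takes the place of the XSLT step, since the point here is only that two independently produced sets of triples merge by set union once they share URIs. The table layout, sample values, `http://example.org/` namespace and the name `table_to_triples` are assumptions for illustration.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Source 1: a relational table (in memory, with made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT, creator TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("doc1", "Dickens, Charles", "2005-11-18"),
        ("doc2", "Dickens, Charles", "2005-11-01"),
    ],
)


def table_to_triples(connection):
    """Map each row to triples; stands in for the XSLT transformation step."""
    triples = set()
    query = "SELECT id, creator, created FROM documents"
    for doc_id, creator, created in connection.execute(query):
        subject = EX + doc_id
        triples.add((subject, DC + "creator", creator))
        triples.add((subject, DC + "date", created))
    return triples


# Source 2: subject tags produced elsewhere, already expressed as triples.
tag_triples = {(EX + "doc1", DC + "subject", "Arsenic")}

# Integration is a set union: no schema migration or join table is needed.
merged = table_to_triples(conn) | tag_triples
doc1 = {(p, o) for s, p, o in merged if s == EX + "doc1"}
```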
  20. The big question: statistics vs. knowledge
     • Statistics can’t deliver everything
       • Alan Kay’s puppy analogy
       • Vitanyi’s work on “Google learning”
     • On the other hand, knowledge is dearly won
       • CYC
     • Need a balance that enables adoption without losing the benefits
     • Lessons from
       • Statistics vs. knowledge in NLP
       • Expert systems
  21. Future directions
     • User tagging + RDF: the killer SW application?
       • The rehabilitation of metadata in the social software community
       • The re-emergence of RSS 1.0
       • “Folksonomy”-driven social search
         • Del.icio.us, Flickr, CiteULike
     • Towards social navigation: fac.etio.us
  22. fac.etio.us
     • Aggregated feeds from del.icio.us social bookmarking site
       • 10^5 Web pages
       • 10^4 tags
       • 10^4 contributors
       • 10^4 originating sites
     • Superior user experience with 10 minutes’ effort
       • “In 3 clicks, I drilled down through 9700+ sites, to a more specific set of 98 things, down to one I found useful.”
     • Tagging the tags to add semantics
       • Bootstrapping folksonomies into taxonomies without impacting user creation of metadata
       • Merging anarchy with governance
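The drill-down and “tagging the tags” ideas can be sketched together. This is not a description of how fac.etio.us was built: the bookmarks, contributor names, the `tag_categories` mapping and the `drill_down` helper are all invented for this example. The point is that categories are layered over the free tags separately, so users keep tagging exactly as before.

```python
# Hypothetical bookmarks with free-form user tags.
bookmarks = [
    {"url": "http://example.org/a", "tags": {"rdf", "python"}, "contributor": "alice"},
    {"url": "http://example.org/b", "tags": {"rdf", "semweb"}, "contributor": "bob"},
    {"url": "http://example.org/c", "tags": {"python"}, "contributor": "alice"},
    {"url": "http://example.org/d", "tags": {"rdf", "python", "tutorial"}, "contributor": "bob"},
]

# "Tagging the tags": a mapping maintained apart from the bookmarks themselves.
tag_categories = {
    "rdf": "Semantic Web",
    "semweb": "Semantic Web",
    "python": "Programming",
    "tutorial": "Genre",
}


def drill_down(items, *clicks):
    """Apply successive (field, value) selections, one per click."""
    for field, value in clicks:
        if field == "tag":
            items = [b for b in items if value in b["tags"]]
        elif field == "category":
            items = [
                b for b in items
                if any(tag_categories.get(t) == value for t in b["tags"])
            ]
        else:
            items = [b for b in items if b.get(field) == value]
    return items


# Three clicks: a category, then a tag, then a contributor.
result = drill_down(
    bookmarks,
    ("category", "Semantic Web"),
    ("tag", "python"),
    ("contributor", "bob"),
)
```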
  23. Challenges
     • Scale
       • Must be commensurate with expectations and requirements from traditional web and enterprise search
     • Algorithms
       • Many alternatives still being explored
     • Usability
       • Lots of work to be done to validate benefits
     • Security, trust and provenance
       • Just beginning to understand
  24. Challenges: scale
     • Navigation has to live up to the scaling expectations set by search, while it is doing a lot more work
       • Number of objects, feeds: 10^6 to 10^9
       • Ingest rates: ~10^3 to 10^4 triples/sec, how many per resource?
       • Latency: < 0.5 sec user time regardless of application
     • Implementations exploit RAM to deliver low latency, but this is an impediment to terabyte-scale bodies of information
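To see why the “how many per resource?” question matters, here is a back-of-the-envelope calculation using the ranges on this slide. The figure of 10 triples per object is purely an assumption chosen for illustration; the slide leaves that number open.

```python
def ingest_seconds(objects, triples_per_object, triples_per_second):
    """Time to load a collection at a steady ingest rate."""
    return objects * triples_per_object / triples_per_second


TRIPLES_PER_OBJECT = 10  # assumption: the slide does not give this figure

# Smallest collection at the fastest rate quoted on the slide.
best_case = ingest_seconds(10**6, TRIPLES_PER_OBJECT, 10**4)

# Largest collection at the slowest rate quoted on the slide.
worst_case = ingest_seconds(10**9, TRIPLES_PER_OBJECT, 10**3)
worst_case_days = worst_case / 86_400  # seconds per day
```

Under that assumption the load time ranges from 1,000 seconds (about 17 minutes) to 10^7 seconds (roughly 116 days), so both the triples-per-resource ratio and the ingest rate dominate feasibility at the upper end.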
  25. Challenges: algorithms
     • Federated services vs. centralized servers
     • Relationship to relevance ranking
     • Support for aggregate and text search operators in RDF query
     • Integration of multimedia retrieval algorithms as equal citizens to free text retrieval
  26. Challenges: usability
     • Navigation interfaces in their infancy
     • Tagging interfaces even more so
     • Principled analyses of precision and recall have yet to be done
     • Visualization beyond “sticks and ovals” is begging to be integrated
       • Navigate to a small result set, then visualize
  27. Summary
     • Faceted navigation is a new software product category that addresses the pain associated today with finding and discovering actionable information
     • The use of Semantic Web standards, principally RDF, enables the development of faceted navigation applications
     • It is “early days” for faceted navigation applications and challenges remain, but we believe the potential is significant
  28. Siderean Software, Inc., 390 North Sepulveda Blvd., Suite 2070, El Segundo, CA 90245-4475 USA, +1 310 647-4266, http://www.siderean.com [email_address]