
Faceted search using Solr and Ontopia

A presentation about how Ontopia and Solr can be integrated.

Published in: Technology, Education

  1. 1. Faceted search using Solr and Ontopia<br />2009-11-03<br />Geir Ove Grønmo, grove@bouvet.no<br />
  2. 2. Agenda<br />Short introductions to Solr and Ontopia<br />What is faceted search?<br />An integration of the two – a prototype<br />Demos<br />
  3. 3. Apache Solr<br />A search engine<br />implemented as an HTTP service on top of Apache Lucene<br />searching and indexing (no web-crawling)<br />adds support for faceted search (and more)<br />sharding and replication<br />distributed search<br />excellent interoperability (i.e. not really Java-specific)<br />Next release: Solr 1.4<br />Open source:<br />http://lucene.apache.org/solr/<br />Apache License 2.0<br />
  4. 4. Ontopia<br />A Topic Maps toolkit:<br />data representation, persistence and querying<br />application development<br />written in Java<br />Next release: Ontopia 5.1<br />Open source:<br />http://code.google.com/p/ontopia/<br />Apache License 2.0<br />
  5. 5. Where the meat is...<br />Solr<br />fast textual search and faceted search support<br />Ontopia<br />rich semantic data and structured search<br />User interface design<br />providing a useful interface to the user<br />
  6. 6. But first, what is faceted search?<br />A technique for refining search results<br />Integrates textual search and navigation<br />Allows concept composition<br />slow + expensive + red + used + car<br />article + in English + about salmon<br />people + aged 20-30 + SQL expert<br />punk rock songs + < 1 minute + in Norwegian + released 1980-1982<br />Supports exploration and learning<br />Never returns zero results<br />
  7. 7.
  8. 8. How is it done?<br />Given a starting set<br />usually all documents<br />or the result of filling in the search input box<br />...do the following:<br />count the number of hits matching each facet field<br />which fields to facet on are defined at query time<br />
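The counting step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr implements it; the function name `facet_counts` and the sample documents are made up for the example.

```python
from collections import Counter

def facet_counts(docs, facet_fields):
    """Count, per facet field, how many documents carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc in docs:
        for field in facet_fields:
            values = doc.get(field, [])
            if not isinstance(values, list):
                values = [values]
            # A document counts at most once per distinct value.
            counts[field].update(set(values))
    return counts

# Hypothetical starting set: all documents, or the hits of a text search.
docs = [
    {"id": "1", "color": "red", "condition": "used"},
    {"id": "2", "color": "red", "condition": "new"},
    {"id": "3", "color": "blue", "condition": "used"},
]

# The fields to facet on are chosen at query time.
result = facet_counts(docs, ["color", "condition"])
```

Because the facet fields are passed in per call, the same document set can be faceted differently for each query.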
  9. 9.
  10. 10.
  11. 11.
  12. 12. An example without faceted search<br />
  13. 13. Facet types<br />Standard facets<br />a list of facet values<br />Hierarchical facet values<br />taxonomy of facet values<br />Range/query facets<br />dates<br />prices<br />alphabet buckets<br />intervals (lower and upper bounds)<br />
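As a rough sketch of the range and alphabet-bucket facet types listed above, the following Python groups values into intervals with lower and upper bounds and into first-letter buckets. The helper names, the half-open interval convention and the sample data are assumptions made for illustration.

```python
from collections import Counter

def range_facet(values, bounds):
    """Count values per half-open interval [lower, upper)."""
    buckets = {(lo, hi): 0 for lo, hi in zip(bounds, bounds[1:])}
    for value in values:
        for lo, hi in buckets:
            if lo <= value < hi:
                buckets[(lo, hi)] += 1
                break
    return buckets

def alphabet_facet(names):
    """Count names per upper-cased first letter."""
    return Counter(name[0].upper() for name in names if name)

prices = [5, 12, 18, 25, 40, 99]
price_facet = range_facet(prices, [0, 10, 20, 50, 100])

letter_facet = alphabet_facet(["Abelson", "adams", "Sussman"])
```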
  14. 14. Standard facets<br />
  15. 15. Hierarchical facet values<br />Note: the facets can also be hierarchical<br />
  16. 16. Alphabet buckets<br />
  17. 17. Range facets<br />
  18. 18. User interface considerations<br />Single select<br />link<br />radio button<br />Multi select<br />checkboxes<br />Decide on which operator to use: AND/OR<br />within a facet<br />between facets<br />How many facet values to display<br />given limited screen real estate<br />How to provide intuitive undo operation<br />
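One common answer to the AND/OR question is OR within a facet and AND between facets. The sketch below assumes that choice; it is one option among several, and the function and field names are hypothetical.

```python
def matches(doc, selections):
    """Return True if doc satisfies every facet that has a selection.

    Within a facet, any selected value is enough (OR).
    Between facets, all must be satisfied (AND).
    """
    for field, chosen in selections.items():
        if not chosen:
            continue  # nothing selected in this facet: no restriction
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        if not any(value in chosen for value in values):
            return False
    return True

docs = [
    {"id": "1", "color": "red", "condition": "used"},
    {"id": "2", "color": "green", "condition": "used"},
    {"id": "3", "color": "blue", "condition": "new"},
]

# (color is red OR blue) AND (condition is used)
selections = {"color": {"red", "blue"}, "condition": {"used"}}
hits = [doc["id"] for doc in docs if matches(doc, selections)]
```

With checkboxes (multi select), each facet's set grows or shrinks as boxes are ticked; undo amounts to removing a value from its set.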
  19. 19. Examples<br />
  20. 20. Scoring<br />Some types of documents should be ranked higher than others<br />Solr lets one boost the default score:<br />per document<br />per field<br />The total score of a document depends on:<br />the boost and score of each field, adjusted by how relevant the field is to the actual query<br />the boost of the document<br />
  21. 21. Sorting<br />How to sort the list of facets?<br />by relevance<br />How to sort the values of each facet?<br />by number of hits<br />alphabetically<br />How to sort the search result?<br />by relevance<br />alphabetically<br />by date<br />
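The two orderings for facet values can be illustrated as follows. The counts are invented, and breaking ties alphabetically when sorting by hits is an assumption of this sketch.

```python
# Hypothetical facet values with their hit counts.
counts = {"rock": 12, "jazz": 3, "punk": 12, "blues": 7}

# By number of hits, highest first; ties fall back to alphabetical order.
by_hits = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Alphabetically by facet value.
alphabetical = sorted(counts.items())
```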
  22. 22. Proposition<br />“Concept composition, using faceted search, and Topic Maps is a perfect match”<br />
  23. 23. Why not use Ontopia only?<br />You can, but it is not optimized for this use case<br />It lets you implement faceted search<br />but it’ll be too slow<br />The reasons are:<br />all the expensive processing will have to happen at runtime, and not indexing time<br />involves a lot of traversal<br />relies on the underlying fulltext search engine<br />search has limited cacheability<br />
  24. 24. Trade-offs<br />Considerations:<br />Search performance<br />Indexing performance<br />Consistency<br />Ontopia<br />no indexing overhead<br />results always up-to-date<br />Solr<br />very fast search<br />indexing overhead<br />index must be kept up-to-date regularly<br />
  25. 25. Solr – the data model<br />An index contains documents<br />Documents have fields<br />A field can have multiple values<br />{ "id": "1234",<br /> "title": "Structure and Interpretation of Computer Programs",<br /> "authors": ["Harold Abelson", "Gerald Jay Sussman"] }<br />
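Given documents like the one on this slide, faceting on the multi-valued `authors` field is requested at query time through request parameters. The sketch below only builds the request URL; `q`, `fq`, `wt`, `facet` and `facet.field` are standard Solr parameters, while the host, port and path are assumptions about a local installation.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, text, facet_fields, filters=()):
    """Build a Solr select URL that asks for facet counts."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value becomes a filter query.
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)

# Assumed local Solr endpoint; adjust to the actual deployment.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "title:programs",
    ["authors"],
    filters=['authors:"Gerald Jay Sussman"'],
)
```

Clicking a facet value in the user interface would then typically add one more `fq` entry and re-run the query.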
  26. 26. Ontopia – the data model<br />A topic map contains<br />topics<br />and information about them<br />Identities<br />Names<br />Associations to other topics<br />Occurrences (read: non-association properties)<br />
  27. 27. Integrating Solr and Ontopia<br />Proposed solution:<br />Solr indexes constructed from Ontopia queries<br />For each document type create a query that extracts data from the topic map to fields in documents<br />Then do faceting on selected fields<br />Use-case specific schema definition<br />should be project specific (to some degree)<br />Perform full index or incremental reindex<br />
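A minimal sketch of the extraction step, assuming each index-rule query returns rows of three columns: document id, field name and value. That row shape is inferred from the "optional fourth column" idea later in the deck and is not confirmed by the slides; the function name is hypothetical.

```python
def rows_to_documents(rows):
    """Group (doc_id, field, value) rows into Solr-style documents.

    Every field is collected as a list, since a field can have
    multiple values.
    """
    docs = {}
    for doc_id, field, value in rows:
        doc = docs.setdefault(doc_id, {"id": doc_id})
        doc.setdefault(field, []).append(value)
    return list(docs.values())

# Hypothetical query result for one document type.
rows = [
    ("1234", "title", "Structure and Interpretation of Computer Programs"),
    ("1234", "authors", "Harold Abelson"),
    ("1234", "authors", "Gerald Jay Sussman"),
]

documents = rows_to_documents(rows)
```

The resulting documents would then be sent to Solr for indexing, and faceting done on selected fields such as `authors`.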
  28. 28. Index rule set<br />
  29. 29. Index rule: Organisasjonsenheter<br />
  30. 30. Query result: Organisasjonsenheter<br />
  31. 31. Solr index: Organisasjonsenhet<br />
  32. 32. Index rule: Artikler<br />
  33. 33. Query result: Artikler <br />
  34. 34. Solr index: Artikler<br />
  35. 35. Demo<br />A prototype for Bergen kommune<br />
  36. 36. Ideas for the future<br />Faceted search user-interface in Ontopoly<br />could be made declarative<br />Incremental reindexing<br />requires tracking changes<br />usually done with a timestamp<br />implement last-modified field in Ontopoly<br />Add optional fourth column for score boost?<br />a float between 0 and 1<br />Ontopia extensions for interacting with Solr<br />JSP tag library<br />tolog predicates <br />
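The timestamp-based change tracking mentioned above might look roughly like this. The `last_modified` key stands in for the proposed last-modified field and is a hypothetical name, as are the topic records.

```python
from datetime import datetime

def changed_since(topics, last_indexed):
    """Return the topics modified after the previous indexing run."""
    return [t for t in topics if t["last_modified"] > last_indexed]

# Hypothetical topics carrying a last-modified timestamp.
topics = [
    {"id": "a", "last_modified": datetime(2009, 10, 1, 12, 0)},
    {"id": "b", "last_modified": datetime(2009, 11, 2, 9, 30)},
    {"id": "c", "last_modified": datetime(2009, 11, 3, 8, 0)},
]

last_indexed = datetime(2009, 11, 1)
to_reindex = [t["id"] for t in changed_since(topics, last_indexed)]
```

Only the returned topics need to be re-extracted and re-sent to Solr; deletions would need separate tracking, which this sketch does not cover.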
  37. 37. More demos<br />Epicurious: recipe search<br />http://www.epicurious.com/tools/searchresults?search=<br />Flickr photo search with hierarchical facets<br />http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html<br />A collection of faceted navigation examples:<br />http://www.flickr.com/photos/morville/collections/72157603789246885/<br />
  38. 38. More information<br />3 Quick Design Patterns for Better Faceted Search<br />http://www.thingsontop.com/3-quick-patterns-better-facet-design-889.html<br />How to Make a Faceted Classification and Put It On the Web<br />http://www.miskatonic.org/library/facet-web-howto.html<br />Book: Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services), Daniel Tunkelang<br />
  39. 39. Structured semantics-rich data...<br />...is easier to find when using faceted search.<br />
