A presentation about how Ontopia and Solr can be integrated.

  1. 1. Faceted search using Solr and Ontopia<br />2009-11-03<br />Geir Ove Grønmo, grove@bouvet.no<br />
  2. 2. Agenda<br />Short introductions to Solr and Ontopia<br />What is faceted search?<br />An integration of the two – a prototype<br />Demos<br />
  3. 3. Apache Solr<br />A search engine<br />implemented as an HTTP service on top of Apache Lucene<br />searching and indexing (no web-crawling)<br />adds support for faceted search (and more)<br />sharding and replication<br />distributed search<br />excellent interoperability (i.e. not really Java-specific)<br />Next release: Solr 1.4<br />Open source:<br />http://lucene.apache.org/solr/<br />Apache License 2.0<br />
  4. 4. Ontopia<br />A Topic Maps toolkit:<br />data representation, persistence and querying<br />application development<br />written in Java<br />Next release: Ontopia 5.1<br />Open source:<br />http://code.google.com/p/ontopia/<br />Apache License 2.0<br />
  5. 5. Where the meat is...<br />Solr<br />fast textual search and faceted search support<br />Ontopia<br />rich semantic data and structured search<br />User interface design<br />providing a useful interface to the user<br />
  6. 6. But first, what is faceted search?<br />A technique for refining search results<br />Integrates textual search and navigation<br />Allows concept composition<br />slow + expensive + red + used + car<br />article + in English + about salmon<br />people + aged 20-30 + SQL expert<br />punk rock songs + &lt; 1 minute + in Norwegian + released 1980-1982<br />Supports exploration and learning<br />Never returns zero results<br />
  8. 8. How is it done?<br />Given a starting set<br />usually all documents<br />or the result of filling in the search input box<br />...do the following:<br />count the number of hits matching each facet field<br />which fields to facet on are defined at query time<br />
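The counting step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr implements it; the function name `facet_counts` and the sample car documents are hypothetical.

```python
from collections import Counter

def facet_counts(docs, facet_fields):
    """For each facet field, count how many documents carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc in docs:
        for field in facet_fields:
            values = doc.get(field, [])
            # Treat a single value as a one-element list
            if not isinstance(values, (list, tuple, set)):
                values = [values]
            # A document counts at most once per distinct value
            for value in set(values):
                counts[field][value] += 1
    return counts

# Hypothetical starting set: all documents
docs = [
    {"id": "1", "color": "red", "condition": "used"},
    {"id": "2", "color": "red", "condition": "new"},
    {"id": "3", "color": "blue", "condition": "used"},
]

# The fields to facet on are chosen at query time
result = facet_counts(docs, ["color", "condition"])
```

Because the facet fields are passed in per call, the same document set can be faceted differently for each query.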
  12. 12. An example without faceted search<br />
  13. 13. Facet types<br />Standard facets<br />a list of facet values<br />Hierarchical facet values<br />taxonomy of facet values<br />Range/query facets<br />dates<br />prices<br />alphabet buckets<br />intervals (lower and upper bounds)<br />
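A range facet with lower and upper bounds can be sketched as bucket counting. The bucket labels, the half-open interval convention (lower bound inclusive, upper bound exclusive) and the name `range_facet` are assumptions made for this illustration.

```python
def range_facet(values, buckets):
    """Count values per bucket; buckets are (label, lower, upper) triples.

    Assumption: lower bound is inclusive, upper bound is exclusive.
    """
    counts = {label: 0 for label, _, _ in buckets}
    for value in values:
        for label, lower, upper in buckets:
            if lower <= value < upper:
                counts[label] += 1
    return counts

# Hypothetical ages of people in the current result set
ages = [19, 22, 25, 30, 34, 41]
age_buckets = [("under 20", 0, 20), ("20-29", 20, 30), ("30-39", 30, 40)]
age_counts = range_facet(ages, age_buckets)
```

Values that fall outside every bucket (41 in this sample) are simply not counted.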
  14. 14. Standard facets<br />
  15. 15. Hierarchical facet values<br />Note: the facets can also be hierarchical<br />
  16. 16. Alphabet buckets<br />
  17. 17. Range facets<br />
  18. 18. User interface considerations<br />Single select<br />link<br />radio button<br />Multi select<br />checkboxes<br />Decide on which operator to use: AND/OR<br />within a facet<br />between facets<br />How many facet values to display<br />given limited screen real estate<br />How to provide intuitive undo operation<br />
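The AND/OR decision above can be made concrete with a small filter function. This is a sketch under the assumption that selections arrive as a mapping from facet field to the set of selected values; `matches` and its parameters are hypothetical names.

```python
def matches(doc, selections, within="OR", between="AND"):
    """Check a document against the selected facet values.

    `within` combines the values selected inside one facet,
    `between` combines the results of the different facets.
    """
    def field_matches(field, wanted):
        values = doc.get(field, [])
        if not isinstance(values, (list, tuple, set)):
            values = [values]
        have = set(values)
        if within == "OR":
            return bool(have & wanted)   # any selected value present
        return wanted <= have            # all selected values present

    results = [field_matches(f, set(w)) for f, w in selections.items() if w]
    if not results:
        return True                      # nothing selected: keep everything
    return all(results) if between == "AND" else any(results)

car = {"color": "red", "condition": "used"}
```

A common choice is OR within a facet (red or blue) and AND between facets (and used), which is what the defaults here model.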
  19. 19. Examples<br />
  20. 20. Scoring<br />Some types of documents should be ranked higher than others<br />Solr lets one boost the default score:<br />per document<br />per field<br />The total score of a document depends on:<br />the boost and score of each field, adjusted by how relevant the field is to the actual query<br />the boost of the document<br />
  21. 21. Sorting<br />How to sort the list of facets?<br />by relevance<br />How to sort the values of each facet?<br />by number of hits<br />alphabetically<br />How to sort the search result?<br />by relevance<br />alphabetically<br />by date<br />
  22. 22. Proposition<br />“Concept composition, using faceted search, and Topic Maps is a perfect match”<br />
  23. 23. Why not use Ontopia only?<br />You can, but it is not optimized for this use case<br />It lets you implement faceted search<br />but it’ll be too slow<br />The reasons are:<br />all the expensive processing will have to happen at runtime, and not indexing time<br />involves a lot of traversal<br />relies on the underlying fulltext search engine<br />search has limited cacheability<br />
  24. 24. Trade-offs<br />Considerations:<br />Search performance<br />Indexing performance<br />Consistency<br />Ontopia<br />no indexing overhead<br />results always up-to-date<br />Solr<br />very fast search<br />indexing overhead<br />index must be kept up-to-date regularly<br />
  25. 25. Solr – the data model<br />An index contains documents<br />Documents have fields<br />A field can have multiple values<br />{ "id": "1234",<br /> "title": "Structure and Interpretation of Computer Programs",<br /> "authors": ["Harold Abelson", "Gerald Jay Sussman"] }<br />
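Faceting on such fields is requested through Solr's HTTP query parameters (`facet`, `facet.field`, `facet.mincount`, `fq`). The sketch below only builds the request URL; the base URL, the helper name `build_facet_query` and the simple quoting of filter values are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_facet_query(base_url, text, facet_fields, filters=None):
    """Build a Solr select URL that asks for facet counts."""
    params = [
        ("q", text),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),   # hide facet values with zero hits
    ]
    # One facet.field parameter per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value becomes a filter query
    for field, value in (filters or {}).items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", '%s:"%s"' % (field, escaped)))
    return base_url + "?" + urlencode(params)

# Assumed local Solr endpoint; adjust to the actual installation
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "computer programs",
    ["authors"],
    {"authors": "Harold Abelson"},
)
```

Setting `facet.mincount` to 1 is one way to keep the interface from offering refinements that would lead to an empty result.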
  26. 26. Ontopia – the data model<br />A topic map contains<br />topics<br />and information about them<br />Identities<br />Names<br />Associations to other topics<br />Occurrences (read: non-association properties)<br />
  27. 27. Integrating Solr and Ontopia<br />Proposed solution:<br />Solr indexes constructed from Ontopia queries<br />For each document type create a query that extracts data from the topic map to fields in documents<br />Then do faceting on selected fields<br />Use-case specific schema definition<br />should be project specific (to some degree)<br />Perform full index or incremental reindex<br />
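The step from query result to index documents might look roughly like this. The sketch assumes each query result row has three columns (document id, field name, value); the actual row layout used by the prototype is not given in the slides, and `rows_to_documents` is a hypothetical name.

```python
def rows_to_documents(rows):
    """Group (doc_id, field, value) rows into documents with multi-valued fields."""
    docs = {}
    for doc_id, field, value in rows:
        doc = docs.setdefault(doc_id, {"id": doc_id})
        if field == "id":
            continue  # the id comes from the first column
        values = doc.setdefault(field, [])
        if value not in values:
            values.append(value)
    return list(docs.values())

# Hypothetical result rows from a topic map query
rows = [
    ("1234", "title", "Structure and Interpretation of Computer Programs"),
    ("1234", "authors", "Harold Abelson"),
    ("1234", "authors", "Gerald Jay Sussman"),
]
documents = rows_to_documents(rows)
```

Repeated rows for the same id and field turn into one multi-valued field, which matches the Solr data model described earlier.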
  28. 28. Index rule set<br />
  29. 29. Index rule: Organisasjonsenheter<br />
  30. 30. Query result: Organisasjonsenheter<br />
  31. 31. Solr index: Organisasjonsenhet<br />
  32. 32. Index rule: Artikler<br />
  33. 33. Query result: Artikler <br />
  34. 34. Solr index: Artikler<br />
  35. 35. Demo<br />A prototype for Bergen kommune<br />
  36. 36. Ideas for the future<br />Faceted search user-interface in Ontopoly<br />could be made declarative<br />Incremental reindexing<br />requires tracking changes<br />usually done with a timestamp<br />implement last-modified field in Ontopoly<br />Add optional fourth column for score boost?<br />a float between 0 and 1<br />Ontopia extensions for interacting with Solr<br />JSP tag library<br />tolog predicates <br />
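Timestamp-based change tracking for incremental reindexing could be sketched as follows. The `last_modified` key and the function `select_changed` are hypothetical; the slides only propose adding such a field.

```python
from datetime import datetime, timezone

def select_changed(topics, last_indexed):
    """Return the topics modified after the previous indexing run."""
    return [t for t in topics if t["last_modified"] > last_indexed]

# Hypothetical topics with a last-modified timestamp
topics = [
    {"id": "a", "last_modified": datetime(2009, 11, 1, tzinfo=timezone.utc)},
    {"id": "b", "last_modified": datetime(2009, 11, 3, tzinfo=timezone.utc)},
]
last_run = datetime(2009, 11, 2, tzinfo=timezone.utc)
to_reindex = select_changed(topics, last_run)
```

Only the selected topics would need to be sent to the index again; deletions would require separate tracking, which this sketch does not cover.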
  39. 39. Structured semantics-rich data...<br />...is easier to find when using faceted search.<br />