Apache Solr Search Mastery

This session is for those who are excited by the great power of Apache Solr search for Drupal and want to take things even further. Do you want to take complete control over your search interface and offer more than the default features? Have you ever wondered what it takes to add data to your search index? Are you curious about defining facets, custom sorting, or making cool new widgets for filtering and faceting? Join us for a technical deep dive into the world of Solr search.

The general topics of this presentation will overlap with those covered at Drupalcon SF for the Drupal 6 version, but we will focus on use of the API as found in the Drupal 7 version.

Introducing the Solr index

* Learn about Solr fields, and how to map Drupal data onto them
* See how to add data to the search index
* Execute a search in PHP code and use the results

Using the API for custom search paths and interfaces

* How to use the prepare and alter hooks for the query object, and why they differ.
* Make user-facing changes, or add filters that are transparent to the user.

Build custom facets based on node fields

* What comes out of the box (OOTB)
* Hooks to add facets for additional field types

Apache Solr Search Mastery

  1. Apache Solr Search Mastery
     Peter Wolanin and Robert Douglass
     25 August, 13:30, Trellon
  2. We hope you will leave having learned about:
     • What is Solr and how do you run it locally
     • Getting Drupal data into Solr
     • Changes in Drupal 7
     • Field API integration
     • Searching Solr from Drupal
     • Modifying what's searched and the results
     • Theming search results
  3. Drupal Interacts with Solr via HTTP
     • Drupal sends data to Solr as XML documents
     • Solr accepts documents POSTed to /update
     • A different XML can be POSTed to delete
     • Searching, etc. are GET requests
     • If something is not working as expected, you can try searching directly in Solr via URL
     • Solr also includes admin and analysis interfaces (you need to lock this down for production).
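
The HTTP exchange described on slide 3 can be sketched in plain PHP. This is a minimal illustration, not the module's own code: the function names, the field values, and the base URL (the usual address of the Solr example server) are assumptions. Only the /update handler, the XML add and delete messages, and GET-based searching come from the slide.

```php
<?php
// Sketch only: helper names, field values, and the base URL are assumptions.

// Escape text for use inside XML element content or attribute values.
function solr_xml_escape($text) {
  return htmlspecialchars((string) $text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Build the <add> message that would be POSTed to /update.
// $fields maps a Solr field name to one value or a list of values.
function solr_build_add_xml(array $fields) {
  $xml = '<add><doc>';
  foreach ($fields as $name => $values) {
    foreach ((array) $values as $value) {
      $xml .= '<field name="' . solr_xml_escape($name) . '">'
        . solr_xml_escape($value) . '</field>';
    }
  }
  return $xml . '</doc></add>';
}

// Build the <delete> message, also POSTed to /update.
function solr_build_delete_xml($id) {
  return '<delete><id>' . solr_xml_escape($id) . '</id></delete>';
}

// Build the URL for a search, which is a plain GET request.
function solr_build_search_url($base_url, $keys, array $params = array()) {
  $params = array('q' => $keys) + $params;
  return rtrim($base_url, '/') . '/select?' . http_build_query($params, '', '&');
}

$add = solr_build_add_xml(array(
  'id' => 'abc123/node/1',
  'title' => 'Tom & Jerry',
  'im_example' => array(3, 7), // multi-valued: one <field> per value
));
$delete = solr_build_delete_xml('abc123/node/1');
$url = solr_build_search_url('http://localhost:8983/solr', 'drupal', array('rows' => 10));

echo $add, "\n", $delete, "\n", $url, "\n";
```

Pasting a URL like the one built above into a browser is the "search directly in Solr" debugging step the slide mentions.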
  4. Run Solr Using the Example Dir
     Replace the schema.xml and solrconfig.xml with the ones from the Drupal module.
     Invoke the start.jar:
         java -jar start.jar
  5. Schema: Defines Types & Fields
         <?xml version="1.0" encoding="UTF-8" ?>
         <schema name="drupal-0.9.5" version="1.2">
           <types>
             ...
           </types>
           <fields>
             <!-- The document id is derived from a site-specific key (hash) and the node ID like:
                  $document->id = $hash . '/node/' . $node->nid; -->
             <field name="id" type="string" indexed="true" stored="true" required="true" />
             <!-- These are the fields that correspond to a Drupal node. -->
             <field name="site" type="string" indexed="true" stored="true"/>
             <field name="hash" type="string" indexed="true" stored="true"/>
             <field name="url" type="string" indexed="true" stored="true"/>
             <field name="title" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
             <field name="sort_title" type="sortString" indexed="true" stored="false"/>
             <field name="body" type="text" indexed="true" stored="true" termVectors="true"/>
             <field name="teaser" type="text" indexed="false" stored="true"/>
             ...
           </fields>
           <uniqueKey>id</uniqueKey>
           <!-- field for the QueryParser to use when an explicit fieldname is absent -->
           <defaultSearchField>body</defaultSearchField>
           <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
           <solrQueryParser defaultOperator="AND"/>
         </schema>
  6. Schema: Defines Types & Fields
         <field name="id" type="string" indexed="true" stored="true" required="true" />
         <!-- These are the fields that correspond to a Drupal node. -->
         <field name="site" type="string" indexed="true" stored="true"/>
         <field name="hash" type="string" indexed="true" stored="true"/>
  7. Dynamic Fields Provide Flexibility
         <!-- Dynamic field definitions will be used if the name matches any of the patterns.
              The glob-like pattern in the name attribute must have "*" only at the start or the end.
              Longer patterns will be matched first. -->
         <dynamicField name="is_*" type="integer" indexed="true" stored="true" multiValued="false"/>
         <dynamicField name="im_*" type="integer" indexed="true" stored="true" multiValued="true"/>
         ...
         <dynamicField name="ss_*" type="string" indexed="true" stored="true" multiValued="false"/>
         <dynamicField name="ts_*" type="text" indexed="true" stored="true" multiValued="false" termVectors="true"/>
         <dynamicField name="ds_*" type="date" indexed="true" stored="true" multiValued="false"/>
         <dynamicField name="dm_*" type="date" indexed="true" stored="true" multiValued="true"/>
         <dynamicField name="tds_*" type="tdate" indexed="true" stored="true" multiValued="false"/>
         <dynamicField name="tdm_*" type="tdate" indexed="true" stored="true" multiValued="true"/>
         <dynamicField name="bm_*" type="boolean" indexed="true" stored="true" multiValued="true"/>
         <dynamicField name="bs_*" type="boolean" indexed="true" stored="true" multiValued="false"/>
         ...
         <!-- Sortable version of the dynamic string field -->
         <dynamicField name="sort_ss_*" type="sortString" indexed="true" stored="false"/>
         <copyField source="ss_*" dest="sort_ss_*"/>
         <!-- A random sort field -->
         <dynamicField name="random_*" type="rand" indexed="true" stored="true"/>
         <!-- This field is used to store node access records, as opposed to CCK field data -->
         <dynamicField name="nodeaccess*" type="integer" indexed="true" stored="false" multiValued="true"/>
         <dynamicField name="*" type="ignored" multiValued="true" />
  8. Dynamic Fields Provide Flexibility
         <!-- Dynamic field definitions will be used if the name matches any of the patterns.
              The glob-like pattern in the name attribute must have "*" only at the start or the end.
              Longer patterns will be matched first. -->
         <dynamicField name="is_*" type="integer" indexed="true" stored="true" multiValued="false"/>
         <dynamicField name="im_*" type="integer" indexed="true" stored="true" multiValued="true"/>
  10. Dynamic Fields Provide Flexibility
          <!-- Sortable version of the dynamic string field -->
          <dynamicField name="sort_ss_*" type="sortString" indexed="true" stored="false"/>
          <copyField source="ss_*" dest="sort_ss_*"/>
          <!-- This field is used to store node access records, as opposed to CCK field data -->
          <dynamicField name="nodeaccess*" type="integer" indexed="true" stored="false" multiValued="true"/>
          <dynamicField name="*" type="ignored" multiValued="true" />
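
The matching rule stated in the schema comment ("*" only at the start or the end, longer patterns first) can be illustrated with a small PHP sketch. The function below is hypothetical and only imitates the rule as the comment describes it. It is not Solr's implementation, and the pattern list is a subset of the schema excerpt above.

```php
<?php
// Sketch only: imitates the dynamic-field rule from the schema comment.

// Returns the type of the first matching pattern, trying longer patterns
// first, or NULL when nothing matches. $patterns maps pattern => type.
function example_match_dynamic_field($field_name, array $patterns) {
  $names = array_keys($patterns);
  // Longer patterns are matched first.
  usort($names, function ($a, $b) {
    return strlen($b) - strlen($a);
  });
  foreach ($names as $pattern) {
    if ($pattern === '*') {
      // The catch-all pattern matches any name.
      return $patterns[$pattern];
    }
    if (substr($pattern, -1) === '*') {
      // Trailing "*": compare the prefix.
      $prefix = substr($pattern, 0, -1);
      if (strncmp($field_name, $prefix, strlen($prefix)) === 0) {
        return $patterns[$pattern];
      }
    }
    elseif ($pattern[0] === '*') {
      // Leading "*": compare the suffix.
      $suffix = substr($pattern, 1);
      if (substr($field_name, -strlen($suffix)) === $suffix) {
        return $patterns[$pattern];
      }
    }
  }
  return NULL;
}

$patterns = array(
  'is_*' => 'integer',
  'im_*' => 'integer',
  'ss_*' => 'string',
  'sort_ss_*' => 'sortString',
  'nodeaccess*' => 'integer',
  '*' => 'ignored',
);

echo example_match_dynamic_field('ss_image_relative', $patterns), "\n";
echo example_match_dynamic_field('sort_ss_course_code', $patterns), "\n";
echo example_match_dynamic_field('unknown_field', $patterns), "\n";
```

The last call shows why the catch-all "*" pattern with type "ignored" is useful: a field that fits no naming convention falls through to it instead of matching nothing.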
  11. The $query object
  12. Use the factory method to get an object for building your queries:
          $query = apachesolr_drupal_query(
            $keys = '',
            $filters = '',
            $solrsort = '',
            $base_path = '',
            $solr = NULL
          );
  13. The actual class that is returned is determined by a Drupal variable:
          variable_get('apachesolr_query_class', array('apachesolr', 'Solr_Base_Query'));
  14. interface Drupal_Solr_Query_Interface {
        get_filters($name);
        has_filter($field, $value);
        add_filter($field, $value, $exclude);
        remove_filter($field, $value);
        ...
      }
  15. interface Drupal_Solr_Query_Interface {
        ...
        get_keys();
        set_keys($keys);
        remove_keys();
        ...
      }
  16. interface Drupal_Solr_Query_Interface {
        ...
        get_path();
        get_url_queryvalues();
        get_query_basic();
        ...
      }
  17. interface Drupal_Solr_Query_Interface {
        ...
        get_available_sorts();
        set_available_sort($field, $sort);
        get_solrsort();
        set_solrsort($field, $direction);
        ...
      }
  18. interface Drupal_Solr_Query_Interface {
        ...
        add_subquery(Drupal_Solr_Query_Interface $query);
        remove_subquery(Drupal_Solr_Query_Interface $query);
        remove_subqueries();
        ...
      }
  19. interface Drupal_Solr_Query_Interface {
        ...
        // Passes to the $solr object which
        // executes the search.
        search($keys = NULL);
      }
  20. The $solr object
  21. Use the factory method to get an object for sending requests to Solr:
          $solr = apachesolr_get_solr($host, $port, $path);
  22. The actual class that is returned is determined by a Drupal variable:
          variable_get('apachesolr_service_class',
            array('apachesolr', 'Drupal_Apache_Solr_Service.php', 'Drupal_Apache_Solr_Service')
          );
  23. This allows you to customize the way search works by providing a different Solr service class than the standard one.
          variable_set('apachesolr_service_class',
            array('acquia_search', 'Acquia_Search_Service.php', 'Acquia_Search_Service')
          );
  24. http://code.google.com/p/solr-php-client/
          class Apache_Solr_Service {
            addDocument(Apache_Solr_Document $document);
            addDocuments($documents);
            deleteById($id);
            deleteByQuery($rawQuery);
            ...
          }
  25. http://code.google.com/p/solr-php-client/
          class Apache_Solr_Service {
            ...
            ping();
            commit();
            optimize();
            ...
          }
  26. http://code.google.com/p/solr-php-client/
          class Apache_Solr_Service {
            // Builds a GET request.
            search();
          }
  27. class Drupal_Apache_Solr_Service extends Apache_Solr_Service {
        getLuke();
        getFields();
        getStatsSummary();
        ...
      }
  28. class Drupal_Apache_Solr_Service extends Apache_Solr_Service {
        // Takes control of the request sending
        // and headers - Drupal idiomatic.
        _makeHttpRequest();
      }
  29. The $document object
  30. http://code.google.com/p/solr-php-client/
          class Apache_Solr_Document {
            addField($key, $value, $boost);
            setMultiValue($key, $value, $boost);
          }
  31. Drupal 7 Changes
      • $query, $params becomes $query->params
      • $solr->search() becomes $query->search()
      • Taxonomy on a node is now a term reference field (works as part of the Field API integration).
      • Fixes to core search module APIs mean that some hacks are gone: e.g. no hook_menu_alter; we can set apachesolr as the default via the search UI.
  32. You Can Add Any Data to the Index
          hook_apachesolr_update_index(&$document, $node, $namespace)
      • Used to add more data to a document before it's sent to Solr.
      • Can also be used to alter or replace data added by apachesolr or another module.
      • This is it! (It works like an _alter hook.)
  33. Image Data Using Dynamic Fields
          /**
           * Implementation of hook_apachesolr_update_index().
           */
          function apachesolr_image_apachesolr_update_index(&$document, $node, $namespace) {
            if ($node->type == 'image' && $document->entity == 'node') {
              // Compute the pixel area of each derivative size.
              $areas = array();
              $sizes = image_get_derivative_sizes($node->images['_original']);
              foreach ($sizes as $name => $info) {
                $areas[$name] = $info['width'] * $info['height'];
              }
              // Sort ascending so the smallest derivative comes first.
              asort($areas);
              $image_path = FALSE;
              foreach ($areas as $preset => $size) {
                $image_path = $node->images[$preset];
                break;
              }
              if ($image_path) {
                $document->ss_image_relative = $image_path;
                // Support multi-site too.
                $document->ss_image_absolute = file_create_url($image_path);
              }
            }
          }

          /**
           * Implementation of hook_apachesolr_modify_query().
           */
          function apachesolr_image_apachesolr_modify_query($query, $caller) {
            // Also retrieve image thumbnail links.
            $query->params['fl'] .= ',ss_image_relative';
          }
  34. Image Data Using Dynamic Fields
          if ($image_path) {
            $document->ss_image_relative = $image_path;
          }

          /**
           * Implement hook_apachesolr_modify_query().
           */
          function apachesolr_image_apachesolr_modify_query($query, $caller) {
            // Also retrieve image thumbnail links.
            $query->params['fl'] .= ',ss_image_relative';
          }
  35. UI to Exclude Whole Content Types
      • ?q=admin/config/search/apachesolr/content-bias
  36. Control Indexing More Precisely
          hook_apachesolr_node_exclude($node, $namespace)
          in_array($node->type, variable_get('apachesolr_exclude_comments_types', array()))
          hook_node_update_index($node)
      • hook_node_update_index output is added to the body.
      • We can create multiple documents from one node (e.g. a document per comment).
          hook_apachesolr_document_handlers($type, $namespace)
  37. Field API Integration
      • Most of the Field API integration follows directly from the 6.x-2.x CCK integration.
      • In Drupal 7, we match field types, rather than looking at the widget.
      • By default, the data will be indexed to Solr as multi-valued, and named by combining the field module and name: sm_$module_$fieldname
  38. Typically need 4 things:
      • What field types (or field instances) to look for during indexing.
      • The data type to use in the index (index_type).
      • A function for extracting the data from the field while indexing (indexing_callback).
      • A function for displaying the data from the field during searches (display_callback).
  39. Field API Integration
          hook_apachesolr_field_mappings_alter(&$mappings)

          $mappings['list_text'] = array(
            'display_callback' => 'apachesolr_fields_list_display_callback',
            'indexing_callback' => 'apachesolr_fields_list_indexing_callback',
            'index_type' => 'string',
          );
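
The naming convention on slide 37 lines up with the dynamic field prefixes in the schema (is_, im_, ss_, ts_, ds_, dm_, tds_, tdm_, bs_, bm_). As a rough sketch, a field name can be derived from an index_type and a multi-valued flag as below. The function and its prefix table are hypothetical and inferred from the schema excerpt. The sketch does not check that a matching dynamicField is actually defined (for instance, no multi-valued text pattern appears in the excerpt).

```php
<?php
// Sketch only: the helper and its prefix table are inferred from the
// dynamicField patterns in the schema excerpt, not taken from the module.

// Builds an index field name such as "sm_list_myfield".
function example_index_key($index_type, $multiple, $name) {
  $type_prefixes = array(
    'integer' => 'i',
    'string' => 's',
    'text' => 't',
    'date' => 'd',
    'tdate' => 'td',
    'boolean' => 'b',
  );
  if (!isset($type_prefixes[$index_type])) {
    throw new InvalidArgumentException("Unknown index type: $index_type");
  }
  // Second letter: "m" for multi-valued, "s" for single-valued.
  return $type_prefixes[$index_type] . ($multiple ? 'm' : 's') . '_' . $name;
}

// A 'string' mapping indexed as multi-valued (the default on slide 37).
echo example_index_key('string', TRUE, 'list_myfield'), "\n";
// A single-valued integer.
echo example_index_key('integer', FALSE, 'count'), "\n";
```

Keeping the type and cardinality in the prefix is what lets one generic schema serve arbitrary fields: new fields need no schema change as long as their names fit an existing pattern.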
  40. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  42. hook_menu: defines custom search paths
          /arts
          /arts/undergraduate
          /search/apachesolr_search/?filters=type%3Acatalog%20ss_faculty%3AAR%20sm_level%3AUndergraduate
  43. hook_menu: defines custom search paths
          /arts
          /arts/undergraduate
          /arts/undergraduate/courses
  44. hook_menu: defines custom search paths
          // Implements hook_menu().
          function mcgill_menu() {
            $items['arts/undergraduate/courses'] = array(
              'page callback' => 'mcgill_courses_search',
              'access arguments' => array('search content'),
              'type' => MENU_CALLBACK,
            );
            return $items;
          }
  45. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  50. hook_menu_alter: changes the page callback
          // Implements hook_menu_alter().
          function mcgill_menu_alter(&$items) {
            if (isset($items['search/apachesolr_search/%menu_tail'])) {
              $items['search']['page callback'] = 'mcgill_page';
              $items['search/apachesolr_search/%menu_tail']['page callback'] = 'mcgill_page';
            }
          }
  51. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  52. An example Solr request
  53. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  54. hook_apachesolr_prepare_query($query)
  55. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_process_response($response, ...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  56. hook_apachesolr_prepare/modify_query($query)
          // Run hook_apachesolr_prepare_query($query).
          // Cache the built query.
          $current_query = apachesolr_current_query($query);
          // Run hook_apachesolr_modify_query($query).
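
The ordering on slide 56 is what separates the two hooks: whatever runs before the query is cached is visible to code that later reads the cached query, while changes made afterwards only affect the request sent to Solr. The standalone PHP sketch below simulates that ordering with a hypothetical query class and plain callbacks. It assumes the cached copy is independent of later changes, and none of the names come from the module.

```php
<?php
// Sketch only: ExampleQuery and example_build_query() are hypothetical
// stand-ins that simulate the hook ordering shown on the slide.

class ExampleQuery {
  public $filters = array();

  public function add_filter($field, $value) {
    $this->filters[] = $field . ':' . $value;
  }
}

function example_build_query(array $prepare_callbacks, array $modify_callbacks) {
  $query = new ExampleQuery();
  // Step 1: "prepare" callbacks run before the query is cached.
  foreach ($prepare_callbacks as $callback) {
    call_user_func($callback, $query);
  }
  // Step 2: cache a copy (assumed independent of later changes).
  $cached = clone $query;
  // Step 3: "modify" callbacks run after caching, so their changes
  // reach Solr but not the cached copy.
  foreach ($modify_callbacks as $callback) {
    call_user_func($callback, $query);
  }
  return array('cached' => $cached, 'sent' => $query);
}

$result = example_build_query(
  array(function ($query) { $query->add_filter('type', 'catalog'); }),
  array(function ($query) { $query->add_filter('ss_faculty', 'AR'); })
);

echo implode(' ', $result['cached']->filters), "\n";
echo implode(' ', $result['sent']->filters), "\n";
```

Under these assumptions the cached query carries only the filter added during the prepare step, while the query that is sent carries both. That matches the rule of thumb in the session outline: prepare for user-facing changes, modify for filters that stay transparent to the user.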
  60. hook_apachesolr_prepare_query($query): set a default Solr sort parameter
          $query->set_available_sort('sort_ss_course_code', array(
            'title' => t('Course code'),
            'default' => 'asc',
          ));
          $query->remove_available_sort('created');
          $query->remove_available_sort('sort_name');
          $query->remove_available_sort('type');
  61. hook_apachesolr_prepare_query($query): set a default Solr sort parameter
          if (!isset($_GET['solrsort'])) {
            if ($query->get_keys()) {
              $query->set_solrsort('score', 'asc');
            }
            else {
              $query->set_solrsort('sort_ss_course_code', 'asc');
            }
          }
  63. Should I use hook_apachesolr_prepare_query or hook_apachesolr_modify_query?
          /arts/undergraduate/courses
  69. hook_apachesolr_modify_query($query): set default Solr fq parameters
          // Add filters for FACULTY/LEVEL/courses paths.
          if ($facet = get_faculty_from_path()) {
            $query->add_filter('ss_faculty', $facet);
          }
          if ($facet = get_level_from_path()) {
            $query->add_filter('sm_level', $facet);
          }
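
The slides do not show get_faculty_from_path() or get_level_from_path(). As a hedged guess at what such helpers could look like, the sketch below maps path segments to facet values. The function names, the explicit $path parameter (the slide's helpers take no arguments), and the lookup tables are all assumptions. Only the values AR and Undergraduate come from the example URL on slide 42.

```php
<?php
// Sketch only: hypothetical helpers mapping FACULTY/LEVEL/courses paths
// to facet values. Names, parameters, and lookup tables are assumptions.

// Returns the faculty code for the first path segment, or FALSE.
function example_faculty_from_path($path) {
  $faculties = array('arts' => 'AR');
  $parts = explode('/', trim($path, '/'));
  return isset($faculties[$parts[0]]) ? $faculties[$parts[0]] : FALSE;
}

// Returns the level for the second path segment, or FALSE.
function example_level_from_path($path) {
  $levels = array('undergraduate' => 'Undergraduate');
  $parts = explode('/', trim($path, '/'));
  if (isset($parts[1]) && isset($levels[$parts[1]])) {
    return $levels[$parts[1]];
  }
  return FALSE;
}

var_dump(example_faculty_from_path('arts/undergraduate/courses'));
var_dump(example_level_from_path('arts/undergraduate/courses'));
var_dump(example_level_from_path('arts'));
```

Returning FALSE for unknown segments fits the if ($facet = ...) pattern on the slide, so ordinary search paths pass through without extra filters.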
  71. hook_apachesolr_prepare/modify_query($query)
      Set Solr parameters in $query->params
          $query->params['fl'] .= ',ss_course_code';
          $query->params['facet.limit'] = -1;
  72. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  75. theme_apachesolr_search_snippets: sets the snippet
          // Default implementation in apachesolr_search.module.
          function theme_apachesolr_search_snippets($document, $snippets) {
            return implode(' ... ', $snippets) . ' ...';
          }
  77. theme_apachesolr_search_snippets: sets the snippet
          // Custom implementation in template.php.
          function mcgill_apachesolr_search_snippets($document, $snippets) {
            return 'anything you want!';
          }
  78. Analysis of an apachesolr search request
          search_view()
          $response = $query->search(...)
          $results = apachesolr_search_process_response($response, $final_query)
          theme('search_results', $results, ...)
  79. search-result.tpl.php: renders a single search result
          <?php print $result['node']->ss_course_code; ?>
      If this is user input, use check_plain(): Solr can send you back the same (unsafe) user input you index. See apachesolr_clean_text() if you want to index text without tags.
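
To make the warning concrete, here is a standalone sketch of the escaping step. In Drupal 7, check_plain() is a thin wrapper around htmlspecialchars() with ENT_QUOTES and UTF-8. The stand-in function below and the sample stored value are illustrative assumptions, used so the example runs without Drupal.

```php
<?php
// Sketch only: a stand-in for check_plain() so this runs without Drupal.
function example_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Imagine this value came back from Solr exactly as a user submitted it.
$stored_value = '<script>alert("x")</script> COMP 250';

// Escaped, it prints as harmless text instead of executing in the browser.
$safe = example_check_plain($stored_value);
echo $safe, "\n";
```

In the template itself, the equivalent would be to wrap the printed field in check_plain() whenever its content originates from user input.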
  80. Extra thanks to James McKinney
      For use of his slides and for ideas.
      jpmckinney on drupal.org
      http://evolvingweb.ca/
  81. http://cph2010.drupal.org/node/8168
