DrupalCamp Estonia - Media Sites and SOLR Search


Kalle Varisvirta's presentation at DrupalCamp Estonia.

  1. Searching media sites with SOLR. Kalle Varisvirta, Exove
  2. Media sites on Drupal <ul><li>Drupal works well for media sites </li></ul><ul><li>Media sites are all about user-generated content (UGC) these days, and that is what Drupal does best </li></ul><ul><li>Media sites have a lot of content, so they need a powerful content management system </li></ul><ul><li>Content needs to be tagged, categorized and organized so that it stays usable, and Drupal’s taxonomy is great for this </li></ul>
  3. Media sites on Drupal <ul><li>It’s not all fun and games </li></ul><ul><li>Media sites may have 100k – 10M nodes, and Drupal doesn’t handle that too well </li></ul><ul><li>Media sites need elaborate caching schemes to manage the traffic, which requires extra modules in Drupal </li></ul><ul><li>Drupal’s core search is not the optimal solution for searching these sites </li></ul>
  4. What’s wrong with Drupal’s core search? <ul><li>It is MySQL-based (not powerful; try faceting with it) </li></ul><ul><li>No proper language support </li></ul><ul><li>No proper search indexing tools in general (no stemmer (!), stoplists, compound-word handling or proper relevance ranking) </li></ul>
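To make the missing indexing tools concrete, here is a toy sketch of what a stoplist and a stemmer do to text before it is indexed. The stopword set and the suffix rules are assumptions made up for illustration; a real analyzer, such as the ones Lucene provides, is far more thorough and language-aware.

```python
from typing import List

# Assumed sample stoplist; real stoplists are language-specific and much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "are"}


def naive_stem(word: str) -> str:
    """Strip a couple of common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> List[str]:
    """Lowercase, split on whitespace, drop stopwords, then stem each token."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]
```

Running the same analysis on both the indexed text and the query is what lets a search for "search article" match a document containing "Searching the articles".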
  5. What’s SOLR? <ul><li>SOLR is a search engine based on the Lucene search library, fully free and open source </li></ul><ul><li>It’s written in Java (and needs a Java servlet container to run) </li></ul><ul><li>It’s an Apache project </li></ul><ul><li>It’s very widely used (users include NASA, CNET, AT&T, Cisco, Disney, FCC, White House and many more) </li></ul>
  6. SOLR and Drupal <ul><li>SOLR is not hard to integrate with, but since we’re in the Drupal community, “there’s a module for that” </li></ul><ul><li>Actually, the module is pretty great: it is built and supported by Acquia and called “Apache SOLR Search Integration” </li></ul><ul><li>Acquia also runs a commercial (= it costs you money) SOLR service online that you can use for searching if you can’t install your own SOLR for some reason </li></ul>
  7. SOLR and Drupal - installation <ul><li>Install the apachesolr module </li></ul><ul><li>Install SOLR, then copy schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration </li></ul><ul><li>Start SOLR (it ships with Jetty, a lightweight Java servlet container) </li></ul><ul><li>Configure the apachesolr module to connect to the SOLR instance you just installed </li></ul><ul><li>Reindex the content </li></ul>
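The configuration-copy step above could be scripted roughly as follows. This is a minimal sketch, not part of the apachesolr module: the function name is hypothetical, both directory paths are supplied by the caller because they depend on your installation, and only the two files named on the slide are copied.

```python
import shutil
from pathlib import Path
from typing import List

# The two files named on the slide; your module version may ship more.
CONFIG_FILES = ("schema.xml", "stopwords.txt")


def copy_solr_config(module_dir: Path, solr_conf_dir: Path) -> List[Path]:
    """Copy the module's Solr config files into the Solr conf directory.

    Both paths are assumptions provided by the caller. Existing files in
    solr_conf_dir with the same names are overwritten.
    """
    copied = []
    for name in CONFIG_FILES:
        src = module_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"expected {name} in {module_dir}")
        dst = solr_conf_dir / name
        shutil.copy2(src, dst)  # copies contents and file metadata
        copied.append(dst)
    return copied
```

Back up the stock configuration files first if you may want to return to them, and restart SOLR afterwards so the new schema is picked up.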
  8. SOLR-powered search <ul><li>It’s fast: SOLR searches 10M nodes in under 60 ms (on a dedicated virtual server) </li></ul><ul><li>It’s accurate, thanks to stemming (with language support) and better relevance calculation </li></ul><ul><li>It has a spellchecker and can suggest better search terms </li></ul><ul><li>You can fine-tune relevance with content biasing </li></ul>
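As a rough illustration of content biasing and spellchecking, the sketch below builds a query URL using Solr's dismax parameters (`defType`, `qf`, `bq`) and the `spellcheck` flag. The base URL and field names (`title`, `body`, `type`) are hypothetical and need to match your own schema, and spellcheck suggestions only come back if the spellcheck component is enabled on the request handler. The apachesolr module builds its own queries; this only shows the general idea.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    keywords: str,
    field_boosts: Dict[str, float],
    boost_query: Optional[str] = None,
) -> str:
    """Build a dismax query URL that weights some fields more than others."""
    # qf lists the fields to search, each with a boost, e.g. "title^5.0 body^1.0".
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {
        "q": keywords,
        "defType": "dismax",
        "qf": qf,
        "spellcheck": "true",  # ask for spelling suggestions, if configured
        "wt": "json",
    }
    if boost_query:
        # bq boosts documents matching an extra query, e.g. a content type.
        params["bq"] = boost_query
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8983/solr", "election", {"title": 5.0, "body": 1.0}, "type:article^2.0")` would rank title matches above body matches and nudge articles upwards.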
  9. SOLR-powered search <ul><li>In addition to being faster and more accurate, SOLR-powered search adds some extra functionality </li></ul><ul><li>Faceted search lets the user “drill down” into the results; this is the modern way of doing “advanced search” </li></ul><ul><li>You can facet on taxonomy terms and fields, although some fields (e.g. numbers or dates) need some programming to help SOLR divide the nodes into groups </li></ul>
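Drill-down can be sketched with Solr's standard faceting parameters: `facet.field` asks for counts per value, and each facet the user selects becomes a filter query (`fq`). The field names in the usage note are hypothetical, and `month_bucket` is an assumed example of the extra programming needed to group dates before indexing.

```python
from datetime import datetime
from typing import List, Tuple
from urllib.parse import urlencode


def build_facet_params(
    keywords: str,
    facet_fields: List[str],
    selections: List[Tuple[str, str]],
) -> str:
    """Build a query string that requests facet counts and applies drill-down filters."""
    # A list of pairs is used because facet.field and fq may repeat.
    params = [("q", keywords), ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value narrows the result set with its own filter query.
    params += [("fq", f'{field}:"{value}"') for field, value in selections]
    return urlencode(params)


def month_bucket(timestamp: datetime) -> str:
    """Reduce a date to a year-month label so nodes can be grouped by month."""
    return timestamp.strftime("%Y-%m")
```

A call such as `build_facet_params("election", ["type", "tid"], [("type", "article")])` requests counts for both fields while restricting results to the content type the user already clicked.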
  10. SOLR-powered search - caveats <ul><li>SOLR is very language-driven, so you need stopwords and stemming for the language you are using </li></ul><ul><li>Some languages are supported better than others </li></ul><ul><li>You can always contribute to the community to improve the language support </li></ul><ul><li>SOLR also needs a Java environment to run, so it is not available on the cheapest shared web hosts </li></ul><ul><li>SOLR needs some understanding of the Java world to be fine-tuned for best performance, which may be hard for LAMP-oriented people </li></ul>
  11. Getting all the juice out of your SOLR <ul><li>We’ve been using SOLR for some other stuff too </li></ul><ul><li>It would be great to use it as a Views backend, but the integration module “Apache Solr Views” seems to be unsupported and pretty much dead </li></ul><ul><li>You can still use it for custom stuff, since it has all your nodes anyway, but that needs some programming </li></ul><ul><li>SOLR can do a lot more, such as spatial (location-based) search, grouping of results, relevance tuning by field weight etc. </li></ul>
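As an example of the custom use mentioned above, the sketch below combines a spatial filter with result grouping. It uses Solr's `{!geofilt}` filter (with `sfield`, `pt` and `d`) and the `group` / `group.field` parameters; whether these are available depends on your SOLR version. The field names in the usage note are hypothetical and would have to exist in your schema.

```python
from urllib.parse import urlencode


def build_nearby_grouped_params(
    keywords: str,
    lat: float,
    lon: float,
    radius_km: float,
    location_field: str,
    group_field: str,
) -> str:
    """Build a query string for results within a radius, grouped by one field."""
    params = [
        ("q", keywords),
        ("fq", "{!geofilt}"),         # keep only documents within the radius
        ("sfield", location_field),   # field holding each document's coordinates
        ("pt", f"{lat},{lon}"),       # center point as "lat,lon"
        ("d", str(radius_km)),        # radius in kilometres
        ("group", "true"),
        ("group.field", group_field), # collapse results sharing this field's value
    ]
    return urlencode(params)
```

For instance, `build_nearby_grouped_params("concert", 59.437, 24.7536, 10, "location", "type")` would look for matching nodes within 10 km of the given point and return them grouped by content type.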
  12. Thank you for your time. Questions, comments?