Drupalcamp Estonia - Media Sites and SOLR search


Kalle Varisvirta's presentation at DrupalCamp Estonia.

Published in: Technology

  1. Searching media sites with SOLR (Kalle Varisvirta, Exove)
  2. Media sites on Drupal
      • Drupal works well for media sites
      • Media sites are all about user-generated content (UGC) these days, and that is what Drupal does best
      • Media sites have a lot of content, so a powerful content management system is needed
      • Content needs to be tagged, categorized and organized so that it is usable; Drupal's taxonomy is great for this
  3. Media sites on Drupal
      • It's not all fun and games
      • Media sites may have 100k to 10M nodes, and Drupal doesn't handle that too well
      • Media sites need elaborate caching schemes to manage the traffic, which requires extra modules with Drupal
      • Drupal's core search is not the optimal solution for searching such sites
  4. What's wrong with Drupal's core search?
      • MySQL-based (not powerful; try faceting with it)
      • No proper language support
      • No proper search indexing tools in general: no stemmer (!), stoplists, compound-word handling or proper relevance ranking
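To illustrate what a stoplist and a stemmer add to indexing, here is a toy sketch. It is not how Lucene or SOLR implement analysis: the stopword set and the suffix rules are made up for this example, and real stemmers are language-specific and far more careful.

```python
# Toy text analysis: lowercase, drop stopwords, strip a few English suffixes.
# STOPWORDS and SUFFIXES are illustrative assumptions, not SOLR's real lists.
STOPWORDS = {"a", "an", "and", "of", "the"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Turn raw text into the list of terms that would be indexed or searched."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return [stem(w) for w in words]
```

With this kind of analysis applied to both the indexed text and the query, "Searching the sites" and "searched site" reduce to the same terms, so one finds the other. Without it, only exact word forms match.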
  5. What is SOLR?
      • SOLR is a search engine based on the Lucene search library, fully free and open source
      • It's written in Java (and needs a Java web platform, i.e. a servlet container, to run)
      • It's an Apache project
      • It's very widely used (users include NASA, CNET, AT&T, Cisco, Disney, FCC, the White House and many more)
  6. SOLR and Drupal
      • SOLR is not hard to integrate with, but since we're in the Drupal community, "there's a module for that"
      • The module is actually pretty great: it is built and supported by Acquia and called "Apache Solr Search Integration"
      • Acquia also runs a commercial (= it costs you money) hosted SOLR service that you can use for searching if you can't install your own SOLR for some reason
  7. SOLR and Drupal - installation
      • Install the apachesolr module
      • Install SOLR, then copy schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration
      • Start SOLR (it comes bundled with Jetty, a lightweight Java servlet container)
      • Configure the apachesolr module to connect to the SOLR instance you just installed
      • Reindex content
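Before pointing the apachesolr module at the new instance, it can help to check that SOLR answers at all. The sketch below assumes the bundled Jetty listens on localhost port 8983 under /solr and that the ping handler is enabled in solrconfig.xml; none of these values come from the slides, so adjust them to your setup.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def ping_url(host="localhost", port=8983, path="/solr"):
    """Build the URL of the ping handler (host, port and path are assumptions)."""
    query = urlencode({"wt": "json"})
    return urlunsplit(("http", f"{host}:{port}", f"{path}/admin/ping", query, ""))


def solr_is_up(url, timeout=5):
    """Return True if the ping handler answers with status OK, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.load(response).get("status") == "OK"
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON answer.
        return False
```

Calling `solr_is_up(ping_url())` after starting Jetty gives a quick yes/no answer; the same host, port and path are what the module's connection settings would need.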
  8. SOLR-powered search
      • It's fast: SOLR searches through 10M nodes in less than 60 ms (on a dedicated virtual server)
      • It's accurate, thanks to stemming (with language support) and better relevance calculation
      • It has a spellchecker and can suggest better search words
      • You can fine-tune the relevance by content biasing
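As a rough sketch of how spellchecking and relevance tuning look at the query level, the function below builds request parameters for SOLR's dismax query parser. The field names and weights in the usage note are hypothetical, the spellcheck parameters only have an effect if a spellcheck component is configured on the request handler, and this is not necessarily how the apachesolr module builds its own queries.

```python
from urllib.parse import urlencode


def build_search_params(keywords, field_weights, rows=10):
    """Build dismax query parameters with per-field boosts and spellchecking."""
    # "qf" lists the fields to search, each with a boost, e.g. "title^5.0 body^1.0".
    query_fields = " ".join(
        f"{field}^{weight}" for field, weight in field_weights.items()
    )
    return {
        "q": keywords,
        "defType": "dismax",
        "qf": query_fields,
        "spellcheck": "true",          # ask for spelling suggestions
        "spellcheck.collate": "true",  # ask for a corrected version of the whole query
        "rows": rows,
        "wt": "json",
    }


def to_query_string(params):
    """Encode the parameters for appending to a select URL."""
    return urlencode(params)
```

For example, `build_search_params("drupal serch", {"title": 5.0, "body": 1.0})` weights matches in a hypothetical `title` field five times as heavily as matches in `body`.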
  9. SOLR-powered search
      • In addition to being faster and more accurate, SOLR-powered search provides some extra functionality
      • Faceted search means that the user can "drill down" into the results; this is the modern way of doing "advanced search"
      • You can facet on taxonomy terms and fields, although some fields (e.g. numbers or dates) need some programming to help SOLR divide the nodes into groups
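The grouping of numeric fields mentioned above can be sketched with SOLR's facet parameters: `facet.field` counts distinct values, while one `facet.query` per range produces buckets for numbers. The field names used in the usage note (`tid`, `price`) are hypothetical, and the bucket arithmetic assumes an integer field.

```python
def build_facet_params(keywords, term_fields, range_field, boundaries):
    """Build facet parameters: plain value facets plus integer range buckets."""
    params = [("q", keywords), ("facet", "true"), ("facet.mincount", "1")]
    for field in term_fields:
        params.append(("facet.field", field))
    # Closed buckets between consecutive boundaries; ranges are inclusive,
    # so the upper end is one below the next boundary (integer field assumed).
    for lower, upper in zip(boundaries, boundaries[1:]):
        params.append(("facet.query", f"{range_field}:[{lower} TO {upper - 1}]"))
    # Final open-ended bucket.
    params.append(("facet.query", f"{range_field}:[{boundaries[-1]} TO *]"))
    return params


def drill_down(params, field, value):
    """Return new parameters narrowed to one facet value via a filter query."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return params + [("fq", f'{field}:"{escaped}"')]
```

`build_facet_params("news", ["tid"], "price", [0, 100, 1000])` asks for counts per term plus three price buckets. When the user clicks a facet, `drill_down` adds the matching `fq` filter so the next search only returns results within that group.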
  10. SOLR-powered search - caveats
      • SOLR is very language-driven, so you need stopwords and stemming for the language you are using
      • Some languages are supported better than others
      • You can always contribute to the community to improve the language support
      • SOLR also needs a Java environment to run, so it's not available on the cheapest web hosts
      • Tuning SOLR for best performance requires some understanding of the Java world, which might be hard for LAMP-oriented people
  11. Getting all the juice out of your SOLR
      • We've been using SOLR for some other stuff too
      • It would be great to use it as a Views backend, but the integration module "Apache Solr Views" seems to be unsupported and pretty much dead
      • You can still use it for custom stuff, since it has all your nodes anyway, but that needs some programming
      • SOLR can do a lot more, like spatial (location-based) search, grouping of results, relevance tuning by field weight etc.
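As a sketch of the "custom stuff" route, the function below combines a spatial filter with result grouping in one set of query parameters. It assumes a SOLR version that supports the `geofilt` filter and the `group` parameters, and a schema with a spatial location field; the field names in the usage note are hypothetical and are not part of the schema shipped with the apachesolr module.

```python
def build_spatial_group_params(keywords, lat, lon, radius_km,
                               location_field, group_field):
    """Build parameters for a radius-limited search with grouped results."""
    return [
        ("q", keywords),
        ("fq", "{!geofilt}"),          # keep only documents within the radius
        ("sfield", location_field),    # the spatial field to filter on
        ("pt", f"{lat},{lon}"),        # center point as "latitude,longitude"
        ("d", str(radius_km)),         # radius in kilometers
        ("group", "true"),             # collapse results into groups
        ("group.field", group_field),  # one group per distinct field value
    ]
```

For instance, `build_spatial_group_params("concert", 59.437, 24.7536, 10, "location", "type")` would look for matching content within 10 km of the given point and return the results grouped by a hypothetical `type` field.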
  12. Thank you for your time. Questions, comments?