Making your Drupal fly with Apache SOLR



Published in: Technology


  1. 1. Making your Drupal fly with Apache SOLR<br />Kalle Virta, Exove<br />In this presentation<br />About Exove and myself<br />The problem – and the solution (and some cowboys)<br />SOLR to do the site-wide search<br />SOLR to help with Views<br />SOLR to help with custom modules<br />And the Fine Print<br />
  3. 3. We deliver business-driven web services that enable our customers to conduct better business on the Internet<br />We base our work on our customers’ strategy and needs<br />
  5. 5. About me, Kalle Virta<br />Software architect and developer<br />High performance and complex integrations<br />Almost 10 years in the business<br />Seen Drupal from version 3<br />A lot of big Drupal sites / systems under my belt<br />
  6. 6. Your regular stack<br />MySQL<br />server<br />Linux + Apache<br />
  7. 7. Damn, dude<br />your MySQL server<br />is on FIRE<br />
  8. 8. New guys to<br /> the rescue<br />
  9. 9. Apache SOLR<br />memcached<br />Varnish<br />
  10. 10. Your enhanced stack<br />memcached<br />MySQL server<br />Linux + Apache<br />Varnish<br />Apache SOLR<br />Did you notice?<br />It’s still blue.<br />
  11. 11. The new guys<br />Varnish is an HTTP cache and does that well – but it doesn’t help at all on a social media site that is customized for every user<br />Memcached is a good idea, and with the Cache Router module you can even use it to cache Drupal data, including data from your own modules, but… it still just caches stuff<br />SOLR, however, is a different story…<br />
  12. 12. SOLR<br />Apache SOLR is a search server written in Java and built around Lucene (which is a search library)<br />It needs a Java servlet container, e.g. Jetty or Tomcat<br />Put simply, you save your data in it as XML documents and then search them<br />SOLR will tokenize the data and do all kinds of (configurable) magic to it when indexing, but it can also store the original data (not always possible with search indexers)<br />
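To make the "save your stuff in XML form" point concrete, here is a minimal Python sketch that builds the kind of XML `<add>` message SOLR accepts on its update handler. The document and its field names (`id`, `title`, `tags`) are made up for illustration; real field names have to match whatever your schema.xml defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc: dict) -> str:
    """Serialize one document as a SOLR XML <add> update message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list value becomes several <field> elements with the same name,
        # which is how multi-valued fields are sent.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; the field names are assumptions, not a real schema.
message = build_add_message(
    {"id": "node/1", "title": "Laser mouse", "tags": ["mouse", "laser"]}
)
print(message)
```

A message like this would be POSTed to the update handler and followed by a commit before the document shows up in searches.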
  13. 13. SOLR for searching<br />Obviously, all these features make SOLR a great fit for site-wide search<br />You can actually find stuff with SOLR: every field in the search can be weighted, that is, you can tune which fields make the score go higher when the hit is in them<br />SOLR also does one really neat thing for searching…<br />
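Field biasing can be sketched as a query string. The example below assumes the DisMax query parser, whose `qf` parameter takes `field^weight` pairs; the field names `title` and `body` and the weights are invented for illustration, and depending on your SOLR version and solrconfig.xml the parser may be selected through a request handler rather than `defType`.

```python
from urllib.parse import urlencode


def build_search_query(keywords: str, boosts: dict) -> str:
    """Return a query string that weights hits differently per field."""
    # "field^weight" is the boost syntax; qf lists the fields to search in.
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    params = {
        "q": keywords,
        "defType": "dismax",  # assumption: DisMax is selected via defType
        "qf": qf,
        "wt": "json",
    }
    return urlencode(params)


# Hypothetical weights: a hit in the title counts five times as much as one in the body.
query_string = build_search_query("mouse", {"title": 5, "body": 1})
print("/select?" + query_string)
```

Tuning is then a matter of adjusting the weights and checking whether the results you expect float to the top.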
  14. 14. Ever heard of a faceted search?<br />
  15. 15. The old advanced search<br />Search<br />mouse<br />Product category<br />Product sub-cat<br />Manufacturer<br />Price range<br />-<br />Search<br />Too many search results (794),<br />narrow your search and try again<br />
  16. 16. The faceted search<br />Current search: mouse<br />Sub-category: wireless mice (296), wired mice (96), laser mice (163), Show all<br />Manufacturer: Logitech (194), Microsoft (36), HP (3), Show all<br />Price range: 0-50 € (384), 50-100 € (129), 100-300 € (50)<br />Order by price<br />Logitech LS1 Laser Mouse, 29 €: A cheap laser mouse that’ll get you through even the most problematic of PowerPoint presentations.<br />Logitech G3 Gaming Mouse, 59 €: A great laser mouse with more buttons than you’ll ever have time to configure. A steal.<br />Microsoft Super Mouse, 49 €: A great mouse from the company that brought you the best product of all times, Windows Me.<br />Apple Mighty Mouse, 129 €: The mouse the image happens to be of. Never tried it. Looks pretty nice, though.<br />page 1 2 3 4 5 6 7 8 9 10<br />
  17. 17. SOLR for faceted searching<br />Apache SOLR lets you facet search results – that is, show the possible search filters and give a count for each of them<br />Faceting with SOLR can also be achieved in Drupal – and this is where a Drupal contrib module comes into play<br />With the apachesolr module you can do all this with a couple of clicks in your Drupal installation<br />
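Under the hood, faceting boils down to a few request parameters and a list of counts in the response. The Python sketch below builds such a request and reads the counts back; the field names (`manufacturer`, `subcategory`) and the sample response are made up, and the flat `[value, count, value, count, ...]` list is the shape of SOLR's default JSON facet output as far as I know it.

```python
from urllib.parse import urlencode


def build_facet_query(keywords, facet_fields, filters=()):
    """Query string asking SOLR for results plus counts per facet value."""
    params = [("q", keywords), ("wt", "json"), ("facet", "true"), ("facet.mincount", "1")]
    params += [("facet.field", field) for field in facet_fields]
    # Each facet value the user has clicked becomes a filter query (fq).
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    return urlencode(params)


def facet_counts(response, field):
    """Pair up the flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Made-up response fragment; the numbers mirror the mockup slide.
sample = {
    "facet_counts": {
        "facet_fields": {"manufacturer": ["Logitech", 194, "Microsoft", 36, "HP", 3]}
    }
}
print(build_facet_query("mouse", ["manufacturer"], [("subcategory", "laser mice")]))
print(facet_counts(sample, "manufacturer"))
```

The apachesolr module does this for you; the sketch is only meant to show that a facet block is a cheap by-product of the search itself rather than a separate query per filter.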
  18. 18. SOLRfy your Drupal search 1/3<br />Download SOLR package from<br />Unpack it and check your server’s firewall settings to allow traffic to port 8983<br />Check that you have Java (JRE) installed<br />
  19. 19. SOLRfy your Drupal search 2/3<br />Then get Drupal’s “apachesolr” module; there are two XML files in the package, solrconfig.xml and schema.xml<br />Go back to your SOLR directory and rename the example directory to “drupal” so it’s easier to find<br />Copy the two XML files into the drupal/solr/conf directory<br />Go to the drupal directory and fire up Apache SOLR with “java -jar start.jar”<br />
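The copy step above is easy to script. This is a small Python sketch, assuming the directory layout described on the slide (a renamed `drupal` directory with `solr/conf` inside it); the function name and the example paths in the comment are hypothetical.

```python
import shutil
from pathlib import Path

CONFIG_FILES = ("solrconfig.xml", "schema.xml")


def install_solr_config(module_dir, solr_dir):
    """Copy the module's two config files into <solr_dir>/solr/conf."""
    conf_dir = Path(solr_dir) / "solr" / "conf"
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"{conf_dir} not found; is this the renamed example directory?")
    copied = []
    for name in CONFIG_FILES:
        source = Path(module_dir) / name
        # copy2 overwrites the stock example config with the module's version.
        copied.append(Path(shutil.copy2(source, conf_dir / name)))
    return copied


# Example call with made-up paths; adjust them to your own layout:
# install_solr_config("sites/all/modules/apachesolr", "apache-solr/drupal")
```

After copying, SOLR has to be (re)started from the drupal directory for the new configuration to take effect.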
  20. 20. SOLRfy your Drupal search 3/3<br />Now you can turn on the “apachesolr” module in Drupal<br />Tune the SOLR server settings in Drupal, reindex all content and then start clicking through the filtering/faceting settings of apachesolr<br />You’ll have to turn the facets on as blocks<br />But your search experience will be something else entirely<br />…and once you see how searching with SOLR works, you’re not going back<br />
  21. 21. The apachesolr module<br />Automatically creates facets for taxonomy terms, for every vocabulary – you can just turn them on<br />Automatically creates facets for CCK fields that use dropdown/radio widgets (i.e. fields with a fixed set of options)<br />Exposes hooks for CCK fields (to make facets out of them)<br />Exposes a hook for altering the query (to some extent)<br />Easy to use<br />
  22. 22. Faceting without SOLR<br />You can do faceting without SOLR too<br />“Faceted search” module will do it for you<br />But at only 10K nodes, SOLR is three times as fast<br />With 100K+ nodes, faceted search without SOLR is practically unusable<br />…but for small sites, SOLR is not necessary for faceting<br />
  23. 23. So you can SEARCH with SOLR<br />…but my site does A LOT more<br />
  24. 24. SOLRify the rest of your Drupal universe<br />You probably know where the performance problems on your site are<br />If the content is somehow personalized, caching usually can’t help with it<br />How about using SOLR for it?<br />The Apache Solr Views module (at a very mature “dev” state ;) and Views 3 (dev too) talk to each other and integrate with the apachesolr module and its SOLR index<br />When this is stable and fully functional…<br />
  25. 25. It’ll make your Views<br />FLY<br />
  26. 26. But my problems are in my custom modules<br />SELECT title, description, mediatype<br />FROM media<br />LEFT JOIN media_types ON media_type_id = type_id<br />LEFT JOIN media_tag ON media_tag.mid =<br />WHERE name LIKE '%s'<br />OR description LIKE '%s'<br />OR IN (SELECT mid FROM promoted_media)<br />
  27. 27. Custom modules<br />Custom modules can be designed with ApacheSOLR in mind<br />When you realize all the potential there is in an indexer that can index XML files, the sky is the limit<br />Whenever you have a data structure that’s too complex for MySQL to search efficiently – and that happens fairly often – you might benefit from indexing that data into SOLR and using SOLR as the read-only “db”<br />
  28. 28. Custom modules – making SOLR do the reading<br />Tables: media, media_revision, media_version, media_workflow, media_tag, tag, files<br />A single “row” for SOLR to index<br />
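The single "row" idea can be sketched in a few lines of Python: related rows are flattened into one document, with one-to-many relations turned into multi-valued fields. The row shapes and column names below are invented for illustration and only loosely follow the table names on the slide.

```python
def build_media_document(media, tags, files):
    """Flatten one media row and its related rows into a single document."""
    return {
        "id": f"media/{media['id']}",
        "title": media["title"],
        "description": media["description"],
        # One-to-many relations become multi-valued fields instead of JOINs.
        "tag": sorted(t["name"] for t in tags if t["media_id"] == media["id"]),
        "file": [f["path"] for f in files if f["media_id"] == media["id"]],
    }


# Hypothetical rows, as they might come back from separate table queries.
media_row = {"id": 7, "title": "Launch video", "description": "Product launch"}
tag_rows = [
    {"media_id": 7, "name": "video"},
    {"media_id": 7, "name": "launch"},
    {"media_id": 8, "name": "audio"},
]
file_rows = [{"media_id": 7, "path": "media/7/launch.mp4"}]

document = build_media_document(media_row, tag_rows, file_rows)
print(document)
```

The expensive joins still happen, but only once at index time; every later read gets the already flattened document.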
  29. 29. Custom modules – making SOLR do the reading<br />You know you need a better structure when you can’t avoid LEFT JOINs or subqueries – and running them gets too slow<br />When you’ve optimized your code several times, and restructuring your database would mean creating a read-optimized cache of everything<br />Then SOLR might be just the thing to get you through<br />
  30. 30. Custom modules – making SOLR do the reading<br />Write → MySQL server<br />MySQL server → Index → Apache SOLR<br />Read ← Apache SOLR<br />
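The write/index/read split can be illustrated with stand-ins. In this Python sketch an in-memory SQLite database plays the role of MySQL and a plain dictionary plays the role of the SOLR index; the class and its methods are hypothetical, and indexing happens synchronously only to keep the example short (a batch job is a common alternative).

```python
import sqlite3


class MediaStore:
    """Writes go to the database; reads are served from the search index."""

    def __init__(self):
        # sqlite3 stands in for MySQL, a dict of token -> ids stands in for SOLR.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, title TEXT)")
        self.index = {}

    def write(self, title):
        cursor = self.db.execute("INSERT INTO media (title) VALUES (?)", (title,))
        self.db.commit()
        self.reindex(cursor.lastrowid)
        return cursor.lastrowid

    def reindex(self, media_id):
        # Index step: read the authoritative row and push it to the index.
        row = self.db.execute("SELECT title FROM media WHERE id = ?", (media_id,)).fetchone()
        for token in row[0].lower().split():
            self.index.setdefault(token, set()).add(media_id)

    def search(self, word):
        # Read path: answered from the index without touching the database.
        return sorted(self.index.get(word.lower(), set()))


store = MediaStore()
first = store.write("Laser mouse")
second = store.write("Gaming mouse")
print(store.search("mouse"))
```

The database stays the source of truth, so the index can always be thrown away and rebuilt from it.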
  31. 31. Libraries to use with custom modules<br />The apachesolr module uses a SOLR library written in PHP and licensed under the New BSD license<br />There’s also a PECL extension, but I’m not aware of any speed comparisons<br />There are also contrib Drupal modules that give you an API for accessing SOLR<br />
  32. 32. It’s no magic bullet<br />
  33. 33. Not a magic bullet 1/2<br />Apache SOLR is a hassle with all the Java containers and such, and you’ll probably have to run it on a separate server<br />You should always route requests through Drupal or a script that authenticates and authorizes calls to SOLR (SOLR shouldn’t be exposed unless all the data is public)<br />Sometimes the extra server would be better used as an extra MySQL node<br />Sometimes you can just fix your stuff and make it as fast as it would be on Apache SOLR<br />
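A gatekeeper script in front of SOLR could look roughly like the Python sketch below: it keeps only whitelisted request parameters, caps the page size and adds an access filter before anything is forwarded. The whitelist, the row limit and the `status` field used as a published flag are all assumptions made for the example.

```python
ALLOWED_PARAMS = {"q", "start", "rows", "sort"}  # hypothetical whitelist
MAX_ROWS = 100  # hypothetical upper limit for one page of results


def build_safe_params(user_params, can_see_unpublished=False):
    """Return (params, filters) that are safe to forward to SOLR."""
    # Drop everything the client is not explicitly allowed to set.
    params = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    # Cap the page size so a client cannot pull the whole index at once.
    params["rows"] = str(min(int(params.get("rows", 10)), MAX_ROWS))
    # "status" is an assumed field name for a published flag.
    filters = [] if can_see_unpublished else ["status:1"]
    return params, filters


params, filters = build_safe_params({"q": "mouse", "qt": "/admin", "rows": "100000"})
print(params, filters)
```

The script would then send `params` plus one `fq` per filter to SOLR on the internal network, so the search server itself never has to be reachable from outside.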
  34. 34. Not a magic bullet 2/2<br />And then there’s the fact that SOLR is built mainly for the English language<br />So make sure SOLR will do what you want in the language you want it to do it in<br />
  35. 35. Recap<br />Right now, SOLR will give your Drupal site a fast, faceted search with really easy setup (thanks to the apachesolr module)<br />Soon, SOLR will boost the performance and search abilities of your Views<br />Right now, SOLR will give you a lot more power for searching your custom databases and complicated content types, if used by a module developer<br />It’s still not a magic bullet – it has its downsides<br />
  36. 36. Sounds easy?<br />Been there, done that?<br />is recruiting<br />Send your CV to<br />
  37. 37. Thank you for your time<br />Questions?<br />If you’d rather ask me in private,<br />drop a mail to<br />