Making your Drupal fly with Apache SOLR
Kalle Virta, Exove

In this presentation
  • About Exove and myself
  • The problem, and the solution (and some cowboys)
  • SOLR to do the site-wide search
  • SOLR to help with Views
  • SOLR to help with custom modules
  • And the Fine Print

About Exove
  • We deliver business-driven web services that enable our customers to conduct better business on the Internet
  • We base our work on our customers' strategy and needs

About me, Kalle Virta
  • Software architect and developer
  • High performance and complex integrations
  • Almost 10 years in the business
  • Seen Drupal from version 3
  • A lot of big Drupal sites / systems under my belt

Your regular stack
  [Diagram: Linux + Apache, MySQL server]

Damn, dude, your MySQL server is on FIRE

New guys to the rescue
  • Apache SOLR
  • memcached
  • Varnish

Your enhanced stack
  [Diagram: Varnish, Linux + Apache, memcached, MySQL server, Apache SOLR]
  Did you notice? It's still blue.

The new guys
  • Varnish is an HTTP cache and does it well, but it doesn't help at all on your customized-for-every-person social media site
  • Memcached is a good idea, and you can even use it with Cache Router to cache Drupal stuff, including your own modules, but... it still just caches stuff
  • SOLR, however, is a different story...

SOLR
  • Apache SOLR is a search server built around Lucene (a search library), written in Java
  • It needs a Java servlet container, e.g. Jetty or Tomcat
  • Put simply, you save your stuff in it as XML and then search it
  • SOLR will tokenize and do all kinds of (configurable) magic to the data when indexing it, but it can also store the original data (not always possible with search indexers)

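To make "save your stuff in XML form" concrete, here is a minimal sketch of what an update message for SOLR can look like. The `<add><doc><field name="...">` layout is SOLR's XML update format; the field names (`id`, `title`, `tags`) and the helper `build_add_xml` are assumptions made up for this illustration and would have to match whatever your schema.xml defines.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize one flat dict as a SOLR <add><doc>...</doc></add> message.

    List values become repeated <field> elements, which is how a
    multi-valued field is sent. Field names here are hypothetical.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml({
        "id": "node/1",
        "title": "Laser mouse",
        "tags": ["mouse", "laser"],
    }))
```

The resulting string is what would be posted to the SOLR update handler, followed by a commit; the apachesolr module does this for you, so the sketch only matters when you index your own data.
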
SOLR for searching
  • Obviously, all the features of SOLR make it a great fit for site-wide search
  • You can actually find stuff with SOLR: every field in the search can be biased, that is, you can tune which fields raise the score most when they match
  • SOLR also does one really neat thing for searching...

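Field biasing can be sketched as a query string. This example assumes the DisMax query parser and its `qf` ("query fields") parameter, where `field^weight` raises the score contribution of matches in that field. The field names and weights are invented for the illustration, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_search_query(keywords, boosts):
    """Build a SOLR query string that weights matches per field.

    `boosts` maps a field name to its weight, e.g. {"title": 5.0}.
    Fields and weights are illustrative assumptions, not defaults.
    """
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    return urlencode({
        "q": keywords,
        "defType": "dismax",  # assumes the DisMax query parser is available
        "qf": qf,
        "wt": "json",
    })


if __name__ == "__main__":
    # A hit in the title counts five times as much as a hit in the body.
    print(build_search_query("laser mouse", {"title": 5.0, "body": 1.0}))
```

The query string would be appended to the select URL of your SOLR instance. In the Drupal case the apachesolr module exposes these biases as settings, so you rarely need to build the string yourself.
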
Ever heard of a faceted search?

The old advanced search
  [Mockup of a search form]
  Search: mouse
  Product category: -
  Product sub-cat: -
  Manufacturer: -
  Price range: -
  [Search]
  Too many search results (794), narrow your search and try again

The faceted search
  [Mockup of a faceted search results page]
  Order by price
  Current search: mouse
  Sub-category: wireless mice (296), wired mice (96), laser mice (163), Show all
  Manufacturer: Logitech (194), Microsoft (36), HP (3), Show all
  Price range: 0-50 € (384), 50-100 € (129), 100-300 € (50)
  Results:
  • Logitech LS1 Laser Mouse, 29 €: A cheap laser mouse that'll get you through even the most problematic of PowerPoint presentations.
  • Logitech G3 Gaming Mouse, 59 €: A great laser mouse with more buttons than you'll ever have time to configure. A steal.
  • Microsoft Super Mouse, 49 €: A great mouse from the company that brought you the best product of all times, Windows Me.
  • Apple Mighty Mouse, 129 €: The mouse the image happens to be of. Never tried it. Looks pretty nice, though.
  page 1 2 3 4 5 6 7 8 9 10

SOLR for faceted searching
  • Apache SOLR lets you facet search results, that is, show the possible search filters and give counts for each of them
  • Faceting with SOLR can also be achieved in Drupal, and this is where a Drupal contrib module comes into play
  • With the apachesolr module (http://drupal.org/project/apachesolr) you can do all this with a couple of clicks in your Drupal installation

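As a rough sketch of what faceting looks like on the wire: you ask for facets with `facet=true` and one `facet.field` per field, and the JSON response carries the counts under `facet_counts.facet_fields` as a flat list alternating value and count. The field name `manufacturer` and the sample response below are hand-written for illustration (the numbers mirror the mockup slide), not output from a real server.

```python
def facet_params(fields, min_count=1):
    """Request parameters that switch faceting on for the given fields."""
    params = [("facet", "true"), ("facet.mincount", str(min_count))]
    params += [("facet.field", field) for field in fields]
    return params


def parse_facet_counts(response):
    """Turn SOLR's flat [value, count, value, count, ...] lists into pairs."""
    facets = {}
    for field, flat in response["facet_counts"]["facet_fields"].items():
        facets[field] = list(zip(flat[0::2], flat[1::2]))
    return facets


# Hand-written sample shaped like a SOLR JSON response (assumed data).
SAMPLE_RESPONSE = {
    "facet_counts": {
        "facet_fields": {
            "manufacturer": ["Logitech", 194, "Microsoft", 36, "HP", 3],
        }
    }
}

if __name__ == "__main__":
    for value, count in parse_facet_counts(SAMPLE_RESPONSE)["manufacturer"]:
        print(f"{value} ({count})")
```

Each pair becomes one clickable filter link with its count, which is what the facet blocks of the apachesolr module render.
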
SOLRify your Drupal search 1/3
  • Download the SOLR package from http://www.apache.org/dyn/closer.cgi/lucene/solr/
  • Unpack it and check your server's firewall settings to allow traffic to port 8983
  • Check that you have Java (JRE) installed

SOLRify your Drupal search 2/3
  • Then get Drupal's "apachesolr" module; there are two XML files in the package, solrconfig.xml and schema.xml
  • Go back to your SOLR directory and rename the example directory to "drupal" so it's easier to find
  • Drop the two XML files into the drupal/solr/conf directory
  • Go to the drupal directory and fire up Apache SOLR with "java -jar start.jar"

SOLRify your Drupal search 3/3
  • Now you can turn on the "apachesolr" module in Drupal
  • Tune the SOLR server settings in Drupal, reindex all content and then start clicking on the filtering/faceting settings of apachesolr
  • You'll have to turn the facets on as blocks
  • But your search experience will be something else entirely
  • ...and once you see how searching with SOLR works, you're not going back

Apachesolr module
  • Automatically creates facets for taxonomy terms, for every vocabulary; you can just turn them on
  • Automatically creates facets for CCK fields using dropdown/radio widgets (i.e. with a set of options)
  • Exposes hooks for CCK fields (to make facets out of them)
  • Exposes a hook for altering the query (to some extent)
  • Easy to use

Faceting without SOLR
  • You can do faceting without SOLR too
  • The "Faceted search" module will do it for you
  • But at only 10K nodes, SOLR is three times as fast
  • With 100K+ nodes, faceted search without SOLR is practically unusable
  • ...but for small sites, SOLR is not necessary for faceting

So you can SEARCH with SOLR... but my site does A LOT more

SOLRify the rest of your Drupal universe
  • You probably know the performance problems on your site
  • If it's somehow personalized, you usually can't do anything about it with caching
  • How about using SOLR for it?
  • The Apache Solr Views module (at a very mature "dev" state ;) and Views 3 (dev too) will talk together and integrate with the apachesolr module and its SOLR index
  • When this is stable and fully functional...

It'll make your Views FLY

But my problems are in my custom modules

    SELECT title, description, mediatype
    FROM media
    LEFT JOIN media_types ON media_type_id = type_id
    LEFT JOIN media_tag ON media_tag.mid = media.id
    WHERE name LIKE '%s'
       OR description LIKE '%s'
       OR media.id IN (SELECT mid FROM promoted_media)

Custom modules
  • Custom modules can be designed with Apache SOLR in mind
  • When you realize all the potential there is in an indexer that can index XML files, the sky is the limit
  • Whenever you have a data structure that's too complex for MySQL to search (and that's not too rare), you might benefit from indexing that data into SOLR and using SOLR as the read-only "db"

Custom modules: making SOLR do the reading
  [Diagram: the tables media, media_revision, media_version, media_workflow, media_tag, tag and files are combined into a single "row" for SOLR to index]

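A minimal sketch of the "single row" idea: the joins are done once at index time, and SOLR gets one flat document per media item. The column names `id`, `title`, `description`, `mediatype` and `mid` come from the query slide; everything else (the `name` and `path` columns, the output field names, the function itself) is an assumption for illustration.

```python
def build_media_document(media, tags, files, promoted_ids):
    """Flatten one media row and its related rows into one SOLR document.

    `tags` and `files` are lists of dicts with a `mid` key pointing at
    media.id; `promoted_ids` is the set of ids in promoted_media.
    """
    mid = media["id"]
    return {
        "id": f"media/{mid}",
        "title": media["title"],
        "description": media["description"],
        "mediatype": media["mediatype"],
        # Multi-valued fields replace the LEFT JOINs.
        "tag": sorted(t["name"] for t in tags if t["mid"] == mid),
        "file": [f["path"] for f in files if f["mid"] == mid],
        # A boolean field replaces the IN (SELECT ...) subquery.
        "promoted": mid in promoted_ids,
    }


if __name__ == "__main__":
    doc = build_media_document(
        {"id": 7, "title": "Demo", "description": "A clip", "mediatype": "video"},
        tags=[{"mid": 7, "name": "drupal"}, {"mid": 8, "name": "other"}],
        files=[{"mid": 7, "path": "clips/demo.flv"}],
        promoted_ids={7},
    )
    print(doc)
```

The trade-off is that every write to any of the source tables has to trigger a reindex of the affected document, which is why this only pays off for read-heavy data.
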
Custom modules: making SOLR do the reading
  • You know you need a better structure when you can't avoid running LEFT JOINs or subqueries, and running them gets too slow
  • When you've optimized your code several times and restructuring your database would mean creating a read-optimized cache of everything
  • Then SOLR might be just the thing to get you through

Custom modules: making SOLR do the reading
  [Diagram: writes go to the MySQL server, the data is indexed into Apache SOLR, and reads come from Apache SOLR]

Libraries to use with custom modules
  • The apachesolr module uses a SOLR library written in PHP and licensed under the New BSD license (http://code.google.com/p/solr-php-client/)
  • There's also a PECL extension, but I'm not aware of any speed comparisons
  • There are also contrib Drupal modules that give you an API for accessing SOLR

It's no magic bullet

Not a magic bullet 1/2
  • Apache SOLR is a hassle with all the Java containers and such; you'll probably have to run it on a separate server
  • You should always run stuff through Drupal or a script that will authenticate and authorize calls to SOLR (SOLR shouldn't be exposed unless all the data is public)
  • Sometimes the extra server might be better used as an extra MySQL node
  • Sometimes you can just fix your stuff and make it as fast as it would be on Apache SOLR

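The "always go through a script" advice can be sketched as a small gatekeeper that runs before any request is forwarded to SOLR. Everything here is an assumption for illustration: the permission strings, the whitelist, the row cap and the `status:published` filter are invented, and a real Drupal site would use its own access checks.

```python
ALLOWED_PARAMS = {"q", "start", "rows", "sort"}  # hypothetical whitelist
MAX_ROWS = 50                                    # hypothetical cap


def authorize_solr_params(requested, permissions):
    """Return the parameters that may be forwarded to SOLR for this user.

    Raises PermissionError if the user may not search at all. Unknown
    parameters are dropped so callers cannot reach other handlers.
    """
    if "search content" not in permissions:
        raise PermissionError("user is not allowed to search")
    params = {k: v for k, v in requested.items() if k in ALLOWED_PARAMS}
    params["rows"] = min(int(params.get("rows", 10)), MAX_ROWS)
    if "view unpublished content" not in permissions:
        # Force a filter query the user cannot override (assumed field).
        params["fq"] = "status:published"
    return params


if __name__ == "__main__":
    print(authorize_solr_params(
        {"q": "mouse", "rows": "1000", "qt": "/admin"},
        {"search content"},
    ))
```

Only the returned dictionary is sent on to SOLR, and SOLR itself stays reachable from the web server only, not from the public Internet.
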
Not a magic bullet 2/2
  • And then there's the fact that SOLR is built mainly for the English language
  • So make sure SOLR will do what you want in the language you want it to do it in

Recap
  • SOLR will right now give your Drupal site a fast, faceted search with really easy setup (thanks to the apachesolr module)
  • SOLR will soon give a boost to the performance and search abilities of your Views
  • SOLR will right now give you a lot more power for searching your custom databases and complicated content types, if used by a module developer
  • It's still not a magic bullet; it has its downsides

Sounds easy? Been there, done that?
  • Exove is recruiting
  • Send your CV to jobs@exove.com

Thank you for your time
  • Questions?
  • If you'd rather ask me in private, drop a mail to kalle@exove.com