• very popular, extremely fast Java-based open source enterprise search platform from the Apache Lucene project
• runs as a standalone full-text search server within a servlet container such as Tomcat
• not an acronym - doesn’t stand for anything
• powers the search and navigation features on many of the world’s largest sites

• initially developed by CNET Networks in 2004 as an in-house search platform called “Solar”
• CNET granted the existing codebase to the Apache Software Foundation in 2006 - name changed to “Solr”
• in January 2007 Solr became a Lucene subproject
• in March 2010, Solr and Lucene-java merged

The Apache Lucene project develops open source search software, including:
• Apache Lucene Core (formerly Lucene Java) - provides Java-based indexing and search, plus spellchecking, hit highlighting, and advanced analysis/tokenization capabilities
• Apache Solr
• Apache PyLucene - a Python port of Lucene Core
• Apache Open Relevance Project - collects and distributes free materials for relevance testing & performance

• default Drupal search is decent for smaller sites
• doesn’t deal well with large amounts of content (say 10k+ nodes) - doesn’t scale; gets bogged down
• limited operators
• integrated - it runs and searches directly on the same database
• SQL was not designed as a searching language
• “Relational Database Management Systems (RDBMS) are physically incapable of handling search well.”

• there are several modules that enhance core search by providing things like faceted search and improved stemming
• but there’s no getting around its performance limitations and lack of scalability

1. Index and make searchable a really large amount of content - from 10k+ nodes up into the millions
2. Provide faceted search-based navigation so users can find content faster & more intuitively, drilling down into content by date, author, tags, content type, & other attributes
3. Provide search autocomplete, spelling suggestions, and content recommendations
4. Provide a faster search experience than the default Drupal search can
5. Give site visitors access to simple, easy to use advanced search features without confronting them with the “advanced search” page
6. Provide users with the ability to do location-based search - to filter results by geographic location
7. Expose all attributes of nodes to search
8. Place search functions on a completely separate server

[Diagram: web server + PHP, SQL database, and Solr server; the web server sends GET requests to Solr to search and POST requests to Solr to index. Adapted from Robert Douglass’ 2008 slide set - see Resources]

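The diagram's split between GET requests for searching and POST requests for indexing can be sketched in a few lines of Python. This is only an illustration, not the Drupal module's code: the host and the `/select` and `/update` paths assume Solr's stock example setup, and the field names (`id`, `title`) and document values are hypothetical.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumption: Solr's example server running on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(keywords):
    """Searching is a plain GET: the whole query travels in the URL."""
    return SOLR_BASE + "/select?" + urlencode({"q": keywords, "wt": "json"})


def build_index_body(doc):
    """Indexing is a POST to /update whose body is an <add><doc> XML message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; real Drupal documents carry many more fields.
search_url = build_search_url("drupal solr")
index_body = build_index_body({"id": "node/1", "title": "Hello Solr"})
print(search_url)
print(index_body)
```

Because both operations are ordinary HTTP requests, the Solr server can live on a different machine from the web server and the SQL database.
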
• faceted search is dynamic clustering of items or search results into categories that let users drill down into search results (or even skip searching entirely) by any value in that field
• each facet also shows the number of hits within the search that match that category
• faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search

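To make the "category plus hit count" idea concrete, here is a rough Python sketch of a faceted Solr query and of reading the counts back. It assumes Solr's standard `facet` and `facet.field` request parameters and its default JSON layout, where each facet is a flat list of alternating values and counts. The field names (`type`, `author`) and the sample response are made up for illustration.

```python
import json
from urllib.parse import urlencode


def build_facet_query(keywords, facet_fields):
    """Build a select URL that asks Solr for hit counts per facet field."""
    params = [("q", keywords), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return "http://localhost:8983/solr/select?" + urlencode(params)


def parse_facet_counts(response_text):
    """Turn Solr's flat [value, count, value, count, ...] lists into dicts."""
    data = json.loads(response_text)
    fields = data["facet_counts"]["facet_fields"]
    return {
        name: dict(zip(flat[0::2], flat[1::2]))
        for name, flat in fields.items()
    }


# Made-up response fragment, trimmed to the part that carries the facets.
SAMPLE_RESPONSE = """
{"facet_counts": {"facet_fields": {
    "type": ["story", 42, "page", 7],
    "author": ["alice", 30, "bob", 19]}}}
"""

facet_url = build_facet_query("solr", ["type", "author"])
facets = parse_facet_counts(SAMPLE_RESPONSE)
print(facet_url)
print(facets["type"])
```

Each key in `facets["type"]` would become a link in a facet block, with its count shown next to it.
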
You’ll need:
• Java 5 or higher
• PHP 5.2 for Drupal 6, but PHP 5.1.4 will work if you have the PECL JSON extension or the Zend Framework JSON classes

1. Go to the Apache Solr Search Integration project page - http://drupal.org/project/apachesolr
2. Install the module
3. Grab the Solr PHP library via svn OR get the bundled Acquia Search download
4. Enable the module
5. Download Solr 1.4 and unpack it outside of the Drupal directory
6. Rename the existing files schema.xml and solrconfig.xml in apache-solr-nightly/example/solr/conf/ to *.bak to get them out of the way
7. Copy the schema.xml and solrconfig.xml that come with the Apache Solr Drupal module to take their place
8. Start Solr by opening a shell (Putty, Mac Terminal), going to the apache-solr-nightly/example folder, and executing the command java -jar start.jar
9. Test that the Solr server is available at http://localhost:8983/solr/admin
10. Make sure both the main Apache Solr Framework and Apache Solr Search modules are enabled - if the Solr Search module isn’t enabled, no indexing will occur
11. Run cron until your content is indexed
12. Enable blocks for facets

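Step 9 can also be checked from a script rather than a browser. The following Python sketch simply requests the admin URL given above and reports whether it answers with HTTP 200. The function name and the injectable `opener` parameter are illustrative choices, not part of Solr or the Drupal module.

```python
from urllib.request import urlopen

# The admin URL from step 9; adjust the host and port for your own setup.
SOLR_ADMIN_URL = "http://localhost:8983/solr/admin"


def solr_is_available(url=SOLR_ADMIN_URL, opener=urlopen, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    check can be exercised without a running server.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False
```

Calling `solr_is_available()` right after `java -jar start.jar` may return False for a few seconds while the server is still starting up.
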
• Apache Solr module depends on Drupal’s core Search module
• when Solr is enabled, the Search module will also be enabled
• as soon as the core Search module is enabled it starts to index all your nodes
• this takes time to run and fills up the database (search_dataset, search_index... tables)

• if you’re installing Solr Search, you don’t need Drupal’s core search form
• you replace it with the Solr one by going to the Solr module settings and clicking “Make Apache Solr Search the default”
• this disables the core Search module’s form - but not the indexing

• to disable the indexing - and save some CPU cycles and database space - go to your site’s search settings at admin/settings/search and set the “number of items to index per cron run” to 0
Thanks to DrupalCoder.com for this tip - http://www.drupalcoder.com/blog/performance-tip-disable-drupals-core-search-indexer-when-using-apache-solr

• Solr Search indexing is triggered by cron runs
• the default Drupal cron job triggers all cron tasks at the same time
• this can be a serious drag on performance and can cause cron runs to fail if one or more tasks don’t finish in the allotted cron period
• to get around this, use...

• Elysia Cron - http://drupal.org/project/elysia_cron
• expands cron capabilities - gives you crontab-like scheduling so you can run different tasks at different times and frequencies
• so for example - set Solr Search to index 1000 nodes every 15 minutes, while other cron tasks are set to run once every hour

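The effect of giving tasks separate schedules can be pictured with a toy Python simulation. This is not Elysia Cron's rule syntax or API, just a sketch of the idea: the task names are hypothetical and the intervals are the ones from the example above.

```python
# Interval in minutes for each task; the task names are hypothetical.
SCHEDULE = {"solr_index": 15, "other_cron_tasks": 60}


def due_tasks(minute, schedule):
    """Return the tasks whose interval lines up with the given minute."""
    return [name for name, every in schedule.items() if minute % every == 0]


# Walk through one hour in 15-minute steps and show what would run.
for minute in range(0, 61, 15):
    print(minute, due_tasks(minute, SCHEDULE))
```

Only at minutes 0 and 60 do both tasks fire together; in between, Solr indexing runs alone and is not held up by the other cron work.
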
• to get the fastest indexing on your server, experiment with different numbers of items to index per cron run and different cron run times until you find the max your server is capable of handling
• ex: try indexing 1000 items per cron run and set the cron to run every 5 minutes
• if you don’t get any errors, you’re good

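A little arithmetic helps when choosing these numbers. The Python sketch below estimates throughput and total indexing time for a given batch size and cron interval. It assumes every cron run succeeds and indexes a full batch, and the one-million-node site is a hypothetical figure.

```python
import math


def items_per_hour(items_per_run, minutes_between_runs):
    """Throughput if every cron run indexes a full batch."""
    return items_per_run * 60 / minutes_between_runs


def hours_to_index(total_nodes, items_per_run, minutes_between_runs):
    """Estimated wall-clock hours to work through the whole site once."""
    runs_needed = math.ceil(total_nodes / items_per_run)
    return runs_needed * minutes_between_runs / 60


# 1000 items every 5 minutes, for a hypothetical site with 1,000,000 nodes.
print(items_per_hour(1000, 5))                       # 12000.0
print(round(hours_to_index(1_000_000, 1000, 5), 1))  # 83.3
```

At the more conservative 1000 items every 15 minutes, the same site would take three times as long, so the batch size and interval are worth tuning on large sites.
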
• Solr Search integrates with Drush
• you can call Solr tasks from the Drush command line
• commands include...

• solr-delete-index - Deletes the contents of the index. Can take content types as parameters
• solr-index - Sends content marked for (re)indexing to Solr. Same as running cron once but without the other overhead
• solr-reindex - Marks content for reindexing. Can take content types as parameters
• solr-search - Searches the site for keywords using Apache Solr

• Acquia has a hosted SaaS version of Solr that they call Acquia Search
• it’s plug and play and available for Drupal 6 and 7
• gives you all the power of Solr without having to install any software (beyond the Solr Drupal modules) or manage any servers
• really easy to set up, really fast and robust, kind of pricey
• http://acquia.com/products-services/acquia-search

• you can get a 30 day free trial of Acquia Search at http://acquia.com/trial
• easiest way to test drive Solr

• this is where it starts to get even more interesting
• Views 3 (still in alpha for Drupal 6 but in beta for Drupal 7) allows you to make custom searches against the Solr index the same way you currently make views against the MySQL database
• ex: build a Solr search that just includes videos and MP3s and render the results as a playlist
• ex: a Solr search that’s limited to the current user’s images, displayed as a slideshow

• upshot: you can bypass the Drupal database and build your content straight off the Solr index
• no database queries
• no complex views queries with tons of joins
• no node_load() calls for displaying the results

• best place to start learning is the Solr Search docs page on drupal.org - http://drupal.org/node/343467
• Robert Douglass did a great Solr presentation in 2008 - slides are online at http://www.slideshare.net/robertDouglass/apachesolr-presentation-from-do-it-with-drupal-presentation
• the book “Solr 1.4 Enterprise Search Server” is apparently good - review here: http://www.drupalcoder.com/blog/book-review-from-a-drupal-point-of-view-solr-14-enterprise-search-server

• great article by Robert Douglass - “Views 3 + Apache Solr + Acquia Drupal = The Future of Search” - http://acquia.com/blog/views-3-apache-solr-acquia-drupal-future-search
• article - “Three things we learned from indexing a Drupal site with millions of nodes in Apache Solr” - http://www.drupalcoder.com/blog/three-things-we-learned-from-indexing-a-drupal-site-with-millions-of-nodes-in-apache-solr
• article - “Geospatial Apache Solr searching in Drupal 6 by upgrading Solr to 3.1” - http://thedrupalblog.com/geospatial-apache-solr-searching-drupal-6-upgrading-solr-31

• how to install Solr on Mac OS X Snow Leopard - http://www.drupalcoder.com/blog/installing-apache-solr-in-tomcat-for-drupal-on-snow-leopard
• setting up Drupal 6 with Apache Solr on Tomcat 6 and Ubuntu 9.10 - http://www.nickveenhof.be/blog/setup-drupal-6-apache-solr-tomcat-6-and-ubuntu-910-karmic-koala
• configuring Apache Solr Multi-core with Drupal and Tomcat on Ubuntu 9.10 - http://drupalconnect.com/blog/steve/configuring-apache-solr-multi-core-drupal-and-tomcat-ubuntu-910

• Jetty powered multicore Apache Solr and Drupal in Ubuntu 10.04 - http://vladgh.com/blog/jetty-powered-multicore-apache-solr-and-drupal-ubuntu-1004
• Solr tutorials on the official Apache Solr site - http://lucene.apache.org/solr/tutorial.html
• the official Apache Solr wiki - http://wiki.apache.org/solr/FrontPage
• DrupalCamp Montreal 2009 video presentation on Solr - http://yadadrop.com/drupal-video/drupal-apache-solr-setup-configuration-extensions-hooks