Intro to Apache Solr for Drupal

A presentation I gave at the May 30, 2011 Toronto Drupal usergroup meetup.

INTRO TO APACHE SOLR FOR DRUPAL

Presentation by Chris Caple
drupal.org username: reallyordinary
http://drupal.org/user/791914
Presented at the May 30, 2011 Toronto Drupal user group meetup

WHAT IS APACHE SOLR?

• very popular, extremely fast Java-based open source enterprise search platform from the Apache Lucene project
• runs as a standalone full-text search server within a servlet container such as Tomcat
• not an acronym - doesn’t stand for anything
• powers the search and navigation features on many of the world’s largest sites

SITES LIKE...

• the White House
• Zappos
• AOL
• SourceForge
• eHarmony
• Buy.com
• Ticketmaster
• the Internet Archive
• GameSpot
• Citysearch
• The Guardian
• eTrade
• Netflix
• Chowhound
• CNET Reviews
• Homestars.com

And of course... drupal.org

• so the point is - it’s great for large, high-traffic sites
• it’s heavy duty, internet-scale stuff
• but it’ll also serve you well on smaller-scale but ambitious Drupal sites

A BIT OF HISTORY

• initially developed by CNET Networks in 2004 as an in-house search platform called “Solar”
• CNET granted the existing codebase to the Apache Software Foundation in 2006 - name changed to “Solr”
• in January 2007, Solr became a Lucene subproject
• in March 2010, Solr and Lucene-java merged

WHAT IS APACHE LUCENE?

The Apache Lucene project develops open source search software, including:

• Apache Lucene Core (formerly Lucene Java) - provides Java-based indexing and search, plus spellchecking, hit highlighting, and advanced analysis/tokenization capabilities
• Apache Solr
• Apache PyLucene - a Python port of Lucene Core
• Apache Open Relevance Project - collects and distributes free materials for relevance testing & performance

LIMITATIONS OF DEFAULT DRUPAL SEARCH

• default Drupal search is decent for smaller sites
• doesn’t deal well with large amounts of content (say 10k+ nodes) - doesn’t scale; gets bogged down
• limited operators
• integrated - it runs and searches directly on the same database
• SQL was not designed as a searching language
• “Relational Database Management Systems (RDBMS) are physically incapable of handling search well.”
• there are several modules that enhance core search by providing stuff like faceted search and improved stemming
• but there’s no getting around its performance limitations and lack of scalability

BENEFITS OF USING SOLR

1. Index and make searchable a really large amount of content - from 10k+ nodes up into the millions
2. Provide faceted search-based navigation so users can find content faster & more intuitively, drilling down into content by date, author, tags, content type, & other attributes
3. Provide search autocomplete, spelling suggestions, and content recommendations
4. Provide a faster search experience than the default Drupal search is able to
5. Give site visitors access to simple, easy-to-use advanced search features without confronting them with the “advanced search” page
6. Provide users with the ability to do location-based search - to filter results by geographic location
7. Expose all attributes of nodes to search
8. Place search functions on a completely separate server

[Diagram: the web server + PHP talks to the SQL database and, separately, to the Solr server - GET to search, POST to index. Adapted from Robert Douglass’ 2008 slide set - see Resources.]
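
To make item 8 a bit more concrete, here is a minimal Python sketch of the search half of that traffic: building the GET request URL for a Solr server that lives on its own host. The host name `solr.example.com`, the facet field name `type`, and the helper `build_search_url` are illustrative assumptions, not anything the slides or the Drupal module define. The port 8983 and the `/solr` path match the example server used in the setup steps.

```python
from urllib.parse import urlencode


def build_search_url(keywords, facet_fields=(), host="solr.example.com",
                     port=8983, rows=10):
    """Build a GET URL for Solr's select handler on a separate server.

    host and the facet field names are placeholders for this sketch.
    """
    params = [("q", keywords), ("rows", rows), ("wt", "json")]
    if facet_fields:
        # Ask Solr to return hit counts per value of each listed field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return "http://%s:%d/solr/select?%s" % (host, port, urlencode(params))


# Example: keyword search with a facet on a hypothetical "type" field.
url = build_search_url("drupal search", facet_fields=["type"])
print(url)
```

Because the web server only needs an HTTP URL to reach Solr, moving search to another machine is mostly a matter of changing the host in that URL.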

KEY SOLR FEATURES

• powerful full-text search
• hit highlighting
• faceted search
• dynamic clustering
• relevance highlighting
• autocorrection
• caching
• multi-site search
• content recommendations
• rich document (ex: Word, PDF) handling
• geospatial search
• all attributes of nodes are searchable
• highly scalable
• can be run on a completely physically separate server

WHAT’S FACETED SEARCH?

• faceted search is dynamic clustering of items or search results into categories that let users drill down into search results (or even skip searching entirely) by any value in that field
• each facet also shows the number of hits within the search that match that category
• faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search

FACETED SEARCH EXAMPLE

Diagram source: Lucid Imagination - http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
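
The idea can be sketched in a few lines of Python. This is only an in-memory illustration of the concept (counting hits per category, then drilling down), not how Solr computes facets internally. The sample documents and the field names `type` and `author` are made up for the example.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many results fall under each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def drill_down(docs, field, value):
    """Keep only the results whose field matches the chosen facet value."""
    return [doc for doc in docs if doc.get(field) == value]


# Hypothetical search results.
results = [
    {"title": "Solr setup notes", "type": "blog", "author": "alice"},
    {"title": "Facets explained", "type": "blog", "author": "bob"},
    {"title": "Meetup slides", "type": "page", "author": "alice"},
]

by_type = facet_counts(results, "type")      # counts shown next to each facet
blogs = drill_down(results, "type", "blog")  # user clicks the "blog" facet
by_author = facet_counts(blogs, "author")    # facets recomputed on the subset
```

Each click narrows the result set, and the counts for the remaining facets are recomputed on whatever is left.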

QUICK SOLR DEMOS ON LIVE DRUPAL SITES

• Whitehouse.gov
• Drupal.org
• GoChicOrGoHome.com
• New York Public Library

HOW DO YOU SET IT UP?

You’ll need:

• Java 5 or higher
• PHP 5.2 for Drupal 6, but PHP 5.1.4 will work if you have the PECL JSON extension or Zend Framework JSON classes

1. Go to the Apache Solr Search Integration project page http://drupal.org/project/apachesolr
2. Install the module
3. Grab the Solr PHP library via svn OR get the bundled Acquia Search download
4. Enable the module
5. Download Solr 1.4 and unpack it outside of the Drupal directory
6. Rename the existing files apache-solr-nightly/example/solr/conf/schema.xml and solrconfig.xml to *.bak to get them out of the way
7. Copy the schema.xml and solrconfig.xml that come with the Apache Solr Drupal module to take their place
8. Start Solr by opening a shell (PuTTY, Mac Terminal), going to the apache-solr-nightly/example folder, and executing the command java -jar start.jar
9. Test that the Solr server is available at http://localhost:8983/solr/admin
10. Make sure both the main Apache Solr Framework and Apache Solr Search modules are enabled - if the Solr Search module isn’t enabled, no indexing will occur
11. Run cron until your content is indexed
12. Enable blocks for facets
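
Steps 6 and 7 are easy to get wrong by hand, so here is a minimal Python sketch of the file swap. The function name `swap_solr_config` is made up for this example, and the two directories are passed in because the actual locations depend on where you unpacked Solr and installed the module. The demo at the bottom runs against a scratch directory so nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path

CONFIG_FILES = ("schema.xml", "solrconfig.xml")


def swap_solr_config(solr_conf_dir, module_dir):
    """Back up Solr's stock config files and copy in the module's versions."""
    solr_conf_dir = Path(solr_conf_dir)
    module_dir = Path(module_dir)
    for name in CONFIG_FILES:
        target = solr_conf_dir / name
        if target.exists():
            # Step 6: move the stock file out of the way as *.bak.
            target.rename(target.with_name(name + ".bak"))
        # Step 7: put the Drupal module's file in its place.
        shutil.copy(module_dir / name, target)


# Dry run in a temporary directory with stand-in files.
with tempfile.TemporaryDirectory() as scratch:
    conf = Path(scratch, "conf")
    module = Path(scratch, "module")
    conf.mkdir()
    module.mkdir()
    for name in CONFIG_FILES:
        (conf / name).write_text("stock " + name)
        (module / name).write_text("drupal " + name)
    swap_solr_config(conf, module)
    demo_listing = sorted(p.name for p in conf.iterdir())
    demo_schema = (conf / "schema.xml").read_text()
```

On a real install you would call it with the Solr conf directory from step 6 and the directory of the Apache Solr Drupal module, then restart Solr as in step 8.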

PRO TIP 1: DISABLING CORE SEARCH INDEXER

• Apache Solr module depends on Drupal’s core Search module
• when Solr is enabled, the Search module will also be enabled
• as soon as the core Search module is enabled it starts to index all your nodes
• this takes time to run and fills up the database (search_dataset, search_index... tables)
• if you’re installing Solr Search, you don’t need Drupal’s core search form
• you replace it with the Solr one by going to the Solr module settings and clicking “Make Apache Solr Search the default”
• this disables the core Search module’s form - but not the indexing
• to disable the indexing - and save some CPU cycles and database space - go to your site’s search settings at admin/settings/search and set the “number of items to index per cron run” to 0

Thanks to DrupalCoder.com for this tip - http://www.drupalcoder.com/blog/performance-tip-disable-drupals-core-search-indexer-when-using-apache-solr

PRO TIP 2: CRON VS. ELYSIA CRON

• Solr Search indexing is triggered by cron runs
• default Drupal cron job triggers all cron tasks at the same time
• this can be a serious drag on performance and can cause cron runs to fail if one or more tasks don’t finish in the allotted cron period
• to get around this, use...
• Elysia Cron - http://drupal.org/project/elysia_cron
• expands cron capabilities - gives you crontab-like scheduling so you can run different tasks at different times and frequencies
• so for example - set Solr Search to index 1000 nodes every 15 minutes, while other cron tasks are set to run once every hour
• to get the fastest indexing on your server, experiment with different numbers of items to index per cron run and different cron run times until you find the max your server is capable of handling
• ex: try indexing 1000 items per cron run and set the cron to run every 5 minutes
• if you don’t get any errors, you’re good
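
When experimenting with those two settings, a little arithmetic helps you compare schedules before trying them. The Python sketch below is a rough estimate only: it assumes every cron run indexes a full batch and finishes on time, which a real server may not manage. The function names are made up for this example.

```python
import math


def items_per_hour(items_per_run, minutes_between_runs):
    """Indexing throughput for a given batch size and cron interval."""
    return items_per_run * 60 / minutes_between_runs


def full_index_time_minutes(total_items, items_per_run, minutes_between_runs):
    """Rough time to index everything, counting one interval per cron run."""
    runs = math.ceil(total_items / items_per_run)
    return runs * minutes_between_runs


# The two schedules mentioned in the slides.
every_15 = items_per_hour(1000, 15)   # 1000 nodes every 15 minutes
every_5 = items_per_hour(1000, 5)     # 1000 nodes every 5 minutes

# A hypothetical site with 100,000 nodes on the 5-minute schedule.
estimate = full_index_time_minutes(100000, 1000, 5)
```

With these numbers, the 5-minute schedule indexes three times as much per hour as the 15-minute one, and the hypothetical 100,000-node site would take roughly 500 minutes to index in full.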

DRUSH

• Solr Search integrates with Drush
• you can call Solr tasks from the Drush command line
• commands include...
• solr-delete-index - Deletes the contents of the index. Can take content types as parameters
• solr-index - Sends content marked for (re)indexing to Solr. Same as running cron once but without the other overhead
• solr-reindex - Marks content for reindexing. Can take content types as parameters
• solr-search - Searches the site for keywords using Apache Solr
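
If you want to script these commands, one option is a thin Python wrapper. This is a sketch under a few assumptions: `drush` is on the PATH, the script is run from inside the Drupal site directory, and content types are passed as plain positional arguments (the slides only say the commands “can take content types as parameters”). The helper names and the `page` content type are made up for the example.

```python
import subprocess

# The four commands named in the slides.
SOLR_TASKS = {"solr-delete-index", "solr-index", "solr-reindex", "solr-search"}


def drush_solr_command(task, *args):
    """Build the argument list for one Solr-related Drush command."""
    if task not in SOLR_TASKS:
        raise ValueError("unknown Solr Drush task: %s" % task)
    return ["drush", task, *args]


def reindex_commands(content_types=()):
    """Mark content for reindexing, then send it to Solr."""
    return [
        drush_solr_command("solr-reindex", *content_types),
        drush_solr_command("solr-index"),
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Build (but do not run) the commands for a hypothetical "page" type.
planned = reindex_commands(["page"])
```

Calling `run_all(planned)` on a machine with Drush installed would execute the two commands in sequence.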

ACQUIA SEARCH

• Acquia has a hosted SaaS version of Solr that they call Acquia Search
• it’s plug and play and available for Drupal 6 and 7
• gives you all the power of Solr without having to install any software (beyond the Solr Drupal modules) or manage any servers
• really easy to set up, really fast and robust, kind of pricey
• http://acquia.com/products-services/acquia-search
• you can get a 30 day free trial of Acquia Search at http://acquia.com/trial
• easiest way to test drive Solr

SOLR + VIEWS 3 = THE (VERY NEAR) FUTURE

• this is where it starts to get even more interesting
• Views 3 (still in alpha for Drupal 6 but in beta for Drupal 7) allows you to make custom searches against the Solr index the same way you currently make views against the MySQL database
• ex: build a Solr search that just includes videos and MP3s and render the results as a playlist
• ex: a Solr search that’s limited to the current user’s images, displayed as a slideshow
• upshot: you can bypass the Drupal database and build your content straight off the Solr index
• no database queries
• no complex views queries with tons of joins
• no node_load() calls for displaying the results

RESOURCES

• best place to start learning is the Solr Search docs page on drupal.org - http://drupal.org/node/343467
• Robert Douglass did a great Solr presentation in 2008 - slides are online at http://www.slideshare.net/robertDouglass/apachesolr-presentation-from-do-it-with-drupal-presentation
• the book “Solr 1.4 Enterprise Search Server” is apparently good - review here: http://www.drupalcoder.com/blog/book-review-from-a-drupal-point-of-view-solr-14-enterprise-search-server
• great article by Robert Douglass - “Views 3 + Apache Solr + Acquia Drupal = The Future of Search” - http://acquia.com/blog/views-3-apache-solr-acquia-drupal-future-search
• article - “Three things we learned from indexing a Drupal site with millions of nodes in Apache Solr” - http://www.drupalcoder.com/blog/three-things-we-learned-from-indexing-a-drupal-site-with-millions-of-nodes-in-apache-solr
• article - “Geospatial Apache Solr searching in Drupal 6 by upgrading Solr to 3.1” - http://thedrupalblog.com/geospatial-apache-solr-searching-drupal-6-upgrading-solr-31
• how to install Solr on Mac OS X Snow Leopard - http://www.drupalcoder.com/blog/installing-apache-solr-in-tomcat-for-drupal-on-snow-leopard
• setting up Drupal 6 with Apache Solr on Tomcat 6 and Ubuntu 9.10 - http://www.nickveenhof.be/blog/setup-drupal-6-apache-solr-tomcat-6-and-ubuntu-910-karmic-koala
• Configuring Apache Solr Multi-core with Drupal and Tomcat on Ubuntu 9.10 - http://drupalconnect.com/blog/steve/configuring-apache-solr-multi-core-drupal-and-tomcat-ubuntu-910
• Jetty powered multicore Apache Solr and Drupal in Ubuntu 10.04 - http://vladgh.com/blog/jetty-powered-multicore-apache-solr-and-drupal-ubuntu-1004
• Solr tutorials on the official Apache Solr site - http://lucene.apache.org/solr/tutorial.html
• the official Apache Solr wiki - http://wiki.apache.org/solr/FrontPage
• DrupalCamp Montreal 2009 video presentation on Solr - http://yadadrop.com/drupal-video/drupal-apache-solr-setup-configuration-extensions-hooks