State-of-the-Art Drupal Search with Apache Solr


These are the slides from the presentation I gave on Feb. 7, 2010, in Brussels, at the FOSDEM conference.

Published in: Technology

  • This presentation was given on Sunday, February 7, 2010, in Brussels, at the FOSDEM conference.
  • Two years ago, at a conference called FOSDEM, the Apache Solr module was introduced. Coincidentally, it was the day I started working for Acquia.
  • It wasn’t really ready for prime time. In all I’d say it was 20% software and 80% vaporware.
  • I also had more hair back then, which was not only longer and thicker, but less grey.
  • A lot has changed.
  • Since then Acquia has launched a hosted search service based on Apache Solr:
  • It’s been a big success. It is hosted by Acquia and uses Amazon cloud architecture for great performance and high availability. If you want to try it out and get up and running quickly, the Acquia Stack Installer and a free or basic subscription will get you there in around 5 minutes.
  • Another important change is the funding and founding of Lucid Imagination.
  • Like Acquia, Lucid Imagination is a venture funded software company based in the Boston area. Like Acquia they have core committers on their team.
    Like Acquia they’re in the business of providing support and services for their open source project.
  • They have succeeded in raising the profile and awareness of Solr, and have also accelerated the pace of Solr development.
    Lucid Imagination has also captured a lot of interest from government clients.
  • Acquia and Lucid Imagination represent two good reasons to choose Solr. It is a good long-term technology platform decision.
  • One of the big changes of the past two years has affected all of us quite strongly. For some time, has been running Solr as its main search component.
  • Around 50% of page requests to involve the Solr server in some way. These include
    site search, the project listings page, and the issue queue listings.
  • Here is a search page. Note the opportunities to sort, and filter on facets including content type and author.
  • Here’s the modules listing page. It’s especially helpful that you can filter on Drupal version type, project type, or do a keyword search that is limited just to modules.
  • Here’s the issue queue advanced search. This too is powered by Solr.
  • The advantages of switching to Solr search include a much better search experience. There’s faceting. There’s better relevancy. Better performance. Better scalability.
  • Of course, one of the exciting announcements from last year was that President Obama’s website,, had switched to Drupal.
  • I bet you can guess that was quite happy about this.
  • And quite proud.
  • Proud of Drupal, and of Acquia,
  • and of the Obama administration for working to foster openness in government.
  • Two years ago the concept of faceted search - the ability to easily drill down into search results - was new. Now everybody wants it; it’s become a de facto standard in new projects.
  • Two years ago I promised we’d get spelling suggestions. We have them, though we’re still learning how to tune and improve them.
  • I will give you a tip, though: find solr/conf/spellings.txt and add problem words to it. The default file that comes with Solr only has the words “pizza” and “history”.
  • Someone here could make a contributed module to generate a spellings.txt. You could use taxonomy terms, vocabularies, synonyms, content types, user names, custom input forms, and even online dictionaries to generate the file.
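The module idea above could start from something like the following minimal sketch. It assumes the file holds one word per line, as the default file does; the function names and sample word lists are hypothetical, and pulling the words out of Drupal (taxonomy terms, user names, and so on) is left out.

```python
import re


def build_spellings(sources, min_length=3):
    """Collect unique lowercase words from several lists of phrases."""
    words = set()
    for source in sources:
        for phrase in source:
            # Split multi-word phrases such as "Apache Solr" into single words.
            for token in re.findall(r"[A-Za-z][A-Za-z'-]*", phrase):
                if len(token) >= min_length:
                    words.add(token.lower())
    return sorted(words)


def write_spellings(words, path):
    """Write one word per line (assumed format of spellings.txt)."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")


# Hypothetical sample data standing in for values exported from a site.
taxonomy_terms = ["Search", "Apache Solr", "Faceted search"]
user_names = ["merlinofchaos"]
spelling_words = build_spellings([taxonomy_terms, user_names])
```

Calling `write_spellings(spelling_words, "solr/conf/spellings.txt")` would then replace the default file with the generated list.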
  • Two years ago I promised more control over tuning search results. Now there are all sorts of boosting and customization options. You can exclude content types from the index. You can boost or reduce the importance of individual fields or HTML elements when searching.
  • This screen shows how you can use node attributes, like whether a node is promoted to the front page, or is sticky, to influence search rankings.
  • This screen shows how you can boost or diminish the ranking of individual content types, or exclude content types from being indexed altogether.
  • And this screen shows how you can use the HTML markup itself to give extra weight to some elements, or diminish the value of others.
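As a rough illustration of what these boosting settings might translate to, here is a sketch that assembles request parameters for Solr's dismax query parser, using `qf` for per-field weights and `bq` for boost queries. The field names (`title`, `body`, `tags_h1`, `type`, `promote`) and the boost values are assumptions for the example, not the module's actual schema or defaults.

```python
from urllib.parse import urlencode


def build_boosted_query(keywords, field_boosts, type_boosts, promoted_boost=None):
    """Return a list of (name, value) pairs for a boosted dismax request."""
    # qf: which fields to search, and how much each one counts.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = [("defType", "dismax"), ("q", keywords), ("qf", qf)]
    # bq: raise or lower the ranking of whole content types.
    for content_type, boost in type_boosts.items():
        params.append(("bq", f"type:{content_type}^{boost}"))
    # bq: reward nodes promoted to the front page (hypothetical field name).
    if promoted_boost is not None:
        params.append(("bq", f"promote:true^{promoted_boost}"))
    return params


params = build_boosted_query(
    "views",
    {"title": 5.0, "tags_h1": 3.0, "body": 1.0},
    {"story": 2.0, "forum": 0.5},
    promoted_boost=1.5,
)
query_string = urlencode(params)
```

Excluding a content type altogether is not shown here, because it happens when content is indexed rather than when a query runs.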
  • Two years ago I promised the ability to do content recommendation with Solr. That is now a reality and it works really well, leading to far lower bounce rates and more time spent on your site.
  • When Dries enabled content recommendation on I spent about two hours re-discovering things that he’d written over the years. Every article had more context and background. You can always find something related and interesting to read.
  • And with this pending patch, you have even greater control over content recommendation. You can limit the recommendations to certain content types, or certain taxonomy terms, or boost certain words. You can make as many different recommendation blocks as you need.
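The notes do not say how the recommendations are computed. Assuming they are built on Solr's MoreLikeThis feature, a recommendation block restricted to certain content types might send parameters like the ones in this sketch. The `id` and `type` field names and the sample values are hypothetical.

```python
def build_recommendation_params(node_id, fields, content_types=(), count=5):
    """Return (name, value) pairs for a MoreLikeThis-style request."""
    params = [
        ("q", f"id:{node_id}"),        # the document to find neighbours for
        ("mlt.fl", ",".join(fields)),  # fields compared for similarity
        ("mlt.mintf", "1"),            # minimum term frequency in the source document
        ("mlt.mindf", "1"),            # minimum number of documents containing a term
        ("rows", str(count)),          # how many recommendations to return
    ]
    if content_types:
        # Limit the recommendations to the given content types.
        type_filter = " OR ".join(f"type:{name}" for name in content_types)
        params.append(("fq", type_filter))
    return params


block_params = build_recommendation_params(
    "node/42", ["title", "body"], content_types=["story", "blog"]
)
```

Each recommendation block would carry its own set of parameters, which is one way to have as many differently tuned blocks as a site needs.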
  • We now have Views 3, File search, multisite search, comment search, cck date facets, statistics, autocomplete, and the glorious display suite.
  • Views 3 builds custom Solr queries (using the apachesolr_views module).
    Views 3 then displays query results, with all the Views goodies you’re familiar with (tables, grids, carousels, slideshows etc.)
    You can build custom search forms with exposed filters, and faceting works as well.
  • Here you can see an example view that has an exposed keyword filter, a taxonomy facet block, and a table display including a sortable title column. All made using the views user interface.
  • Read more about it at acquia dot com, node nine one one six six seven.
  • It’s possible to search for text inside of uploaded files as well. There’s a brand new blog post on about this topic that I published today.
  • I searched for “merlinofchaos” and the text was found in the zipped-up tarball of Views 3, which I attached to a node using a filefield.
  • Here are examples of the word “Drupal” being found in a Microsoft Word file and this very Keynote presentation that you’re watching.
  • Multisite search: sites share an index and can be filtered on the “Site” facet.
    You can either search across all your sites, or on the site that you’re currently on.
  • Comment search: Comments get indexed as first-class citizens and get their own search results. We even solved the problem of linking to a comment on page X. This is only available in the DRUPAL-6--2 branch of ApacheSolr.
  • CCK Date Facets: Any CCK date field becomes a facet filter. You can drill down - year - month - day - hour.
  • This is a patch, currently, and needs testing. Please help.
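The year, month, day, hour drill-down can be pictured as a date facet whose bucket size shrinks each time the visitor picks a value. The sketch below uses the `facet.date` parameters of Solr 1.4-era releases and Solr date math. The `created` field name is an assumption, and the patch itself may build its requests differently.

```python
# After choosing a unit, the next request shows buckets one size smaller.
NEXT_BUCKET = {"YEAR": "MONTH", "MONTH": "DAY", "DAY": "HOUR"}


def date_facet_params(field, start, selected_unit):
    """Build facet parameters for one drill-down step.

    start is an ISO 8601 UTC timestamp marking the beginning of the
    selected year, month, or day, for example "2010-01-01T00:00:00Z".
    """
    bucket = NEXT_BUCKET[selected_unit]
    return [
        ("facet", "true"),
        ("facet.date", field),
        (f"f.{field}.facet.date.start", start),
        # Date math: the range ends one selected unit after it starts.
        (f"f.{field}.facet.date.end", f"{start}+1{selected_unit}"),
        (f"f.{field}.facet.date.gap", f"+1{bucket}"),
    ]


# A visitor picked the year 2010, so the next facet lists its months.
year_params = date_facet_params("created", "2010-01-01T00:00:00Z", "YEAR")
```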
  • Too few people analyze what’s happening on their site regarding search. When you look at normal analytics you see a lot of incoming links and keyword searches from sites like Google. Do you, however, analyze the keywords that people use on your own site search? This is golden information as they’re telling you exactly what they’re looking for. Do you retrace their steps and look at the search results they see? Is it what you’d expect? Are they finding what they’re looking for?
  • The statistics module gives you insight into how many searches are run and how they perform (.0001 second average, .543 second maximum).
  • And also which search filters are most often being applied.
  • Here you can see how many searches from a music site are being filtered by genre or instrumentation.
  • Ready for use. Go try it out.
  • Autocomplete uses the contents of the index to suggest terms to you, as you type. If you finish one term it will suggest a common second term to go with it.
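The behaviour described above can be mimicked with a small in-memory toy. It is only a sketch of the idea, not how the autocomplete module or Solr does it: it counts the terms in a handful of hypothetical documents, completes a partial term by prefix, and suggests a frequent follow-up term once a term is finished.

```python
from collections import Counter, defaultdict


class TermSuggester:
    """Suggest terms from indexed text, most frequent first."""

    def __init__(self, documents):
        self.term_counts = Counter()
        self.followers = defaultdict(Counter)
        for text in documents:
            tokens = text.lower().split()
            self.term_counts.update(tokens)
            # Remember which term tends to follow which.
            for first, second in zip(tokens, tokens[1:]):
                self.followers[first][second] += 1

    def suggest(self, typed, limit=5):
        words = typed.lower().split()
        if not words:
            return []
        if typed.endswith(" "):
            # The last term is finished: suggest common second terms.
            following = self.followers.get(words[-1], Counter())
            return [term for term, _ in following.most_common(limit)]
        # The last term is still being typed: complete it by prefix.
        prefix = words[-1]
        matches = [(t, c) for t, c in self.term_counts.items() if t.startswith(prefix)]
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [term for term, _ in matches[:limit]]


suggester = TermSuggester(
    ["apache solr search", "apache solr module", "apache lucene"]
)
```

With this sample data, `suggester.suggest("s")` completes the term being typed, while `suggester.suggest("apache ")` proposes terms that commonly follow "apache".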
  • Just a few days ago the display suite 1.0 was released. This is now the easiest way to customize your search results.
  • Here, for example, are search results with full teasers and images.
  • Solr 1.5 is just around the corner. The team at Lucid has set a brisk pace. It will include native geospatial search. It will have an autocomplete request handler. It will have the very exciting eDismax request handler, with full Lucene syntax and perhaps an opportunity for Lucene API module integration.
  • People often want to search non-Drupal sites along with their Drupal sites. This can be done by crawling those sites with Nutch and using the Nutch/Solr integration.
    Acquia support for multisite, file and geospatial search is on the way.
    (In the presentation I asked for a show of hands on which of these three features would be most popular. People responded 1 - File, 2 - Multisite, and 3 - Geospatial, in that order.)
  • There are urgent needs and immediate opportunities to help. #1 on the list is to assist with the redesign and relaunch.
    Much of the new site is driven by Solr. #2 is the glaring absence of test suites. This slows development due to regressions. #3 is the upgrade to Drupal 7.
  • Like any two-year-long party, there’s a bit of housecleaning that now has to be done. Better APIs, more abstraction, better documentation.
  • State-of-the-Art Drupal Search with Apache Solr

    1. Apache Solr Robert Douglass, Acquia
    2. Anniversary • two years ago... • at a conference called FOSDEM... • the Apache Solr module was introduced.
    3. Anniversary • it was 20% software .... • and 80% vaporware.
    4. Anniversary
    5. A lot has changed.
    6. A lot has changed • Acquia Search
    7. Acquia Search • Acquia’s hosted service - big success • In combination with Acquia Stack Installer and a trial or basic subscription, 5 minutes to install and have Solr search running.
    8. Acquia Search • •
    9. A lot has changed • Acquia Search • Lucid Imagination
    10. Lucid Imagination • Venture funded Boston-based company (just like Acquia) • Core Solr committers (just like Acquia) • Solr support and services (very much like Acquia)
    11. Lucid Imagination • Accelerating pace of Solr development (just like Acquia) • Big interest from government clients (just like Acquia)
    12. Acquia and Lucid Imagination Two good reasons to choose Drupal and Solr as long-term technology platforms.
    13. A lot has changed • Acquia Search • Lucid Imagination •
    14. • ~50% of page requests get main content from Solr • site search • project listing pages (modules, themes, etc) • Issue queue listings
    18. • Vastly improved search experience • Faceting • Better relevancy • Better performance • Better scalability
    19. A lot has changed • Acquia Search • Lucid Imagination • •
    24. Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
    25. Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
    26. Improving spelling suggestions • In your solr/conf directory, look into the file called spellings.txt • It has two words in it: “pizza” “history” • Tip: Get or build a dictionary to fill that file with correctly spelled words.
    27. Improving spelling suggestions • Use taxonomy terms, vocabularies and synonyms • Use content types • Use user names • Use online dictionaries
    28. Improving spelling suggestions An idea for a contributed module • Use taxonomy terms, vocabularies and synonyms • Use content types • Use user names • Use online dictionaries
    29. Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
    30. Some dreams reality
    31. Some dreams reality
    32. Some dreams reality
    33. Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
    34. Some dreams reality
    35. Some dreams reality
    36. Some dreams reality Greater control over recommendations: A patch to review:
    37. 2 Years of Pure Party • ApacheSolr Views Statistics • File search • • Multisite search • Autocomplete • Comment search • Display Suite • CCK Date facets
    38. Apache Solr + Views 3 • Views builds Solr query • Views displays query results • Build custom search forms with exposed filters • Faceting works as well
    39. Apache Solr + Views 3
    40. Apache Solr + Views 3
    41. File Search New blog post: or
    42. File Search
    43. File Search
    44. Multisite Search
    45. Comment Search
    46. Comment Search
    47. CCK Date Facets
    48. CCK Date Facets Help test cck date facets:
    49. Statistics
    50. Statistics
    51. Statistics
    52. Statistics
    53. Autocomplete
    54. Autocomplete
    55. Display Suite
    56. Display Suite
    57. Stuff on the horizon • Solr 1.5 features • GeoSpatial search • Autocomplete component • eDismax (Extended dismax) - supports raw Lucene syntax, among other things. Opens the door for integration with Lucene API module. ( project/luceneapi)
    58. Stuff on the horizon • Crawling with Nutch • Acquia support for multisite, file and geospatial search
    59. Urgent needs • relaunch • Test suites • Drupal 7 version
    60. House cleaning • Help us refactor • Better APIs • Better Documentation
    61. Any Questions?