State-of-the-Art Drupal Search with Apache Solr

These are the slides from the presentation I gave on Feb. 7, 2010, in Brussels, at the FOSDEM conference.

  • This presentation was given on Sunday, February 7, 2010, in Brussels, at the FOSDEM conference.
  • Two years ago, at a conference called FOSDEM, the Apache Solr module was introduced. Coincidentally, it was the day I started working for Acquia.
  • It wasn’t really ready for prime time. In all I’d say it was 20% software and 80% vaporware.
  • I also had more hair back then, which was not only longer and thicker, but less grey.
  • A lot has changed.
  • Since then Acquia has launched a hosted search service based on Apache Solr: http://acquia.com/products-services/acquia-search
  • It’s been a big success. The service is hosted by Acquia and uses Amazon cloud architecture for great performance and high availability. If you want to try it out and get up and running quickly, the Acquia Stack Installer and a free or basic subscription will get you there in around 5 minutes.
  • http://acquia.com/acquia-search and http://acquia.com/downloads
  • Another important change is the funding and founding of Lucid Imagination. http://www.lucidimagination.com/About
  • Like Acquia, Lucid Imagination is a venture-funded software company based in the Boston area. Like Acquia, they have core committers on their team. Like Acquia, they’re in the business of providing support and services for their open source project.
  • They have succeeded in raising the profile and awareness of Solr, and have also accelerated the pace of Solr development. Lucid Imagination has also captured a lot of interest from government clients.
  • Acquia and Lucid Imagination represent two good reasons to choose Solr. It is a good long-term technology platform decision.
  • One of the big changes of the past two years has affected all of us quite strongly. For some time, Drupal.org has been running Solr as its main search component.
  • Around 50% of page requests to Drupal.org involve the Solr server in some way. These include site search, the project listings page, and the issue queue listings.
  • Here is a search page. Note the opportunities to sort, and filter on facets including content type and author.
  • Here’s the modules listing page. It’s especially helpful that you can filter on Drupal version, project type, or do a keyword search that is limited just to modules.
  • Here’s the issue queue advanced search. This too is powered by Solr.
  • The advantages of Drupal.org switching to Solr search include a much better search experience. There’s faceting. There’s better relevancy. Better performance. Better scalability.
  • Of course, one of the exciting announcements from last year was that President Obama’s website, Whitehouse.gov, had switched to Drupal.
  • I bet you can guess that I was quite happy about this.
  • And quite proud.
  • Proud of Drupal, and of Acquia,
  • and of the Obama administration for working to foster openness in government.
  • Two years ago the idea of faceted search - the ability to easily drill down into search results - was new. Now everybody wants it; it’s become a de facto standard in new projects.
  • Two years ago I promised we’d get spelling suggestions. We have them, though we’re still learning how to tune and improve them.
  • I will give you a tip, though: find solr/conf/spellings.txt and add problem words to it. The default file that comes with Solr only has the words “pizza” and “history”.
  • Someone here could make a contributed module to generate a spellings.txt. You could use taxonomy terms, vocabularies, synonyms, content types, user names, custom input forms, and even online dictionaries to generate the file.
  • Two years ago I promised more control over tuning search results. Now there are all sorts of boosting and customization options. You can exclude content types from the index. You can boost or reduce the importance of individual fields or HTML elements when searching.
  • This screen shows how you can use node attributes, like whether a node is promoted to the front page, or is sticky, to influence search rankings.
  • This screen shows how you can boost or diminish the ranking of individual content types, or exclude content types from being indexed altogether.
  • And this screen shows how you can use the HTML markup itself to give extra weight to some elements, or diminish the value of others.
  • Two years ago I promised the ability to do content recommendation with Solr. That is now a reality and it works really well, leading to far lower bounce rates and more time spent on your site.
  • When Dries enabled content recommendation on http://Buytaert.net I spent about two hours re-discovering things that he’d written over the years. Every article had more context and background. You can always find something related and interesting to read.
  • And with this pending patch, you have even greater control over content recommendation. You can limit the recommendations to certain content types, or certain taxonomy terms, or boost certain words. You can make as many different recommendation blocks as you need.
  • We now have Views 3, file search, multisite search, comment search, CCK date facets, statistics, autocomplete, and the glorious Display Suite.
  • Views 3 builds custom Solr queries (using the apachesolr_views module). Views 3 then displays query results, with all the Views goodies you’re familiar with (tables, grids, carousels, slideshows, etc.). You can build custom search forms with exposed filters, and faceting works as well.
  • Here you can see an example view that has an exposed keyword filter, a taxonomy facet block, and a table display including a sortable title column. All made using the views user interface.
  • Read more about it at acquia dot com, node nine one one six six seven. http://acquia.com/node/911667
  • It’s possible to search for text inside of uploaded files as well. There’s a brand new blog post on Acquia.com about this topic that I published today. http://acquia.com/node/1129446
  • I searched for “merlinofchaos” and the text was found in the zipped-up tarball of Views 3, which I attached to a node using a filefield.
  • Here are examples of the word “Drupal” being found in a Microsoft Word file and this very Keynote presentation that you’re watching.
  • Multisite search: sites share an index and can be filtered on the “Site” facet. You can either search across all your sites, or on the site that you’re currently on.
  • Comment search: Comments get indexed as first-class citizens and get their own search results. We even solved the problem of linking to a comment on page X. This is only available in the DRUPAL-6--2 branch of ApacheSolr.
  • CCK Date Facets: Any CCK date field becomes a facet filter. You can drill down by year, month, day, and hour.
  • This is a patch, currently, and needs testing. Please help. http://drupal.org/node/558160
  • Too few people analyze what’s happening on their site regarding search. When you look at normal analytics you see a lot of incoming links and keyword searches from sites like Google. Do you, however, analyze the keywords that people use on your own site search? This is golden information as they’re telling you exactly what they’re looking for. Do you retrace their steps and look at the search results they see? Is it what you’d expect? Are they finding what they’re looking for?
  • The statistics module gives you insight into how many searches are run and how they perform (.0001 second average, .543 second maximum).
  • And also which search filters are most often being applied.
  • Here you can see how many searches from a music site are being filtered by genre or instrumentation.
  • Ready for use. Go try it out.
  • Autocomplete uses the contents of the index to suggest terms to you, as you type. If you finish one term it will suggest a common second term to go with it.
  • Just a few days ago Display Suite 1.0 was released. This is now the easiest way to customize your search results.
  • Here, for example, are search results with full teasers and images.
  • Solr 1.5 is just around the corner. The team at Lucid has set a brisk pace. It will include native geospatial search. It will have an autocomplete request handler. It will have the very exciting eDismax request handler, with full Lucene syntax and perhaps an opportunity for Lucene API module integration (http://drupal.org/project/luceneapi).
  • People often want to search non-Drupal sites along with their Drupal sites. This can be done by crawling those sites with Nutch and using the Nutch/Solr integration. Acquia support for multisite, file and geospatial search is on the way. (In the presentation I asked for a show of hands for which of these three features would be most popular. People responded 1 - File, 2 - Multisite, and 3 - Geospatial, in that order.)
  • There are urgent needs and immediate opportunities to help. #1 on the list is to assist with the Drupal.org redesign and relaunch (http://drupal.org/node/704062); much of the new site is driven by Solr. #2 is the glaring absence of test suites. This slows development due to regressions. #3 is the upgrade to Drupal 7.
  • Like any two-year-long party, there’s a bit of housecleaning that now has to be done: better APIs, more abstraction, better documentation.
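
The spellings.txt tip in the notes above could be automated along these lines. This is only a minimal sketch, not the contributed module the talk asks for: it assumes the file is a plain word list with one word per line, and the term lists (`taxonomy_terms`, `user_names`, `content_types`) are hypothetical stand-ins for data you would export from your own Drupal site.

```python
import re
from typing import Iterable, List


def build_spellings(*sources: Iterable[str]) -> List[str]:
    """Collect unique lowercase words from several term sources."""
    words = set()
    for source in sources:
        for term in source:
            # Split multi-word terms such as "Faceted search" into single words,
            # keeping letters only (digits and punctuation are dropped).
            for word in re.findall(r"[^\W\d_]+", term.lower()):
                if len(word) > 2:  # skip very short tokens
                    words.add(word)
    return sorted(words)


def write_spellings(words: List[str], path: str) -> None:
    """Write one word per line (assumed format of spellings.txt)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words) + "\n")


# Hypothetical sample data standing in for a Drupal export.
taxonomy_terms = ["Faceted search", "Apache Solr"]
user_names = ["merlinofchaos"]
content_types = ["Blog post", "Page"]

spellings = build_spellings(taxonomy_terms, user_names, content_types)
```

Calling `write_spellings(spellings, "solr/conf/spellings.txt")` would then replace the default two-word file; the path is the one named in the notes, relative to wherever your Solr instance lives.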

State-of-the-Art Drupal Search with Apache Solr: Presentation Transcript

  • Apache Solr Robert Douglass, Acquia
  • Anniversary • two years ago... • at a conference called FOSDEM... • the Apache Solr module was introduced.
  • Anniversary • it was 20% software .... • and 80% vaporware.
  • Anniversary
  • A lot has changed.
  • A lot has changed • Acquia Search
  • Acquia Search • Acquia’s hosted service - big success • In combination with Acquia Stack Installer and a trial or basic subscription, 5 minutes to install and have Solr search running.
  • Acquia Search • http://acquia.com/acquia-search • http://acquia.com/downloads
  • A lot has changed • Acquia Search • Lucid Imagination
  • Lucid Imagination • Venture funded Boston-based company (just like Acquia) • Core Solr committers (just like Acquia) • Solr support and services (very much like Acquia)
  • Lucid Imagination • Accelerating pace of Solr development (just like Acquia) • Big interest from government clients (just like Acquia)
  • Acquia and Lucid Imagination Two good reasons to choose Drupal and Solr as long-term technology platforms. http://acquia.com http://lucidimagination.com
  • A lot has changed • Acquia Search • Lucid Imagination • Drupal.org
  • Drupal.org • ~50% of page requests get main content from Solr • site search • project listing pages (modules, themes, etc) • Issue queue listings
  • Drupal.org
  • Drupal.org • Vastly improved search experience • Faceting • Better relevancy • Better performance • Better scalability
  • A lot has changed • Acquia Search • Lucid Imagination • Drupal.org • Whitehouse.gov
  • Whitehouse.gov
  • Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
  • Improving spelling suggestions • In your solr/conf directory, look into the file called spellings.txt • It has two words in it: “pizza” “history” • Tip: Get or build a dictionary to fill that file with correctly spelled words.
  • Improving spelling suggestions • Use taxonomy terms, vocabularies and synonyms • Use content types • Use user names • Use online dictionaries
  • Improving spelling suggestions An idea for a contributed module • Use taxonomy terms, vocabularies and synonyms • Use content types • Use user names • Use online dictionaries
  • Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
  • Some dreams reality
  • Some dreams are now reality • Faceted search: Was new ... is now household word • Spelling suggestions • Field boosting • Content recommendation
  • Some dreams reality
  • Some dreams reality Greater control over recommendations: A patch to review: http://drupal.org/node/372767
  • 2 Years of Pure Party • ApacheSolr Views • File search • Multisite search • Comment search • CCK Date facets • Statistics • Autocomplete • Display Suite
  • Apache Solr + Views 3 • Views builds Solr query • Views displays query results • Build custom search forms with exposed filters • Faceting works as well
  • Apache Solr + Views 3
  • Apache Solr + Views 3 http://acquia.com/node/911667
  • File Search New blog post: http://robshouse.net/blog-post/use-apache-solr-search-files http://acquia.com/blog/use-apache-solr-search-files or http://acquia.com/node/1129446
  • File Search
  • Multisite Search
  • Comment Search
  • CCK Date Facets
  • CCK Date Facets Help test cck date facets: http://drupal.org/node/558160
  • Statistics http://drupal.org/project/apachesolr_stats
  • Statistics
  • Autocomplete http://drupal.org/project/apachesolr_autocomplete
  • Autocomplete
  • Display Suite http://drupal.org/project/ds
  • Display Suite
  • Stuff on the horizon • Solr 1.5 features • GeoSpatial search • Autocomplete component • eDismax (Extended dismax) - supports raw Lucene syntax, among other things. Opens the door for integration with Lucene API module. (http://drupal.org/project/luceneapi)
  • Stuff on the horizon • Crawling with Nutch • Acquia support for multisite, file and geospatial search
  • Urgent needs • Drupal.org relaunch http://drupal.org/node/704062 • Test suites • Drupal 7 version
  • House cleaning • Help us refactor • Better APIs • Better Documentation
  • Any Questions?
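
The faceting and field boosting features listed in the slides come down to a handful of Solr request parameters. The sketch below shows roughly what such a request could look like when built by hand. It is not how the Apache Solr module builds its queries; the field names (`title`, `body`, `type`, `author`) and the filter value are assumptions chosen for illustration, and values are not escaped.

```python
from urllib.parse import urlencode


def build_solr_query(keywords, filters=None, facet_fields=(), field_boosts=None):
    """Build a query string for a Solr select request using the dismax parser."""
    params = [("q", keywords), ("defType", "dismax")]
    if field_boosts:
        # qf lists the fields to search, each with a relative weight.
        qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
        params.append(("qf", qf))
    for field, value in (filters or {}).items():
        # Each fq narrows the result set, which is what a facet click does.
        params.append(("fq", f'{field}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)


# Hypothetical request: keyword search limited to one content type,
# with title matches weighted five times higher than body matches.
query = build_solr_query(
    "views",
    filters={"type": "project_module"},
    facet_fields=["type", "author"],
    field_boosts={"title": 5.0, "body": 1.0},
)
```

The resulting string would be appended to a select URL on your own Solr server. Drilling down further means adding one more `fq` entry per selected facet value.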