Intro to Solr in Drupal


Does your website have a ton of data? How do your users find the relevant pages among all the noise in your site?

Solr can help deliver the pertinent search results to your users regardless of your site's size.

Apache Solr is a Java program that integrates with Drupal through a contrib module. It lets your users quickly search millions of records and narrow down the results with minimal system impact.

Published in: Technology
  • In this example, Walmart found that conversion rates were directly affected by site load times. Although the example is about page loads, it still applies to search.
  • Intro to Solr in Drupal

    1. Intro to Solr
    2. DrupalCon Portland
    3. Andrew Riley, Director of Drupal Development, @andrewmriley
    4. Agenda
        • Search?
        • Why Solr?
        • Searching
        • Behind the Scenes
    5. Search?
    6. What is Search? Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: "I searched the desk for the letter." Source:
    7. Why Users Search
        • Navigation doesn't make sense
        • It can be faster
        • Lots of data
        • Frequent data changes
        • Might just be looking for something
        @Mediacurrent
    8. Search Problems
        • Search accuracy
        • Too much data
        • Slow response
        • Wrong results
    9. Why Solr?
    10. History: Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.
    11. Lucene
        • Solr is a layer on top of Lucene
        • Lucene is a library
        • Solr stores files in Lucene format*
    12. Speed: Search speed is important!
    13. Speed. Source: Web Performance Today
    14. Speed
        • Important!
        • It scales well
        • No database required
        • Clustering & Sharding
        • Netflix runs 1.2MM q/day on 4 servers**
    15. Natural Results
        • Stemming: Blogging vs. Blog
        • Stop Word Removal: The
        • Synonyms: Tissue vs. Kleenex
        • Highly Configurable
    16. Drupal Search
        • Not stemmed by default
        • Queries the database
        • Stores tokenized words in a single large table
        • Much slower to index
    17. VS
    18. Searching
    19. Ordering
        • Score
        • Comes from Lucene
        • Not "out of 100"
        • Bigger score first
        More Info:
    20. Facets
        • Users do the work
        • Fixes too much data
        • Native to Solr
        • Requires the Facet API module
        • Shopping Sites
    21. Behind the Scenes
    22. Index?
        • Index contains Documents
        • Documents have Fields
        • Fields have Terms
        • ~2 minutes for updates
        • Uses Lucene syntax
    23. Tokenizing
        • Splits words and numbers: "this" "is" "blogging"
        • Excludes stop words: "this" "blogging"
        • Handles stemming (if enabled): "this" "blog"
        • Very configurable
    24. Bias
        • Adjusts the order of search results
        • Works on: Content Type, Fields, Comments, Promoted to Home Page and more
        • Can be dynamic with custom modules.
    25. Recap
    26. Modules
        • Apache Solr (apachesolr)
        • Facet API (facetapi)
        • Chaos tool suite (ctools)
    27. Overall
        • Search is becoming more and more important
        • You want to control your search results
        • If you don't provide a good search experience, somebody else will.
        • Solr doesn't have to be complex.
        • Solr is fast and scales.
    28. Thank You! Questions?
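
The ideas on the Facets, Index, and Tokenizing slides (20, 22, and 23) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how Solr or Lucene are implemented: the stop word list, the naive `stem` rule, the sample documents, and every function name here are hypothetical and chosen only to reproduce the "this is blogging" example from the Tokenizing slide.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny stop word list for illustration; Solr reads its list from configuration.
STOP_WORDS = {"is", "the", "a", "an"}


def stem(token):
    """Naive '-ing' stripping for illustration only; real stemmers are far more careful."""
    if token.endswith("ing") and len(token) > 5:
        base = token[:-3]
        # Collapse a doubled final consonant: "blogg" -> "blog".
        if len(base) > 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    return token


def analyze(text):
    """Split into lowercase words and numbers, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Index contains documents, documents have fields, fields have terms."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, value in fields.items():
            for term in analyze(value):
                index[(field, term)].add(doc_id)
    return index


def search(index, field, query):
    """Return ids of documents whose field contains every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get((field, t), set()) for t in terms))


def facet_counts(documents, doc_ids, field):
    """Count the values of one field across a result set, as a facet block would."""
    return Counter(documents[d][field] for d in doc_ids if field in documents[d])


# Hypothetical sample content.
DOCS = {
    1: {"title": "Blogging with Drupal", "type": "blog"},
    2: {"title": "The Solr blog", "type": "blog"},
    3: {"title": "Blog search settings", "type": "page"},
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(analyze("This is blogging"))  # ['this', 'blog']
    hits = search(INDEX, "title", "blogging")
    print(sorted(hits))
    print(facet_counts(DOCS, hits, "type"))
```

Because the query goes through the same `analyze` step as the indexed text, a search for "blogging" also matches titles that only contain "blog". The facet counts then give users a way to narrow a large result set themselves, which is the point of the Facets slide.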