Intro to Solr in Drupal

Does your website have a ton of data? How do your users find the relevant pages among all the noise in your site?

Solr can help deliver the pertinent search results to your users regardless of your site's size.

Apache Solr is a Java application that integrates with Drupal through a contrib module, letting your users quickly search millions of records and narrow down the results with minimal system impact.

Published in: Technology
  • In this example, Walmart found that conversion rates were directly affected by site load times. Although the example is about overall site speed, it applies equally to search.
  • Transcript of "Intro to Solr in Drupal"

    1. Intro to Solr
    2. DrupalCon Portland
    3. Andrew Riley, Director of Drupal Development, @andrewmriley
    4. Agenda
        • Search?
        • Why Solr?
        • Searching
        • Behind the Scenes
    5. Search?
    6. What is Search?
        Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.
        Source: http://dictionary.reference.com/browse/search
        @Mediacurrent
    7. Why Users Search
        • Navigation doesn't make sense
        • It can be faster
        • Lots of data
        • Frequent data changes
        • Might just be looking for something
    8. Search Problems
        • Search accuracy
        • Too much data
        • Slow response
        • Wrong results
    9. Why Solr?
    10. History
        Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.
    11. Lucene
        • Solr is a layer on top of Lucene
        • Lucene is a library
        • Solr stores files in Lucene format*
        * http://wiki.apache.org/solr/SolrPerformanceData
    12. Speed
        Search speed is important!
    13. Speed
        Source: Web Performance Today http://j.mp/12h8wLZ
    14. Speed
        • Important!
        • It scales well
        • No database required
        • Clustering & Sharding
        • Netflix runs 1.2MM q/day on 4 servers*
        * http://wiki.apache.org/solr/SolrPerformanceData
    15. Natural Results
        • Stemming: Blogging vs. Blog
        • Stop Word Removal: The
        • Synonyms: Tissue vs. Kleenex
        • Highly Configurable
    16. Drupal Search
        • Not stemmed by default
        • Queries the database
        • Stores tokenized words in a single large table
        • Much slower to index
    17. VS
    18. Searching
    19. Ordering
        • Score
        • Comes from Lucene
        • Not "out of 100"
        • Bigger score first
        More Info: http://lucene.apache.org/core/3_6_1/scoring.html
    20. Facets
        • Users do the work
        • Fixes too much data
        • Native to Solr
        • Requires the Facet API module
        • Shopping Sites
    21. Behind the Scenes
    22. Index?
        • Index contains Documents
        • Documents have Fields
        • Fields have Terms
        • ~2 minutes for updates
        • Uses Lucene syntax
    23. Tokenizing
        • Splits words and numbers: "this" "is" "blogging"
        • Excludes Stopwords: "this" "blogging"
        • Handles Stemming (if enabled): "this" "blog"
        • Very configurable
    24. Bias
        • Adjusts the order of search results
        • Works on: Content Type, Fields, Comments, Promoted to Home Page and more
        • Can be dynamic with custom modules
    25. Recap
    26. Modules
        • Apache Solr (apachesolr)
        • Facet API (facetapi)
        • Chaos tool suite (ctools)
    27. Overall
        • Search is becoming more and more important
        • You want to control your search results
        • If you don't provide a good search experience, somebody else will.
        • Solr doesn't have to be complex.
        • Solr is fast and scales.
    28. Thank You! Questions?
        @Mediacurrent
        Mediacurrent.com
        andrew.riley@mediacurrent.com
        @andrewmriley
        slideshare.net/mediacurrent
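
The three steps on the Tokenizing slide (split, drop stopwords, stem) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr implements its analysis chain: the stopword list, the function names, and the toy suffix-stripping rule are all assumptions made for this example, and a real setup would use Solr's configurable tokenizers and filters.

```python
import re

# Assumed stopword list for illustration only; note that "this" is kept,
# matching the example on the slide.
STOPWORDS = {"is", "the", "a", "an", "of"}


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Toy stemmer: strip a trailing "ing" and collapse a doubled final letter.

    Real stemmers (Porter, Snowball, ...) are far more careful than this.
    """
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    return token


def analyze(text):
    """Run the three steps in the order shown on the slide."""
    return [naive_stem(t) for t in remove_stopwords(tokenize(text))]


print(tokenize("This is blogging"))  # ['this', 'is', 'blogging']
print(analyze("This is blogging"))   # ['this', 'blog']
```

The same chain has to run on both the indexed text and the user's query, which is why a search for "blog" can match a page that only says "blogging".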
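
The Index? slide describes a hierarchy: the index contains documents, documents have fields, and fields have terms. A minimal sketch of that structure, assuming a simple whitespace split in place of real tokenizing, is an inverted index that maps each (field, term) pair to the documents containing it. The sample documents and the `build_index` name are made up for this example; Lucene's on-disk format is much more elaborate.

```python
from collections import defaultdict


def build_index(documents):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, value in fields.items():
            # Whitespace splitting stands in for a real analysis chain.
            for term in value.lower().split():
                index[(field, term)].add(doc_id)
    return index


# Hypothetical documents, each with a "title" and a "body" field.
docs = {
    1: {"title": "Intro to Solr", "body": "Solr is a layer on top of Lucene"},
    2: {"title": "Drupal Search", "body": "Drupal core search queries the database"},
}

index = build_index(docs)
print(sorted(index[("title", "solr")]))   # [1]
print(sorted(index[("body", "search")]))  # [2]
```

Looking up a term is then a dictionary access rather than a scan over every document, which is the basic reason an index-backed search stays fast as the data grows.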
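
The Facets slide says that users do the work of narrowing down too much data. As a rough sketch of what that means, the snippet below counts the values of one field across a result set and then filters by the value the user picks. The result records and function names are hypothetical, and in practice Solr computes facet counts on the server rather than in application code.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in results if field in doc)


def apply_facet(results, field, value):
    """Keep only the results whose field matches the chosen facet value."""
    return [doc for doc in results if doc.get(field) == value]


# Hypothetical shopping-site search results.
results = [
    {"title": "Red shirt", "type": "shirt", "color": "red"},
    {"title": "Blue shirt", "type": "shirt", "color": "blue"},
    {"title": "Red hat", "type": "hat", "color": "red"},
]

counts = facet_counts(results, "color")
print(counts["red"], counts["blue"])  # 2 1

narrowed = apply_facet(results, "color", "red")
print([doc["title"] for doc in narrowed])  # ['Red shirt', 'Red hat']
```

The counts are what a shopping site shows next to each filter link, so the user can see how many results remain before clicking.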