Making your Drupal fly with Apache SOLR

Transcript

  • 1. Making your Drupal fly with Apache SOLR
    • Kalle Virta, Exove
  • In this presentation
    About Exove and myself
    The problem – and the solution (and some cowboys)
    SOLR to do the site-wide search
    SOLR to help with Views
    SOLR to help with custom modules
    And the Fine Print
  • 3. We deliver business-driven web services that enable our customers to conduct better business on the Internet
    We base our work on our customers’ strategy and needs
  • 5. About me, Kalle Virta
    Software architect and developer
    High performance and complex integrations
    Almost 10 years in the business
    Seen Drupal from version 3
    A lot of big Drupal sites / systems under my belt
  • 6. Your regular stack
    MySQL server
    Linux + Apache
  • 7. Damn, dude, your MySQL server is on FIRE
  • 8. New guys to the rescue
  • 9. Apache SOLR
    memcached
    Varnish
  • 10. Your enhanced stack
    memcached
    MySQL server
    Linux + Apache
    Varnish
    Apache SOLR
    Did you notice? It’s still blue.
  • 11. The new guys
    Varnish is an HTTP cache and does it well, but it doesn’t help at all on your customized-for-every-person social media site
    Memcached is a good idea, and you can even use it with the Cache Router module to cache Drupal stuff, including data from your own modules, but… it still just caches stuff
    SOLR however, is a different story…
  • 12. SOLR
    Apache SOLR is a search server built around Lucene (a search library), written in Java
    It needs a Java servlet container, e.g. Jetty or Tomcat
    Put simply, you send your data to it as XML documents and then search them
    SOLR will tokenize and do all kinds of (configurable) magic to the data when indexing it, but it can also store the original data (not every search indexer can do that)
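As a rough illustration of "sending your data as XML": SOLR's XML update format wraps each document in `<add><doc>…</doc></add>`, with one `<field name="…">` element per value. The Python sketch below builds such a message. The field names (`id`, `title`, `tags`) are hypothetical and must match whatever your schema.xml defines; you would then POST the message to the update handler (in the stock Solr 1.x example, `http://localhost:8983/solr/update`) and follow it with a `<commit/>`.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a SOLR XML update message (<add><doc>...</doc></add>).

    Each dict becomes one <doc>; list values become repeated <field>
    elements, which is how multi-valued fields are sent in this format.
    ElementTree escapes &, < and > in the values for us.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; the field names must exist in your schema.xml.
message = build_add_xml([
    {"id": "node/1", "title": "Logitech LS1 Laser Mouse", "tags": ["mouse", "laser"]},
])
print(message)
```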
  • 13. SOLR for searching
    Naturally, all these features make SOLR a great fit for site-wide search
    You can actually find stuff with SOLR, and every field in the search can be boosted: you can tune which fields make a hit score higher
    SOLR also does one really neat thing for searching…
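A minimal sketch of field boosting, assuming a plain SOLR select handler and the DisMax query parser (`defType=dismax`): the `qf` parameter lists the fields to search, each with a `^boost`, so a hit in `title` counts for more than one in `body`. The field names and boost values here are made up, and the apachesolr module configures its own handler and boosts, so this only shows the idea.

```python
from urllib.parse import urlencode


def boosted_query_url(base_url, text, boosts, rows=10):
    """Build a SOLR select URL that searches `text` in several fields.

    `boosts` maps field name -> boost; DisMax's qf parameter takes them as
    "field^boost" pairs, so hits in higher-boosted fields score higher.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    params = {"q": text, "defType": "dismax", "qf": qf, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"
```

For example, `boosted_query_url("http://localhost:8983/solr", "laser mouse", {"title": 5, "body": 1})` returns `http://localhost:8983/solr/select?q=laser+mouse&defType=dismax&qf=title%5E5+body%5E1&rows=10&wt=json`.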
  • 14. Ever heard of a faceted search?
  • 15. The old advanced search
    Search: mouse
    Product category / Product sub-cat / Manufacturer / Price range (from - to)
    [Search]
    Too many search results (794), narrow your search and try again
  • 16. The faceted search
    Current search: mouse
    Sub-category: wireless mice (296), wired mice (96), laser mice (163), Show all
    Manufacturer: Logitech (194), Microsoft (36), HP (3), Show all
    Price range: 0-50 € (384), 50-100 € (129), 100-300 € (50)
    Order by price
    Logitech LS1 Laser Mouse, 29 €: A cheap laser mouse that’ll get you through even the most problematic of PowerPoint presentations.
    Logitech G3 Gaming Mouse, 59 €: A great laser mouse with more buttons than you’ll ever have time to configure. A steal.
    Microsoft Super Mouse, 49 €: A great mouse from the company that brought you the best product of all times, Windows Me.
    Apple Mighty Mouse, 129 €: The mouse the image happens to be of. Never tried it. Looks pretty nice, though.
    page 1 2 3 4 5 6 7 8 9 10
  • 17. SOLR for faceted searching
    Apache SOLR lets you facet search results, that is, show the possible search filters along with a result count for each
    Faceting with SOLR can also be achieved in Drupal, and this is where a Drupal contrib module comes into play
    With the apachesolr module (http://drupal.org/project/apachesolr) you can do all this with a couple of clicks in your Drupal installation
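Under the hood, faceting is just a few extra request parameters and an extra block in the response. A hedged Python sketch: `facet=true` plus one `facet.field` per field asks SOLR for counts, and with the default JSON list format (`json.nl=flat`) each facet comes back as a flat `[value, count, value, count, ...]` list. The field name `manufacturer` is hypothetical; the sample counts are the ones from the mock-up on slide 16.

```python
import json


def facet_params(fields, mincount=1):
    """Request parameters asking SOLR for facet counts on the given fields.

    Returned as a list of pairs because facet.field repeats; urlencode()
    accepts such a list directly.
    """
    params = [("facet", "true"), ("facet.mincount", str(mincount)), ("wt", "json")]
    params.extend(("facet.field", field) for field in fields)
    return params


def parse_facets(response_text):
    """Turn SOLR's JSON facet_fields into {field: [(value, count), ...]}."""
    data = json.loads(response_text)
    facet_fields = data.get("facet_counts", {}).get("facet_fields", {})
    # Flat list: even positions are values, odd positions are their counts.
    return {field: list(zip(flat[0::2], flat[1::2])) for field, flat in facet_fields.items()}


sample = '{"facet_counts": {"facet_fields": {"manufacturer": ["Logitech", 194, "Microsoft", 36, "HP", 3]}}}'
print(parse_facets(sample))
```

Each `(value, count)` pair maps directly onto a facet link such as "Logitech (194)".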
  • 18. SOLRfy your Drupal search 1/3
    Download the SOLR package from http://www.apache.org/dyn/closer.cgi/lucene/solr/
    Unpack it and check your server’s firewall settings to allow traffic to port 8983
    Check that you have a Java Runtime Environment (JRE) installed
  • 19. SOLRfy your Drupal search 2/3
    Then get Drupal’s “apachesolr” module; the package contains two XML files, solrconfig.xml and schema.xml
    Go back to your SOLR directory and rename the example directory to “drupal” so it’s easier to find
    Copy the two XML files into the drupal/solr/conf directory, replacing the ones already there
    Go to that drupal directory and start Apache SOLR with “java -jar start.jar”
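If you set up SOLR on more than one server, the copy step above can be scripted. A hedged Python sketch, assuming the two files sit at the top of the module directory (check where your apachesolr version actually ships them) and that you have already renamed `example` to `drupal`. It keeps the example's original files once as `*.orig` backups.

```python
import shutil
from pathlib import Path

CONF_FILES = ("solrconfig.xml", "schema.xml")


def install_drupal_conf(module_dir, solr_dir):
    """Copy apachesolr's config files into <solr_dir>/solr/conf.

    Existing files are backed up once as *.orig so the stock example
    config can be restored. Returns the list of installed paths.
    """
    module_dir = Path(module_dir)
    conf_dir = Path(solr_dir) / "solr" / "conf"
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"no SOLR conf directory at {conf_dir}")
    installed = []
    for name in CONF_FILES:
        src = module_dir / name
        dest = conf_dir / name
        backup = conf_dir / (name + ".orig")
        if dest.exists() and not backup.exists():
            shutil.copy2(dest, backup)  # keep the example's original, once
        shutil.copy2(src, dest)  # raises FileNotFoundError if src is missing
        installed.append(dest)
    return installed
```

Usage, with hypothetical paths: `install_drupal_conf("sites/all/modules/apachesolr", "apache-solr-1.4.1/drupal")`, then run `java -jar start.jar` inside that `drupal` directory.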
  • 20. SOLRfy your Drupal search 3/3
    Now you can turn on “apachesolr” module in Drupal
    Tune the SOLR server settings in Drupal, reindex all content and then start clicking through the filtering/faceting settings of the apachesolr module
    You’ll have to enable the facets as blocks
    But your search experience will be something else entirely
    …and once you see how searching with SOLR works, you’re not going back
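Before tuning the server settings in Drupal (the apachesolr module asks for a host, port and path, typically `localhost`, `8983` and `/solr`), it can save some head-scratching to check from the web server that SOLR's port is actually reachable through the firewall. A minimal Python sketch; it only tests that something accepts TCP connections on that port, not that the service is SOLR.

```python
import socket


def solr_reachable(host="localhost", port=8983, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This proves something is listening and the firewall lets the traffic
    through; it does not check that the service is SOLR.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unknown host, ...
        return False
```

Run `solr_reachable("your-solr-host")` on the Drupal web server itself, since that is where the apachesolr module connects from.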
  • 21. The apachesolr module
    Automatically creates facets for taxonomy terms, for every vocabulary – you can just turn them on
    Automatically creates facets for CCK fields using dropdown/radio widgets (i.e. with a set of options)
    Exposes hooks for CCK fields (to make facets out of them)
    Exposes a hook for altering the query (to some extent)
    Easy to use
  • 22. Faceting without SOLR
    You can do faceting without SOLR too
    The “Faceted search” module will do it for you
    But at only 10K nodes, SOLR is three times as fast
    With 100K+ nodes, faceted search without SOLR is practically unusable
    …but for small sites, SOLR is not necessary for faceting
  • 23. So you can SEARCH with SOLR… but my site does A LOT more
  • 24. SOLRify the rest of your Drupal universe
    You probably know where the performance problems on your site are
    If your site is somehow personalized, you usually can’t do anything about them with caching
    How about using SOLR for it?
    The Apache Solr Views module (in a very mature “dev” state ;) and Views 3 (also dev) will work together and integrate with the apachesolr module and its SOLR index
    When this is stable and fully functional…
  • 25. It’ll make your Views FLY
  • 26. SELECT title, description, mediatype
    FROM media
    LEFT JOIN media_types ON media_type_id = type_id
    LEFT JOIN media_tag ON media_tag.mid = media.id
    WHERE name LIKE '%s'
    OR description LIKE '%s'
    OR media.id IN (SELECT mid FROM promoted_media)
    But my problems are in my custom modules
  • 27. Custom modules
    Custom modules can be designed with ApacheSOLR in mind
    When you realize all the potential in an indexer that can index XML documents, the sky is the limit
    Whenever you have a data structure that’s too complex for MySQL to search efficiently (and that’s not too rare), you might benefit from indexing that data into SOLR and using SOLR as a read-only “db”
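For example, once the data behind the slide-26 query has been indexed, the whole thing could become a single SOLR query. A hedged Python sketch: the fields `title`, `description` and `promoted` are hypothetical and would have to exist in your schema, and SOLR's tokenized matching is not the same as SQL `LIKE`, so treat this as a rough counterpart rather than an exact translation. User input is backslash-escaped so characters like `:` or `+` are not read as query syntax.

```python
# Characters with special meaning in the Lucene query syntax SOLR uses.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term):
    """Backslash-escape query syntax characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def media_query(term):
    """Rough SOLR counterpart of the slide-26 SQL (hypothetical fields).

    A hit in title or description matches, and so does any promoted item.
    No JOINs or subquery: the related data (type, tags, promoted flag) is
    assumed to have been copied into each document at index time.
    """
    t = escape_term(term)
    return f"title:({t}) OR description:({t}) OR promoted:true"
```

`media_query("laser mouse")` gives `title:(laser mouse) OR description:(laser mouse) OR promoted:true`, which you would pass as the `q` parameter.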
  • 28. Custom modules – making SOLR do the reading
    Tables: media, media_revision, media_version, media_workflow, media_tag, tag, files
    All combined into a single “row” for SOLR to index
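A minimal Python sketch of that "single row", under stated assumptions: every column name (`title`, `number`, `state`, `filepath`, `mid`, `tid`) is hypothetical, and the rows are plain dicts as you might fetch them with separate simple queries. The point is that the link table and the tag names are resolved once, at index time, so reads never need a JOIN.

```python
def build_media_document(media, revision, version, workflow, media_tags, tags, files):
    """Denormalize one media item into a single flat SOLR document.

    media_tags is the link table (mid -> tid) and tags holds the tag rows;
    both are resolved here so searches never need a JOIN. Multi-valued
    data become lists, i.e. multi-valued SOLR fields.
    """
    tag_names = {tag["id"]: tag["name"] for tag in tags}
    return {
        "id": f"media/{media['id']}",
        "title": revision["title"],
        "description": revision["description"],
        "version": version["number"],
        "workflow_state": workflow["state"],
        "tags": [tag_names[link["tid"]] for link in media_tags if link["mid"] == media["id"]],
        "files": [f["filepath"] for f in files if f["mid"] == media["id"]],
    }
```

The resulting dict can then be serialized into an `<add><doc>` message and sent to SOLR.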
  • 29. Custom modules – making SOLR do the reading
    You know you need a better structure when you can’t avoid LEFT JOINs or subqueries, and running them gets too slow
    When you’ve optimized your code several times and restructuring your database would mean creating a read-optimized cache of everything
    Then SOLR might be just the thing to get you through
  • 30. Custom modules – making SOLR do the reading
    Write to the MySQL server
    Index into Apache SOLR
    Read from Apache SOLR
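The write/index/read split can be sketched like this in Python. Assumptions, all hypothetical: `sqlite3` stands in for MySQL so the example runs anywhere, the `media` table has only three columns, and SOLR's update handler lives at the stock example URL. In a Drupal module the same two steps would sit in your save function.

```python
import sqlite3  # stand-in for MySQL so the sketch runs anywhere
import urllib.request
from xml.sax.saxutils import escape

# Assumption: the stock example core; commit=true makes the change searchable at once.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def open_db(path=":memory:"):
    """Open the stand-in database and make sure the media table exists."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS media (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
    return conn


def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST an XML update message to SOLR and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def save_media(conn, media_id, title, description, post=post_to_solr):
    """Write to the database first, then index the same data into SOLR."""
    # 1. Write: the relational database stays the authoritative copy.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO media (id, title, description) VALUES (?, ?, ?)",
            (media_id, title, description),
        )
    # 2. Index: push a flat document to SOLR; reads are then served from SOLR.
    doc = (
        "<add><doc>"
        f'<field name="id">{escape(str(media_id))}</field>'
        f'<field name="title">{escape(title)}</field>'
        f'<field name="description">{escape(description)}</field>'
        "</doc></add>"
    )
    return post(doc)
```

Because `post` is a parameter, a test (or a queue) can replace the HTTP call. A real module would also have to handle SOLR being down, for example by marking the item for reindexing later, since the database write has already been committed at that point.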
  • 31. Libraries to use with custom modules
    The apachesolr module uses a SOLR client library written in PHP and licensed under the New BSD license (http://code.google.com/p/solr-php-client/)
    There’s also a PECL extension, but I’m not aware of any speed comparisons
    There are also contrib Drupal modules that give you an API for accessing SOLR
  • 32. It’s no magic bullet
  • 33. Not a magic bullet 1/2
    Apache SOLR is a hassle with all the Java containers and such, and you’ll probably have to run it on a separate server
    You should always run stuff through Drupal or a script that will authenticate and authorize calls to SOLR (SOLR shouldn’t be exposed unless all the data is public)
    Sometimes that extra server would be better used as an extra MySQL node
    Sometimes you can just fix your stuff and make it as fast as it would be on Apache SOLR
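A minimal sketch of the "run stuff through Drupal or a script" point: the Python function below builds SOLR parameters from untrusted input so the browser never talks to SOLR directly. It is hypothetical throughout; `access_role` is an invented field you would have to index yourself, and the whitelist is only an example.

```python
ALLOWED_PARAMS = {"q", "start", "rows", "sort"}  # hypothetical whitelist
MAX_ROWS = 50


def _quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def safe_solr_params(user_params, user_roles):
    """Turn untrusted request input into SOLR select parameters.

    Only whitelisted parameters pass through (so no qt and no user fq),
    rows is capped, and a filter query restricts hits to documents the
    user's roles may see.
    """
    params = {k: v for k, v in user_params.items() if k in ALLOWED_PARAMS}
    try:
        rows = int(params.get("rows", 10))
    except (TypeError, ValueError):
        rows = 10
    params["rows"] = str(max(0, min(rows, MAX_ROWS)))
    roles = sorted(user_roles) or ["anonymous"]
    # fq is intersected with q, so a crafted q can narrow results but never widen them.
    params["fq"] = "access_role:(" + " OR ".join(_quote(r) for r in roles) + ")"
    params["wt"] = "json"
    return params
```

The web server then sends these parameters to SOLR's select handler itself, so port 8983 can stay closed to the outside world.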
  • 34. Not a magic bullet 2/2
    And then there’s the fact that SOLR is built mainly for the English language
    So make sure SOLR handles the language you need the way you want it to
  • 35. Recap
    SOLR will right now give your Drupal site a fast, faceted search with really easy setup (thanks to the apachesolr module)
    SOLR will soon give a boost to the performance and search abilities of your views
    SOLR will right now give you a lot more power for searching your custom databases and complicated content types, if used by a module developer
    It’s still not a magic bullet: it has its downsides
  • 36. Sounds easy?
    Been there, done that?
    Exove is recruiting
    Send your CV to jobs@exove.com
  • 37. Thank you for your time
    Questions?
    If you’d rather ask me in private,
    drop a mail to kalle@exove.com