Drupalcamp Estonia - Media Sites and SOLR search

Kalle Varisvirta's presentation at DrupalCamp Estonia.

    Presentation Transcript

    • Searching media sites with SOLR, by Kalle Varisvirta, Exove
    • Media sites on Drupal
      • Drupal works well with media sites
      • They’re all about UGC (user-generated content) these days, which is what Drupal does best
      • Media sites have a lot of content, so a powerful content management system is needed
      • Content needs to be tagged, categorized and organized so that it’s usable, and Drupal’s taxonomy is great for this
    • Media sites on Drupal
      • It’s not all fun and games
      • Media sites may have 100k – 10M nodes, which Drupal doesn’t handle too well
      • Media sites need elaborate caching schemes to manage the traffic, and with Drupal these require extra modules
      • Drupal’s core search is not the optimal solution for searching such sites
    • What’s wrong with Drupal’s core search?
      • MySQL-based (not powerful; try faceting with it)
      • No proper language support
      • No proper search indexing tools in general (stemmer (!), stoplists, compound words, proper relevance)
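
To make the missing pieces concrete, here is a minimal Python sketch of what a stoplist and a stemmer do to text before it is indexed or searched. It is a toy illustration only: the stoplist, the suffix list and the function names are assumptions made up for this example, and real engines use language-specific analyzers rather than naive suffix stripping.

```python
# Toy analysis chain: lowercase, drop stopwords, strip a suffix.
# STOPWORDS and SUFFIXES are assumed sample data, not a real language pack.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on"}
SUFFIXES = ("ing", "ed", "s")  # checked in this order


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Return the normalized tokens that would be indexed or searched."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


if __name__ == "__main__":
    print(analyze("Searching the indexed nodes"))
```

Because the query and the content go through the same chain, a search for "searched" can match content that says "Searching": both reduce to the token "search".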
    • What’s SOLR?
      • SOLR is a search engine based on the Lucene search library, fully free and open source
      • It’s written in Java (and needs a Java web platform to run)
      • It’s an Apache project
      • It’s very widely used (users include NASA, CNET, AT&T, Cisco, Disney, the FCC, the White House and many more)
    • SOLR and Drupal
      • SOLR is not hard to integrate with, but since we’re in the Drupal community, “there’s a module for that”
      • Actually, the module is pretty great. It is built and supported by Acquia and is called “Apache SOLR Search Integration” (http://drupal.org/project/apachesolr)
      • Acquia also runs a commercial (= it costs you money) hosted SOLR service that you can use for searching if you can’t install your own SOLR for some reason
    • SOLR and Drupal - installation
      • Install the apachesolr module
      • Install SOLR, then copy the schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration
      • Start SOLR (it comes with a lightweight Java servlet container called Jetty)
      • Configure the apachesolr module to connect to the SOLR instance you just installed
      • Reindex content
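
After these steps it can help to check from outside Drupal that SOLR is answering and that the reindex put documents into it. The Python sketch below builds the two URLs involved. The base URL is an assumption (the port and path used by the Jetty example setup; a multi-core install would also have a core name in the path), and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: SOLR runs locally under the bundled Jetty on its usual port.
SOLR_BASE = "http://localhost:8983/solr"


def ping_url(base=SOLR_BASE):
    """URL of the ping handler, which reports whether SOLR is up."""
    return f"{base}/admin/ping?{urlencode({'wt': 'json'})}"


def count_url(base=SOLR_BASE):
    """URL of a query that matches everything but returns no rows."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def indexed_documents(base=SOLR_BASE):
    """Ask a running SOLR how many documents the index holds."""
    with urlopen(count_url(base), timeout=5) as response:
        return json.load(response)["response"]["numFound"]
```

Calling `indexed_documents()` needs a running SOLR. If the number it returns stays at zero after reindexing, the module is probably pointed at a different host, port or path than the one checked here.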
    • SOLR-powered search
      • It’s fast: SOLR searches through 10M nodes in less than 60 ms (on a dedicated virtual server)
      • It’s accurate, due to stemming (with language support) and better relevance calculation
      • It has a spellchecker and can suggest better search terms
      • You can fine-tune the relevance by content biasing
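
As a rough idea of what biasing and spellchecking look like at the query level, the Python sketch below builds parameters for SOLR's dismax query parser, where `qf` lists the fields to search with a boost per field. The field names `title` and `body` are hypothetical, the function name is made up for this example, and the spellcheck parameters only have an effect if a spellcheck component is configured on the request handler. This is not a description of how the apachesolr module builds its queries.

```python
from urllib.parse import urlencode


def boosted_query_params(keywords, field_weights, spellcheck=True):
    """Build dismax parameters that weight some fields above others."""
    # qf takes space-separated "field^boost" entries.
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    params = {"q": keywords, "defType": "dismax", "qf": qf, "wt": "json"}
    if spellcheck:
        params["spellcheck"] = "true"
        params["spellcheck.collate"] = "true"  # ask for a corrected full query
    return params


if __name__ == "__main__":
    # Hypothetical fields: a match in the title counts five times as much.
    print(urlencode(boosted_query_params("drupal", {"title": 5, "body": 1})))
```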
    • SOLR-powered search
      • In addition to being faster and more accurate, SOLR-powered search offers some extra functionality
      • Faceted search means that the user can “drill down” into the results; this is the modern way of doing “advanced search”
      • You can facet using taxonomy terms and fields, although some fields need some programming to help SOLR divide the nodes into groups (e.g. numbers or dates)
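
The drill-down idea can be sketched in Python as follows: the first function builds the parameters for a faceted query and adds a filter (`fq`) for every facet value the user has already clicked, and the second turns SOLR's facet list into a dictionary of counts. The field name `section` and both function names are hypothetical, and the filter values are not escaped, so this is a sketch rather than production code.

```python
def facet_params(keywords, facet_fields, selected=None):
    """Build parameters as pairs, since facet.field and fq may repeat."""
    params = [
        ("q", keywords),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with no results
        ("wt", "json"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value narrows the result set with a filter query.
    for field, value in (selected or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return params


def facet_counts(response, field):
    """SOLR's default JSON lists facets as [term, count, term, count, ...]."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

A list of pairs like this can be passed straight to `urllib.parse.urlencode`. Given a made-up response whose `section` facet is `["news", 12, "sports", 7]`, `facet_counts` returns `{"news": 12, "sports": 7}`, which is what a facet block would render as clickable links.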
    • SOLR-powered search - caveats
      • SOLR is very language-driven, so you need stopwords and stemming for the language you are using
      • Some languages are supported better than others
      • You can always contribute to the community to make the language support better
      • SOLR also needs a Java environment to run, so it is not available on the cheapest web hosting plans
      • SOLR needs some understanding of the Java world to be fine-tuned for best performance, which might be hard for LAMP-oriented people
    • Taking all the juice out of your SOLR
      • We’ve been using SOLR for some other stuff too
      • It would be great to be able to use it as a Views backend, but the integration module “Apache Solr Views” seems to be unsupported and pretty much dead
      • You can still use it for custom stuff, since it has all your nodes anyway, but that needs some programming
      • SOLR can do a lot more, like spatial (location-based) search, grouping of the results, relevance tuning by column weight, etc.
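
For the "custom stuff" case, the Python sketch below shows the kind of raw query parameters that result grouping and spatial filtering use in SOLR. Whether they are available depends on the SOLR version, the spatial filter needs a location-type field in the schema, and the field names, coordinates and function names here are hypothetical examples.

```python
def grouped_params(keywords, group_field, per_group=3):
    """Group results by a field, returning a few documents per group."""
    return {
        "q": keywords,
        "group": "true",
        "group.field": group_field,
        "group.limit": str(per_group),
        "wt": "json",
    }


def nearby_params(keywords, location_field, lat, lon, radius_km):
    """Keep only documents within radius_km of the given point."""
    # geofilt takes the field (sfield), the point (pt) and the distance (d).
    geofilt = f"{{!geofilt sfield={location_field} pt={lat},{lon} d={radius_km}}}"
    return {"q": keywords, "fq": geofilt, "wt": "json"}


if __name__ == "__main__":
    # Hypothetical field name; the point is an example location.
    print(nearby_params("cafe", "location", 59.437, 24.7536, 5))
```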
    • Thank you for your time. Questions, comments?