Drupalcamp Estonia - Media Sites and SOLR search
 


Kalle Varisvirta's presentation at DrupalCamp Estonia.

Presentation Transcript

  • Searching media sites with SOLR (Kalle Varisvirta, Exove)
  • Media sites on Drupal
    • Drupal works well with media sites
    • They’re all about UGC (user-generated content) these days, which is what Drupal does best
    • Media sites have a lot of content, so a powerful content management system is needed
    • Content needs to be tagged, categorized and organized so that it’s usable; Drupal’s taxonomy is great for this
  • Media sites on Drupal
    • It’s not all fun and games
    • Media sites may have 100k – 10M nodes, and Drupal doesn’t handle that too well
    • Media sites need elaborate caching schemes to manage the traffic, which requires extra modules in Drupal
    • Drupal’s core search is not the optimal solution for searching these sites
  • What’s wrong with Drupal’s core search?
    • MySQL based (not powerful, try faceting with it)
    • No proper language support
    • No proper search indexing tools in general (stemmer (!), stoplists, compound words, proper relevance)
  • What’s SOLR?
    • SOLR is a search engine based on the Lucene search library, fully free and open source
    • It’s written in Java (and needs a Java web platform to run)
    • It’s an Apache project
    • It’s very widely used (users include NASA, CNET, AT&T, Cisco, Disney, FCC, White House and many more)
  • SOLR and Drupal
    • SOLR is not hard to integrate with, but since we’re in the Drupal community, “there’s a module for that”
    • Actually, the module is pretty great. It is built and supported by Acquia and is called “Apache SOLR Search Integration” (http://drupal.org/project/apachesolr)
    • Acquia also runs a commercial (= it costs you money) hosted SOLR service that you can use for searching if you can’t install your own SOLR for some reason
  • SOLR and Drupal - installation
    • Install the apachesolr module
    • Install SOLR, then copy schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration
    • Start SOLR (it comes with a lightweight Java environment called Jetty)
    • Configure the apachesolr module to connect to the SOLR instance you just installed
    • Reindex content
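Before configuring the module, it can help to confirm that the SOLR instance actually answers. The following is a minimal sketch, not part of the presentation. It assumes a default Jetty setup on localhost:8983 with SOLR mounted at /solr and the standard ping handler enabled; the helper names solr_ping_url and solr_is_up are hypothetical.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def solr_ping_url(host="localhost", port=8983, path="/solr"):
    """Build the URL of the ping handler (host, port and path are assumptions)."""
    query = urlencode({"wt": "json"})  # ask for a JSON response
    ping_path = path.rstrip("/") + "/admin/ping"
    return urlunsplit(("http", f"{host}:{port}", ping_path, query, ""))


def solr_is_up(url, timeout=5):
    """Return True if the ping handler answers with status OK, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.load(response).get("status") == "OK"
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors, ValueError covers bad JSON.
        return False


if __name__ == "__main__":
    url = solr_ping_url()
    print(url, "->", "up" if solr_is_up(url) else "not reachable")
```

If the check fails, the host, port and path entered in the apachesolr module settings would fail in the same way, so it is worth fixing this first.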
  • SOLR-powered search
    • It’s fast: SOLR runs through 10M nodes in less than 60 ms (on a dedicated virtual server)
    • It’s accurate, thanks to stemming (with language support) and better relevance calculation
    • It has a spellchecker and can suggest better search terms
    • You can fine-tune the relevance by content biasing
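As an illustration of relevance biasing and spellchecking, here is a sketch of the request parameters one might send to SOLR directly. It assumes the dismax query parser and a spellcheck component configured on the request handler; the field names title and body and the helper build_search_params are hypothetical, and the apachesolr module builds its own queries in a different way.

```python
from urllib.parse import urlencode


def build_search_params(keywords, field_boosts, spellcheck=True, rows=10):
    """Build SOLR query parameters with per-field boosts and optional spellcheck."""
    # "qf" lists the fields to search, each with a boost, e.g. "title^5 body^1".
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = [
        ("q", keywords),
        ("defType", "dismax"),
        ("qf", qf),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    if spellcheck:
        # Only has an effect if a spellcheck component is configured in SOLR.
        params += [("spellcheck", "true"), ("spellcheck.collate", "true")]
    return params


# A misspelled query; matches in the (hypothetical) title field count five times more.
params = build_search_params("drupal serch", {"title": 5, "body": 1})
query_string = urlencode(params)
print(query_string)
```

Raising the boost of one field relative to another is the simplest form of biasing; the right values depend on the content and need experimenting.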
  • SOLR-powered search
    • In addition to being faster and more accurate, SOLR-powered search offers some extra functionality
    • Faceted search means that the user can “drill down” into the results; this is the modern way of doing “advanced search”
    • You can facet using taxonomy terms and fields, although some fields (e.g. numbers or dates) need some programming to help SOLR divide the nodes into groups
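The faceting described above can be sketched as follows. This is an illustration under assumptions, not the module's implementation: the field name category is hypothetical, the helpers are made up for this example, and the bucketing function shows one simple way to turn a number into a group label at indexing time.

```python
def facet_params(keywords, facet_fields, selected=None):
    """Build SOLR parameters that request facet counts and apply chosen facets."""
    params = [("q", keywords), ("facet", "true"), ("facet.mincount", "1"), ("wt", "json")]
    params += [("facet.field", name) for name in facet_fields]
    # Each facet value the user clicked becomes a filter query ("drill down").
    for name, value in (selected or {}).items():
        params.append(("fq", f'{name}:"{value}"'))
    return params


def parse_facet_counts(flat):
    """Turn SOLR's default flat JSON list [value, count, value, count, ...] into a dict."""
    return dict(zip(flat[::2], flat[1::2]))


def number_bucket(value, width):
    """Map an integer to a range label so it can be indexed as a facet value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


print(facet_params("election", ["category"], {"category": "politics"}))
print(parse_facet_counts(["news", 12, "sports", 7]))
print(number_bucket(150, 100))
```

Indexing the bucket label alongside the raw number keeps the facet list short and readable, at the cost of fixing the bucket width when content is indexed.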
  • SOLR-powered search - caveats
    • SOLR is very language-driven, so you need stopwords and stemming for the language you are using
    • Some languages are supported better than others
    • You can always contribute to the community to make the language support better
    • SOLR also needs a Java environment to run, so it is not available on the cheapest web hosting plans
    • SOLR needs some understanding of the Java world to be fine-tuned for best performance, which might be hard for LAMP-oriented people
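To show why stopwords and stemming are language-specific, here is a deliberately naive toy analyzer. It is not how SOLR works internally (SOLR uses analyzers configured in schema.xml); the stopword set and suffix rules are made-up English examples and would be useless for, say, Finnish or Estonian.

```python
# Toy English-only rules; a real setup needs per-language stopword lists and stemmers.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is"}
SUFFIXES = ("ing", "s")


def naive_stem(word):
    """Strip one known suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords and stem the rest."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


# Document text and query end up with matching terms after analysis.
print(analyze("Searching the archives of media sites"))
print(analyze("search archive"))
```

Because the indexed text and the query go through the same analysis, a search for "archive" can match a document that only contains "archives". With the wrong language rules, that matching quietly stops working.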
  • Squeezing all the juice out of your SOLR
    • We’ve been using SOLR for some other things too
    • It would be great to be able to use it as a Views backend, but the integration module “Apache Solr Views” seems to be unsupported and pretty much dead
    • You can still use it for custom features, since it has all your nodes anyway, but that needs some programming
    • SOLR can do a lot more, like spatial (location-based) search, grouping of the results, relevance tuning by field weight, etc.
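For the custom use cases mentioned above, a sketch of the parameters for spatial filtering and result grouping might look like this. It assumes a SOLR version that supports the geofilt query parser and result grouping, plus a location-type field in the schema; the field names location and site_section and both helper functions are hypothetical.

```python
from urllib.parse import urlencode


def spatial_filter(field, lat, lon, radius_km):
    """Parameters limiting results to a radius (in km) around a point."""
    return [
        ("fq", "{!geofilt}"),
        ("sfield", field),          # the indexed location field
        ("pt", f"{lat},{lon}"),     # center point as "lat,lon"
        ("d", str(radius_km)),      # distance in kilometers
    ]


def grouping(field, per_group=3):
    """Parameters grouping results by a field, with a few documents per group."""
    return [("group", "true"), ("group.field", field), ("group.limit", str(per_group))]


# Articles within 25 km of central Tallinn, grouped by a hypothetical section field.
params = [("q", "concert"), ("wt", "json")]
params += spatial_filter("location", 59.437, 24.7536, 25)
params += grouping("site_section")
print(urlencode(params))
```

The resulting query string can be sent to the select handler of the same SOLR index that the apachesolr module populates, provided the relevant fields were indexed.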
  • Thank you for your time. Questions, comments?