Drupalcamp Estonia - Media Sites and SOLR search

Kalle Varisvirta's presentation at DrupalCamp Estonia.

Published in: Technology

  1. Searching media sites with SOLR (Kalle Varisvirta, Exove)
  2. Media sites on Drupal
     • Drupal works well with media sites
     • Media sites are all about user-generated content (UGC) these days, and this is what Drupal does best
     • Media has a lot of content, so a powerful content management system is needed
     • Content needs to be tagged, categorized and organized so that it is usable; Drupal's taxonomy is great for this
  3. Media sites on Drupal
     • It's not all fun and games
     • Media sites may have 100k to 10M nodes, and Drupal doesn't handle that too well
     • Media sites need elaborate caching schemes to manage the traffic, and with Drupal these need extra modules
     • Drupal's core search is not the optimal solution for searching these sites
  4. What's wrong with Drupal's core search?
     • MySQL based (not powerful, try faceting with it)
     • No proper language support
     • No proper search indexing tools in general (stemmer (!), stoplists, compound words, proper relevance)
  5. What's SOLR
     • SOLR is a search engine based on the Lucene search library, fully free and open source
     • It's written in Java (and needs a Java web platform to run)
     • It's an Apache project
     • It's very widely used (users include NASA, CNET, AT&T, Cisco, Disney, FCC, White House and many more)
  6. SOLR and Drupal
     • SOLR is not hard to integrate with, but since we're in the Drupal community, "there's a module for that"
     • Actually, the module is pretty great. It is built and supported by Acquia and is called "Apache SOLR Search Integration" (http://drupal.org/project/apachesolr)
     • Acquia also runs a commercial (= it costs you money) hosted SOLR service that you can use for searching if you can't install your own SOLR for some reason
  7. SOLR and Drupal - installation
     • Install the apachesolr module
     • Install SOLR, then copy schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration
     • Start SOLR (it comes with a lightweight Java environment called Jetty)
     • Configure the apachesolr module to connect to the SOLR instance you just installed
     • Reindex the content
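
The copy step in the installation list can be scripted. The following is a minimal sketch, not part of the apachesolr module: the function name `copy_solr_config` and both directory arguments are assumptions, since the real paths depend on where Drupal and SOLR are installed on your server.

```python
import shutil
from pathlib import Path

# The two files the slide says to copy from the apachesolr module.
CONFIG_FILES = ("schema.xml", "stopwords.txt")


def copy_solr_config(module_dir, solr_conf_dir):
    """Copy the module's config files into the SOLR conf directory.

    Both paths are assumptions supplied by the caller.
    Returns the list of destination paths that were written.
    """
    module_dir = Path(module_dir)
    solr_conf_dir = Path(solr_conf_dir)
    copied = []
    for name in CONFIG_FILES:
        source = module_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"{source} not found in the module directory")
        target = solr_conf_dir / name
        shutil.copy2(source, target)  # overwrites SOLR's default file
        copied.append(target)
    return copied
```

SOLR has to be restarted after the files are replaced, and the content reindexed, as the last steps of the list say.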
  8. SOLR-powered search
     • It's fast: SOLR searches through 10M nodes in less than 60 ms (on a dedicated virtual server)
     • It's accurate, thanks to stemming (with language support) and better relevance calculation
     • It has a spellchecker and can suggest better search terms
     • You can fine-tune the relevance by content biasing
  9. SOLR-powered search
     • In addition to being faster and more accurate, SOLR-powered search adds some extra functionality
     • Faceted search means that the user can "drill down" into the results; this is the modern way of doing "advanced search"
     • You can facet on taxonomy terms and fields, although some fields (e.g. numbers or dates) need some programming to help SOLR divide the nodes into groups
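
To make faceting concrete, here is a minimal sketch of how a faceted request to SOLR's select handler can be assembled. The request parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `wt`) are standard SOLR parameters; the helper name `build_facet_query`, the base URL and the field names `tid` and `type` are assumptions for illustration and may not match your schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, keywords, facet_fields, filters=()):
    """Return a select URL that asks SOLR for results plus facet counts.

    facet_fields: fields to count values for (one facet block per field).
    filters: already selected facet values, sent as fq filter queries;
             each one narrows the result set ("drill down").
    """
    params = [
        ("q", keywords),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with no matches
    ]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", flt) for flt in filters]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical usage: search for "election", facet on two assumed fields,
# with the user having already clicked the "article" facet value.
url = build_facet_query(
    "http://localhost:8983/solr",
    "election",
    ["tid", "type"],
    filters=["type:article"],
)
```

Each click on a facet value adds one more `fq` entry, which is why drilling down stays cheap: SOLR filters the existing result set instead of running a more complex query.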
  10. SOLR-powered search - caveats
     • SOLR is very language driven, so you need stopwords and stemming for the language you are using
     • Some languages are supported better than others
     • You can always contribute to the community to improve the language support
     • SOLR also needs a Java environment to run, so it is not available on the cheapest shared hosting plans
     • Tuning SOLR for best performance requires some understanding of the Java world, which might be hard for LAMP-oriented people
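
Why stopwords and stemming are language specific is easiest to see in a toy example. The sketch below is not SOLR's analysis chain or any real stemming algorithm; the stopword set and suffix list are assumptions that only make sense for English, which is exactly the point.

```python
# Toy English-only resources; another language needs entirely different ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is"}
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, drop stopwords, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


# Indexed text and query text go through the same analysis,
# so different word forms end up as the same index terms.
indexed = analyze("The editors published")
query = analyze("editor publishing")
```

Both calls produce the same two terms, so the query matches the document. Feed the same code Finnish or Estonian text and neither the stopword list nor the suffix rules apply, which is why SOLR needs proper per-language resources.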
  11. Taking all the juice out of your SOLR
     • We've been using SOLR for some other stuff too
     • It would be great to be able to use it as a Views backend, but the integration module "Apache Solr Views" seems to be unsupported and pretty much dead
     • You can still use it for custom stuff, since it has all your nodes anyway, but that needs some programming
     • SOLR can do a lot more, such as spatial (location-based) search, grouping of the results, relevance tuning by field weight, etc.
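
As a rough illustration of the "custom stuff" mentioned above, the sketch below assembles request parameters for a spatial, grouped and weighted query. It assumes a SOLR version that supports the `geofilt` filter, result grouping and the `edismax` query parser; the helper name `build_extra_params` and the field names `location`, `type`, `title` and `body` are hypothetical and depend on your schema.

```python
from urllib.parse import urlencode


def build_extra_params(keywords, lat, lon, radius_km,
                       location_field, group_field, field_weights):
    """Build parameters combining field weights, spatial filter and grouping."""
    # Field weights in "field^boost" form, e.g. "title^5 body^1".
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    return {
        "q": keywords,
        "defType": "edismax",      # query parser that understands qf
        "qf": qf,
        "fq": "{!geofilt}",        # keep only documents within d km of pt
        "sfield": location_field,
        "pt": f"{lat},{lon}",
        "d": str(radius_km),
        "group": "true",           # collapse results by a field value
        "group.field": group_field,
    }


# Hypothetical usage: concerts within 10 km of central Tallinn,
# grouped by content type, with title matches weighted above body matches.
params = build_extra_params(
    "concert", 59.437, 24.7536, 10,
    location_field="location",
    group_field="type",
    field_weights={"title": 5, "body": 1},
)
query_string = urlencode(params)
```

The resulting query string is appended to the select handler URL of your SOLR instance.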
  12. Thank you for your time. Questions, comments?
