Drupalcamp Estonia - Media Sites and SOLR search

Kalle Varisvirta's presentation at DrupalCamp Estonia.

Published in: Technology

Transcript

  • 1. Searching media sites with SOLR (Kalle Varisvirta, Exove)
  • 2. Media sites on Drupal
    • Drupal works well with media sites
    • Media sites are all about user-generated content (UGC) these days, and that is what Drupal does best
    • Media sites have a lot of content, so a powerful content management system is needed
    • Content needs to be tagged, categorized and organized so that it is usable; Drupal’s taxonomy is great for this
  • 3. Media sites on Drupal
    • It’s not all fun and games
    • Media sites may have 100k to 10M nodes, and Drupal doesn’t handle that too well
    • Media sites need elaborate caching schemes to manage the traffic, and in Drupal those require extra modules
    • Drupal’s core search is not the optimal solution for searching such sites
  • 4. What’s wrong with Drupal’s core search?
    • MySQL-based (not powerful; try faceting with it)
    • No proper language support
    • No proper search indexing tools in general: no stemmer (!), stoplists, compound-word handling or proper relevance ranking
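
To make the stemmer and stoplist point concrete, here is a toy sketch of what such indexing tools do to text before it is matched. The word list, the suffix list and the function names are all made up for illustration; this is not the algorithm SOLR or Drupal uses, only the general idea.

```python
# Toy analysis chain: lowercase, drop stopwords, strip a few English suffixes.
# STOPWORDS and SUFFIXES are illustrative assumptions, not a real stoplist.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "are"}
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Turn raw text into the list of terms that would be indexed or searched."""
    tokens = text.lower().split()
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]
```

With a chain like this, a query for "editors searches" and a document containing "The editor is searching" reduce to the same terms, so they match. Without a stemmer they would not.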
  • 5. What’s SOLR?
    • SOLR is a search engine based on the Lucene search library, fully free and open source
    • It’s written in Java (and needs a Java web platform, i.e. a servlet container, to run)
    • It’s an Apache project
    • It’s very widely used (users include NASA, CNET, AT&T, Cisco, Disney, FCC, White House and many more)
  • 6. SOLR and Drupal
    • SOLR is not hard to integrate with, but since we’re in the Drupal community, “there’s a module for that”
    • Actually, the module is pretty great. It is built and supported by Acquia and called “Apache SOLR Search Integration” (http://drupal.org/project/apachesolr)
    • Acquia also runs a commercial (= it costs you money) hosted SOLR service that you can use for searching if you can’t install your own SOLR for some reason
  • 7. SOLR and Drupal - installation
    • Install the apachesolr module
    • Install SOLR, then copy schema.xml and stopwords.txt from the apachesolr module into the SOLR configuration
    • Start SOLR (it ships with a lightweight Java server called Jetty)
    • Configure the apachesolr module to connect to the SOLR instance just installed
    • Reindex the content
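
Before pointing the apachesolr module at the new SOLR instance, it can help to check that SOLR actually answers. The sketch below is a minimal check under stated assumptions: the base URL `http://localhost:8983/solr` (the usual port of the bundled Jetty example) and the `/admin/ping` handler path depend on your SOLR version and configuration, and the function names are made up for illustration.

```python
from urllib.error import URLError
from urllib.request import urlopen


def ping_url(base_url):
    """Build the ping handler URL for a SOLR base URL (path is an assumption)."""
    return base_url.rstrip("/") + "/admin/ping"


def solr_is_up(base_url, timeout=2.0):
    """Return True if the ping handler answers with HTTP 200, else False."""
    try:
        with urlopen(ping_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print(solr_is_up("http://localhost:8983/solr"))
```

If this prints `False`, fix the SOLR installation first; the module settings and the reindexing step will not work until the server responds.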
  • 8. SOLR-powered search
    • It’s fast: SOLR runs through 10M nodes in less than 60 msec (on a dedicated virtual server)
    • It’s accurate, thanks to stemming (with language support) and better relevance calculation
    • It has a spellchecker and can suggest better search terms
    • You can fine-tune the relevance with content biasing
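
As a rough idea of what relevance biasing and spellchecking look like at the query level, the sketch below builds the query string for a SOLR select request with the dismax query parser. The parameters `defType`, `qf`, `bq` and `spellcheck` are standard SOLR request parameters, but the field names `title`, `body` and `type` are hypothetical, the spellcheck component must be enabled in the SOLR configuration, and the apachesolr module builds its own queries differently.

```python
from urllib.parse import urlencode


def boosted_query_params(keywords, boost_query=None):
    """Build a query string that weights title matches above body matches."""
    params = [
        ("q", keywords),
        ("defType", "dismax"),      # use the dismax query parser
        ("qf", "title^5 body"),     # hypothetical fields: title counts 5x
        ("spellcheck", "true"),     # ask for spelling suggestions
        ("wt", "json"),             # JSON response format
    ]
    if boost_query:
        # Boost query: documents matching it score higher but are not required.
        params.append(("bq", boost_query))
    return urlencode(params)


if __name__ == "__main__":
    print(boosted_query_params("election results", boost_query="type:article^2"))
```

The resulting string would be appended to a select URL such as `http://localhost:8983/solr/select?` (again an assumed address).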
  • 9. SOLR-powered search
    • In addition to being faster and more accurate, SOLR-powered search offers some extra functionality
    • Faceted search means that the user can “drill down” into the results; this is the modern way of doing “advanced search”
    • You can facet on taxonomy terms and fields, although some fields (e.g. numbers or dates) need some programming to help SOLR divide the nodes into groups
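
A faceted request can be sketched as follows. `facet`, `facet.field` and `fq` are standard SOLR request parameters; the field names `tags` and `author` are hypothetical, and `date_bucket` is only one made-up way to do the "some programming" needed for dates, here by reducing each date to a year-month label at index time so that nodes fall into monthly groups.

```python
from datetime import date
from urllib.parse import urlencode


def facet_params(keywords, facet_fields, selected=None):
    """Build a query string that requests facet counts and applies drill-downs."""
    params = [("q", keywords), ("facet", "true"), ("wt", "json")]
    for field in facet_fields:
        params.append(("facet.field", field))   # count results per field value
    for field, value in (selected or {}).items():
        params.append(("fq", f'{field}:"{value}"'))  # narrow to chosen facet
    return urlencode(params)


def date_bucket(day):
    """Map a date to a year-month label usable as a facet value."""
    return day.strftime("%Y-%m")


if __name__ == "__main__":
    print(facet_params("budget", ["tags", "author"], selected={"tags": "Politics"}))
    print(date_bucket(date(2011, 5, 14)))
```

Each click on a facet value adds one more `fq` filter, which is what "drilling down" amounts to on the wire.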
  • 10. SOLR-powered search - caveats
    • SOLR is very language-driven, so you need stopwords and stemming for the language you are using
    • Some languages are supported better than others
    • You can always contribute to the community to make the language support better
    • SOLR also needs a Java environment to run, so it is not available on the cheapest web hosts
    • SOLR needs some understanding of the Java world to be fine-tuned for best performance, which might be hard for LAMP-oriented people
  • 11. Taking all the juice out of your SOLR
    • We’ve been using SOLR for some other stuff too
    • It would be great to be able to use it as a Views backend, but the integration module “Apache Solr Views” seems to be unsupported and pretty much dead
    • You can still use it for custom stuff, since it has all your nodes anyway, but that needs some programming
    • SOLR can do a lot more, like spatial (location-based) search, grouping of the results, relevance tuning by field weight etc.
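
For the custom use mentioned above, a spatial and grouped query could look roughly like this. The `{!geofilt}` filter with `sfield`, `pt` and `d`, and the `group` / `group.field` parameters, exist in SOLR 3.x releases, so whether they are available depends on your SOLR version. The field names `location` and `type` and the function name are hypothetical, and the location field must be indexed as a spatial field type.

```python
from urllib.parse import urlencode


def spatial_grouped_params(keywords, lat, lon, radius_km, group_field):
    """Build a query string for results near a point, grouped by one field."""
    return urlencode([
        ("q", keywords),
        ("fq", "{!geofilt}"),        # keep only documents within the radius
        ("sfield", "location"),      # hypothetical spatial field name
        ("pt", f"{lat},{lon}"),      # centre point as "lat,lon"
        ("d", str(radius_km)),       # radius in kilometres
        ("group", "true"),           # enable result grouping
        ("group.field", group_field),
        ("wt", "json"),
    ])


if __name__ == "__main__":
    # Example values: news near Tallinn, grouped by a hypothetical "type" field.
    print(spatial_grouped_params("concert", 59.437, 24.7536, 10, "type"))
```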
  • 12. Thank you for your time. Questions, comments?