Your SlideShare is downloading. ×
Rails and the Apache SOLR Search Engine
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.


Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Rails and the Apache SOLR Search Engine


Published on

What good is content if nobody can find it? Many information sites are like icebergs, with only a limited amount of content directly accessible to users and the rest, the "underwater" potion, only …

What good is content if nobody can find it? Many information sites are like icebergs, with only a limited amount of content directly accessible to users and the rest, the "underwater" potion, only available through searches. This talk shows how Rails web sites can take advantage of the world-class Apache SOLR search engine to provide sophisticated and customizable search features. We'll cover how to get started with SOLR, integrating with SOLR using the Sunspot gem, indexing, hit highlighting and other topics.

Published in: Technology

  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Agenda1. Why?2. Technology Stack3. Our First App• Demo 4. Complications5. Conclusions
  • 2. Part 1:                    Why?
  • 3. Part 1:                    Why? “Why the Lucky Stiff” at a conference in 2006… - innovative, eccentric, suddenly retired from public life…
  • 4. Why Me?Well, I’ve been doing Search since 1998… • CheckPoint – Online tax information for CPA’s • Legislate – Everything you ever wanted to know  about legislation – more than a million docs • National Council of Teachers of Mathematics – Online journals • Grab Networks – Searching news video summaries • Pfizer – Drug documentation
  • 5. A Typical Information Site
  • 6. Why We Need Search• “A feature doesn’t exist if users can’t find it.” ‐ Jeff Atwood, co‐creator of Stack Overflow ‐ The same principle applies to content• Content costs money ‐ If people can’t find it, your money is wasted• The Long Tail ‐ More.content should be == More.traffic
  • 7. Part 2: Technology Stack
  • 8. Lucene is a Toolbox… SOLR is a Search Server…• Indexing • With API’s• Searching • Hit Highlighting• Spell‐checking • Faceted Searches• Hit Highlighting • Caching• Advanced tokenization • A Web Admin Interface
  • 9. SOLR and Lucene need Java 1.4 or higher
  • 10. Timeline Lucene becomes Lucene / SOLR top-level Apache Merger projectLucene started SOLR created by Lucene / SOLRon SourceForge Yonik Seeley at Apache SOLR leaves incubation 3.5 Releasedby Doug Cutter CNET Networks Lucene joins the SOLR donated to Apache Jakarta Apache Lucene product family by CNET 1997 2001 2004 2005 2006 2007 2010 2011 Sept Feb Nov
  • 11. Search Stack for Our Rails App Rails Web Site SOLR Sunspot Lucene rsolrDev/Test:   Jetty Dev/Test:   WEBrickProduction:  Tomcat Production:  Apache with Phusion Passenger
  • 12. Rsolr and Sunspot Sunspot provides Ruby‐style Rsolr is a SOLR client… API’s…• By Matt Mitchell • By Andy Lindeman, Nick • Originated Sept 2009 Zadrozny & Mat Brown• Reached 1.0 Jan 2011 • Originated Aug 2009• Now at Version 1.0.7 • Reached 1.0 Mar 2010• Low‐level client • Now at Version 1.3.1 • With Rails or just Ruby • Drop‐in ActiveRecord  support
  • 13. Part 3: Our First AppAnother First… The world’s first amphibious lamborghini
  • 14. A Simple Blog – With Search source• Generate a Rails App ‐ rails new demo1 gem rails, 3.0.11 gem sqlite3 # Default database• Configure Gemfile gem will_paginate gem sunspot_rails # Sunspot for Rails ‐ bundle install group :development do gem nifty-generators # Better scaffolding gem sunspot_solr # Pre-built SOLR end group :test do gem sunspot_solr # Pre-built SOLR gem "mocha" # Testing tool end Gemfile
  • 15. A Simple Blog (2)• Scaffolding and Database ‐ rails g nifty:layout ‐ rails g nifty:scaffold Post title:string body:text featured:boolean ‐ rake db:migrate ‐ Also, remove public/index.html and point root to posts#index• Populate the SQLite3 Database: ‐ sqlite3 development.sqlite3 > read data.sql      # Loads 100+ blog entries > .exit
  • 16. A Simple Blog (3) production:• SOLR Config File solr: hostname: localhost ‐ rails generate sunspot_rails:install port: 8983 ‐ Creates config/sunspot.yml log_level: WARNING• Make Post class searchable development: solr: class Post < ActiveRecord::Base hostname: localhost attr_accessible :title, :body, :featured port: 8982 self.per_page = 10 log_level: INFO searchable do test: text :title, :body solr: integer :id hostname: localhost boolean: featured port: 8981 end log_level: WARNING /config/sunspot.yml end /app/models/post.rb
  • 17. A Simple Blog (4) - solr• Start SOLR - conf - data ‐ rake sunspot:solr:start - development - test ‐ Creates SOLR directory tree - pids on first start - development - test• Index Your Data SOLR Directory ‐ rake sunspot:solr:reindex solr/data solr/pids .gitignore
  • 18. The Search UI• We’ve got a web site• SOLR is running• We’ve got indexed contentBut…• We need a Search Form• We need a Search Results page
  • 19. The Search Form<%= form_tag(searches_path, :id => search-form, :method => :get) do |f| %> <span> <%= text_field_tag :query, params[:query] %> <%= submit_tag Search, :id => commit %> </span><% end %> /app/views/layouts/_search.html.erb• Just a simple view partial…• That’s rendered by the site’s layout
  • 20. Search Resultsresources :searches, :only => [:index] config/routes.rb<% if @posts.present? %> <%= will_paginate @posts, :container => false %> <table> <% for post in @posts %> <tr><td"><%= raw(post.title) %></td></tr> <tr><td><%= post.body.truncate(300) %></td></tr> <% end %> </table> <%= page_entries_info @posts %> <%= will_paginate @posts, :container => false %><% else %> <p>No search results are available. Please try another search.</p><% end %> app/views/searches/index.html.erb
  • 21. Search Controllerclass SearchesController < ApplicationController def index ActiveRecord Search if params[:query].present? Integration search = { fulltext params[:query] paginate :page => params[:page].present? ? params[:page] : 1, :per_page => 10 } Pagination integrates @posts = search.results w/ will_paginate gem else @posts = nil end endend Security: Params[:query] must be scrubbed if it will app/controllers/searches_controller.rb be re-displayed…
  • 22. What About New Posts?How do new posts get indexed?For any model with searchable fields… ‐ Sunspot integrates with ActiveRecord ‐ Indexing happens automatically on save
  • 23. In 2006, DHH made a big splash withhis “15‐minute blog with Rails”
  • 24. This is the…20‐Minute Rails Blogwith Search
  • 25. Tip: A Clean Start• In development, sometimes you mess up…• You can hose up the SOLR indicesSolution:• Don’t be afraid to blow away the solr dir tree• “rake sunspot:solr:start” rebuilds the tree• Just don’t lose any solr/conf changes
  • 26. Search WeightingTitles seem more important…can we weight thetitle higher than the body? search = { fulltext params[:query] paginate :page => params[:page].present? ? params[:page] : 1, :per_page => 10 } BEFORE search = { fulltext params[:query] do boost_fields :title => 2.0 end paginate :page => params[:page].present? ? params[:page] : 1, :per_page => 10 } AFTER
  • 27. Is There a Better Way?The “boost” can be done at index time…class Post < ActiveRecord::Base searchable do text :title, :boost => 2.0 text :body integer :id boolean: featured endend /app/models/post.rb
  • 28. What About Related Data? text :comments do { |comment| comment.body } end /app/models/post.rb• Could have used “acts_as_commentable” gem• Your “document” is virtual• You define it• You can reference attributes, methods, etc.
  • 29. Filtering search = { fulltext params[:query] do boost_fields :title => 2.0 end with(:featured, true) } /app/controllers/searches_controller.rb• Text fields are searched• The boolean “featured” attribute is filtered• Search returns only featured posts that match  the full‐text search criteria
  • 30. Hit Highlightingclass Post < ActiveRecord::Base @search = { searchable do fulltext params[:query] do ... highlight :body text :body, :stored => true end end } /app/controllers/searches_controller.rbend /app/models/post.rb@search.hits.each do |hit| Post #1 puts "Post ##{hit.primary_key}" I use *git* on my project hit.highlights(:body).each do |highlight| Post #2 puts " " + highlight.format { |word| "*#{word}*" } the *git* utility is cool end OUTPUTend /app/views/searches/index.html.erb
  • 31. AuthorizationJust because content is indexed doesn’t meana particular user is allowed to see it…• Search‐enforced access control ‐ Access control data stored in index ‐ Can be used to filter results  (No code, but something to think about…)
  • 32. Part 5: Conclusions
  • 33. Is Highly CustomizableIs Easily Added to Any ProjectHelps You Leverage Your Content
  • 34. Is Designed to be an Enterprise  Component Web ServerDatabase Server Search Server Memcache Server A TYPICAL ARCHITECTURE
  • 35. Is Well Supported
  • 36. Questions? Get the Code: Get the Slides:’m David Keener and you can find me at: • Blog: • Facebook: • Twitter: dkeener2010 • Email: david.keener@gd‐
  • 37. David KeenerFrom the Washington DC area.I’m a software architect for General Dynamics Advanced Information Systems, a large defense contractor in the DC area now using Ruby, Rails and other open source technologies on  various projects.A founder of the RubyNation and DevIgnition Conferences.Frequent speaker at conferences and user groups, including SunnyConf, Scotland on Rails, RubyNation, DevIgnition, etc.
  • 38. Credits – Journal thumbnails‐wallpaper/‐resources‐february‐2010/‐content/uploads/2009/03/amphibious‐lamborghini‐mod.jpgDavid Keener speaking at AOL; photo by Edmund Joe known copyright; received via spamUbiquitous iceberg picture on the Internet; no known copyright‐of‐stones.html‐drop‐splash‐art/‐target‐dart‐psd‐icons/ All logos are trademarks of their associated corporations