Quick Introduction to Sphinx and Thinking Sphinx

5,879 views

Published on

A 10 minute introduction to the basics of using Sphinx and Thinking Sphinx for doing full-text search in Ruby on Rails Applications.

Published in: Technology
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
5,879
On SlideShare
0
From Embeds
0
Number of Embeds
1,188
Actions
Shares
0
Downloads
80
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide

Quick Introduction to Sphinx and Thinking Sphinx

  1. 1. SPHINX AND THINKING SPHINX 10 Minute Intro
  2. 2. HAYES DAVIS Founder, Appozite cheaptweet.com | @cheaptweet @hayesdavis
  3. 3. SPHINX •Open Source full-text search engine •Designed around SQL •Standalone daemon (searchd) http://guardians.net/hawass/images/sphinx3.jpg
  4. 4. THINKING SPHINX •Rails plugin •Integrates Active Record with Sphinx •Makes talking to Sphinx basically painless
  5. 5. BASIC IDEA • Configure your indexes • Index • Query • Repeat
  6. 6. CONFIGURING INDEXES • Add indexes on your AR class Article < ActiveRecord::Base classes using define_index define_index do # fields • Fields (indexes) contain text indexes subject, :sortable => true indexes content you can search indexes author.name, :as=> :author, :sortable => true • Attributes (has) allow you to # attributes sort and constrain your has author_id, created_at, updated_at searches end end • Careful!Column names aren’t symbols
  7. 7. Run the indexer rake thinking_sphinx:index
  8. 8. source twitterer_core_0 { type = mysql sql_host = 127.0.0.1 sql_user = cheaptweet sql_pass = cheaptweet sql_db = cheaptweet_development2 sql_query_pre = UPDATE `twitterer` SET `delta` = 0 sql_query_pre = SET NAMES utf8 sql_query = SELECT `twitterer`.`id` * 1 + 0 AS `id` , CAST(`twitterer`.`screen_name` AS CHAR) AS `screen_name`, CAST(`twitterer`.`name` AS CHAR) AS `name`, CAST(`twitterer`.`description` AS CHAR) AS `description`, CAST(`twitterer`.`url` AS CHAR) AS `url`, CAST(`twitterer`.`location` AS CHAR) AS `location`, `twitterer`.`id` AS `sphinx_internal_id`, 283224142 AS `class_crc`, '283224142' AS `subclass_crcs`, 0 AS `sphinx_deleted` FROM twitterer WHERE `twitterer`.`id` >= $start AND `twitterer`.`id` <= $end AND `twitterer`.`delta` = 0 GROUP BY `twitterer`.`id` ORDER BY NULL sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `twitterer` WHERE `twitterer`.`delta` = 0 sql_attr_uint = sphinx_internal_id sql_attr_uint = class_crc sql_attr_uint = sphinx_deleted sql_attr_multi = uint subclass_crcs from field sql_query_info = SELECT * FROM `twitterer` WHERE `id` = (($id - 0) / 1) } index twitterer_core { source = twitterer_core_0 path = /Users/hayesdavis/Appozite/workspace/CheapTweet/data/sphinx/development/twitterer_core morphology = stem_en charset_type = utf-8 } MORE ABOUT INDEXING Thinking Sphinx generates a config file for sphinx, indexes (aka “sources”) are defined. It’s a little complicated.
  9. 9. Start Sphinx rake thinking_sphinx:start
  10. 10. #Searches all fields for “pants” Article.search “pants” #Conditions are allowed on fields but must be hash Article.search “pants”, :conditions=>{ :subject=>”How To Wear” } #Query attributes using :with Article.search “pants”, :with=>{ :author_id=>1, :created_at=>1.week.ago..Time.now } SEARCHING Use the search method on AR classes
  11. 11. BUT WAIT HOW DO I KEEP INDEXES (ESPECIALLY BIG ONES) UP TO DATE?
  12. 12. DELTA INDEXES TO THE RESCUE • Mini index of only rows that have been updated • Must merge into “core” index periodically or it’ll get slow • Simplest approach: add delta boolean column to model • Add set_property :delta=>true to define_index block • Delta index is rebuilt on model saves, can cause performance hit
  13. 13. DEPLOYMENT & PRODUCTION • Must schedule full re-indexing periodically • Have god or monit keep an eye on things • Consider adding some cap tasks to help out with reindexing and restarting
  14. 14. TIPS, TRICKS, GOTCHAS • Simplest delta indexing can lead to performance issues • Indexer assumes you have sequential ids on your DB rows and iterates through them in chunks - very bad if you have big gaps • Run full indexing as often as you can without hurting performance - it’s usually pretty fast • Youcan hand-edit config files if you need to tune - but be careful not to regenerate
  15. 15. RESOURCES Sphinx http://www.sphinxsearch.com/ Thinking Sphinx http://freelancing-god.github.com/ts/en/ Railscast http://railscasts.com/episodes/120-thinking-sphinx

×