Quick Introduction to Sphinx and Thinking Sphinx

A 10 minute introduction to the basics of using Sphinx and Thinking Sphinx for doing full-text search in Ruby on Rails Applications.

Presentation Transcript

  • SPHINX AND THINKING SPHINX 10 Minute Intro
  • HAYES DAVIS Founder, Appozite cheaptweet.com | @cheaptweet @hayesdavis
  • SPHINX • Open source full-text search engine • Designed around SQL • Standalone daemon (searchd) http://guardians.net/hawass/images/sphinx3.jpg
  • THINKING SPHINX • Rails plugin • Integrates Active Record with Sphinx • Makes talking to Sphinx basically painless
  • BASIC IDEA • Configure your indexes • Index • Query • Repeat
  • CONFIGURING INDEXES • Add indexes on your AR classes using define_index • Fields (indexes) contain text you can search • Attributes (has) allow you to sort and constrain your searches • Careful! Column names aren't symbols
      class Article < ActiveRecord::Base
        define_index do
          # fields
          indexes subject, :sortable => true
          indexes content
          indexes author.name, :as => :author, :sortable => true
          # attributes
          has author_id, created_at, updated_at
        end
      end
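The "column names aren't symbols" warning is easier to see with a toy model of how a block-based DSL can turn bare words into column references. The sketch below is not Thinking Sphinx's code; `ToyColumn`, `ToyIndexBuilder`, and `toy_define_index` are hypothetical names, and the real plugin may work differently in its details.

```ruby
# Toy model of a define_index-style DSL (hypothetical, for illustration only).
class ToyColumn
  attr_reader :path

  def initialize(*path)
    @path = path
  end

  # A chained call such as author.name extends the column path.
  def method_missing(name, *args)
    args.empty? ? ToyColumn.new(*@path, name) : super
  end
end

class ToyIndexBuilder
  attr_reader :fields, :attributes

  def initialize
    @fields = []
    @attributes = []
  end

  # Records full-text fields; a trailing hash carries options such as :as.
  def indexes(*columns)
    options = columns.last.is_a?(Hash) ? columns.pop : {}
    columns.each { |column| @fields << { :path => column.path, :options => options } }
  end

  # Records attributes used for sorting and filtering.
  def has(*columns)
    columns.each { |column| @attributes << column.path }
  end

  # Any unknown bare word inside the block becomes a column reference.
  def method_missing(name, *args)
    args.empty? ? ToyColumn.new(name) : super
  end
end

def toy_define_index(&block)
  builder = ToyIndexBuilder.new
  builder.instance_eval(&block)
  builder
end

index = toy_define_index do
  indexes subject, :sortable => true
  indexes author.name, :as => :author
  has author_id, created_at
end

p index.fields.map { |field| field[:path] }  # [[:subject], [:author, :name]]
p index.attributes                           # [[:author_id], [:created_at]]
```

In a design like this, a bare word only becomes a column if no real method of that name already exists on the builder, which is one reason a DSL of this kind can trip over column names that collide with existing Ruby methods.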
  • Run the indexer: rake thinking_sphinx:index
  • MORE ABOUT INDEXING • Thinking Sphinx generates a config file for Sphinx in which the indexes (and the “sources” that feed them) are defined. It’s a little complicated.
      source twitterer_core_0 {
        type = mysql
        sql_host = 127.0.0.1
        sql_user = cheaptweet
        sql_pass = cheaptweet
        sql_db = cheaptweet_development2
        sql_query_pre = UPDATE `twitterer` SET `delta` = 0
        sql_query_pre = SET NAMES utf8
        sql_query = SELECT `twitterer`.`id` * 1 + 0 AS `id` , CAST(`twitterer`.`screen_name` AS CHAR) AS `screen_name`, CAST(`twitterer`.`name` AS CHAR) AS `name`, CAST(`twitterer`.`description` AS CHAR) AS `description`, CAST(`twitterer`.`url` AS CHAR) AS `url`, CAST(`twitterer`.`location` AS CHAR) AS `location`, `twitterer`.`id` AS `sphinx_internal_id`, 283224142 AS `class_crc`, '283224142' AS `subclass_crcs`, 0 AS `sphinx_deleted` FROM twitterer WHERE `twitterer`.`id` >= $start AND `twitterer`.`id` <= $end AND `twitterer`.`delta` = 0 GROUP BY `twitterer`.`id` ORDER BY NULL
        sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `twitterer` WHERE `twitterer`.`delta` = 0
        sql_attr_uint = sphinx_internal_id
        sql_attr_uint = class_crc
        sql_attr_uint = sphinx_deleted
        sql_attr_multi = uint subclass_crcs from field
        sql_query_info = SELECT * FROM `twitterer` WHERE `id` = (($id - 0) / 1)
      }
      index twitterer_core {
        source = twitterer_core_0
        path = /Users/hayesdavis/Appozite/workspace/CheapTweet/data/sphinx/development/twitterer_core
        morphology = stem_en
        charset_type = utf-8
      }
  • Start Sphinx: rake thinking_sphinx:start
  • SEARCHING • Use the search method on AR classes
      # Searches all fields for "pants"
      Article.search "pants"
      # Conditions are allowed on fields but must be a hash
      Article.search "pants", :conditions => { :subject => "How To Wear" }
      # Query attributes using :with
      Article.search "pants", :with => { :author_id => 1, :created_at => 1.week.ago..Time.now }
  • BUT WAIT: HOW DO I KEEP INDEXES (ESPECIALLY BIG ONES) UP TO DATE?
  • DELTA INDEXES TO THE RESCUE • Mini index of only the rows that have been updated • Must be merged into the “core” index periodically or it’ll get slow • Simplest approach: add a delta boolean column to the model • Add set_property :delta => true to the define_index block • Delta index is rebuilt on model saves, which can cause a performance hit
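The core/delta split can be pictured with a small in-memory model. This is only a sketch of the idea, not how Sphinx or Thinking Sphinx store or rebuild indexes; `ToyDeltaIndex` and all of its methods are hypothetical.

```ruby
# Toy in-memory model of core + delta indexing (hypothetical names throughout).
class ToyDeltaIndex
  attr_reader :core, :delta

  def initialize
    @core = {}   # id => text, replaced only by a full indexing run
    @delta = {}  # id => text, touched whenever a record is saved
  end

  # A full indexing run: everything goes into core and the delta is emptied.
  def full_index(rows)
    @core = rows.dup
    @delta = {}
  end

  # A model save: only the small delta index is updated.
  def save(id, text)
    @delta[id] = text
  end

  # Merging folds the delta into core so the delta stays small.
  def merge
    @core.merge!(@delta)
    @delta = {}
  end

  # Searches consult both indexes; delta entries override stale core entries.
  def search(word)
    @core.merge(@delta).select { |_id, text| text.include?(word) }.keys.sort
  end
end

idx = ToyDeltaIndex.new
idx.full_index({ 1 => "how to wear pants", 2 => "shirts" })
idx.save(2, "shirts and pants")

p idx.search("pants")  # [1, 2]
p idx.delta.size       # 1
idx.merge
p idx.delta.size       # 0
p idx.search("pants")  # [1, 2]
```

The longer the gap between merges, the more rows accumulate in the delta, which is the toy-model version of why an unmerged delta index gets slow.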
  • DEPLOYMENT & PRODUCTION • Must schedule full re-indexing periodically • Have god or monit keep an eye on things • Consider adding some cap tasks to help out with reindexing and restarting
  • TIPS, TRICKS, GOTCHAS • Simplest delta indexing can lead to performance issues • Indexer assumes you have sequential ids on your DB rows and iterates through them in chunks - very bad if you have big gaps • Run full indexing as often as you can without hurting performance - it’s usually pretty fast • You can hand-edit config files if you need to tune, but be careful not to regenerate them and lose your edits
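The id-gap warning follows from the $start/$end range query in the generated config: the indexer walks from the minimum id to the maximum id in fixed-size windows, whether or not any rows fall inside a window. A rough sketch of the arithmetic, where `id_windows` is a hypothetical helper and the step of 1,000 is just an illustrative value:

```ruby
# Hypothetical helper: lists the [start, end] id windows an indexer would
# walk when stepping through ids in fixed-size chunks.
def id_windows(min_id, max_id, step)
  windows = []
  start = min_id
  while start <= max_id
    windows << [start, [start + step - 1, max_id].min]
    start += step
  end
  windows
end

# Ids packed into 1..10_000 versus ids spread out to 10_000_000 by large gaps.
dense  = id_windows(1, 10_000, 1_000)
sparse = id_windows(1, 10_000_000, 1_000)

puts "dense:  #{dense.size} range queries"   # dense:  10 range queries
puts "sparse: #{sparse.size} range queries"  # sparse: 10000 range queries
```

With the same number of rows, the sparse table costs a thousand times as many queries in this example, and most of them return nothing.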
  • RESOURCES • Sphinx: http://www.sphinxsearch.com/ • Thinking Sphinx: http://freelancing-god.github.com/ts/en/ • Railscast: http://railscasts.com/episodes/120-thinking-sphinx