Solving the Riddle of Search: Using Sphinx with Rails

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    8 Favorites

    Solving the Riddle of Search: Using Sphinx with Rails - Presentation Transcript

    1. Solving the Riddle of Search Using Sphinx with Rails
    2. My name is Pat
    3. http://freelancing-gods.com http://twitter.com/pat
    4. Questions?
    5. Within Current Context?
    6. Ask as we go
    7. Jumping topics?
    8. Please Wait
    9. A Tute in Two Parts
    10. Part One: Core Concepts
    11. (ie: the boring bits)
    12. Part Two: Beyond the Basics
    13. (ie: the fun bits)
    14. So...
    15. Part One: Core Concepts
    16. What is Sphinx?
    17. Open Source Search Service
    18. http://www.sphinxsearch.com/
    19. Similar to Ferret, Solr, Xapian, etc
    20. Talks to MySQL and PostgreSQL
    21. ... and Microsoft SQL Server and Oracle in 0.9.9
    22. Two Key Tools
    23. indexer
    24. Guess what it does
    25. Talks to your database
    26. Files away your data
    27. searchd
    28. No prizes for guessing this either.
    29. searchd = search daemon
    30. Runs in the Background
    31. Handles Search Requests
    32. Does not talk to your database.
    33. What gets indexed?
    34. Documents
    35. No, not Microsoft Word docs
    36. A document is a single database record
    37. (ie: a Rails model instance)
    38. Documents have fields and attributes
    39. They are not the same.
    40. THEY ARE NOT THE SAME.
    41. Fields are...
    42. Textual Data
    43. Strings
    44. Words to search for.
    45. Attributes, on the other hand...
    46. Integers
    47. Floats
    48. Timestamps
    49. Booleans
    50. ... and Strings. Kinda.
    51. Strings as Ordinals
    52. Huh?
    53. Integer values indicating alphabetical order of strings.
    54. Useless for Searching?
    55. Useful for Sorting though.
    56. Search queries don’t look at attribute values.
    57. Sorting
    58. Grouping
    59. Filtering
    60. Exact value matching
    61. So, that’s Sphinx...
    62. But what’s Thinking Sphinx?
    63. Ruby Library/ Rails Plugin
    64. Hooks into ActiveRecord
    65. Configure Sphinx with Ruby and YAML
    66. Search with Sphinx in Rails
    67. What’s in it for me though?
    68. Why should I use Sphinx?
    69. SQL does it all, right?
    70. Well, maybe
    71. But it gets messy
    72. • Find all people • Where any word matches • Part of any of four fields
    73. # given a query of ‘Pat Allan’ # assuming MySQL SELECT * FROM people WHERE first_name LIKE '%Pat%' OR first_name LIKE '%Allan%' OR last_name LIKE '%Pat%' OR last_name LIKE '%Allan%' OR location LIKE '%Pat%' OR location LIKE '%Allan%' OR profile LIKE '%Pat%' OR profile LIKE '%Allan%';
    74. It’s a bit long...
    75. And that’s a simple query!
    76. If we use Thinking Sphinx
    77. # given a query of ‘Pat Allan’ Person.search \"Pat Allan\", :match_mode => :any
    78. Much cleaner
    79. Can we start coding now?
    80. Not quite.
    81. Indexing
    82. Fields
    83. What textual data do we want indexed?
    84. Enter the indexes method
    85. class Person < ActiveRecord::Base # ... define_index do indexes first_name, last_name, location, profile end # ... end
    86. # Combine columns into one field indexes [first_name, last_name], :as => :name # Avoid core Ruby class methods indexes :name # Use association columns indexes photos.captions, :as => :photos
    87. Attributes
    88. What do we want to sort and filter by?
    89. Our friend the has method
    90. class Person < ActiveRecord::Base # ... define_index do # ... has created_at, admin, email, :id end # ... end
    91. # Multi-Value Attributes (MVAs) has tags.id, :as => :tag_ids
    92. What if I want to sort by a field?
    93. indexes first_name, last_name, :sortable => true
    94. What does all this code do?
    95. Defines a SQL query for Sphinx
    96. Complexity leads to slower indexing
    97. So don’t forget to add database indexes as well!
    98. Warning for fans of fixtures...
    99. # In your define_index block: set_property( :sql_range_step => 10_000_000 )
    100. Because it’s SQL...
    101. Can only use model columns, not model methods
    102. Maybe that’s not good enough?
    103. Write your own SQL!
    104. has \"YEAR(created_at)\", :as => :year, :type => :integer
    105. So, we’ve set up our index. Now what?
    106. rake thinking_sphinx:index
    107. (in /Users/pat/Code/ruby/ts_test_2.2.0) Generating Configuration to /Users/pat/Code/ruby/ts_test_2.2.0/ config/development.sphinx.conf indexer --config /Users/pat/Code/ruby/ts_test_2.2.0/config/ development.sphinx.conf --all Sphinx 0.9.8-release (r1371) Copyright (c) 2001-2008, Andrew Aksyonoff using config file '/Users/pat/Code/ruby/ts_test_2.2.0/config/ development.sphinx.conf'... indexing index 'person_core'... collected 1000 docs, 0.0 MB collected 0 attr values sorted 0.0 Mvalues, 100.0% done sorted 0.0 Mhits, 100.0% done total 1000 docs, 14436 bytes total 0.191 sec, 75523.42 bytes/sec, 5231.60 docs/sec indexing index 'person_delta'... collected 0 docs, 0.0 MB collected 0 attr values sorted 0.0 Mvalues, nan% done total 0 docs, 0 bytes total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec distributed index 'person' can not be directly indexed; skipping.
    108. rake thinking_sphinx:start
    109. (in /Users/pat/Code/ruby/ts_test_2.2.0) searchd --pidfile --config /Users/pat/Code/ruby/ts_test_2.2.0/ config/development.sphinx.conf Sphinx 0.9.8-release (r1371) Copyright (c) 2001-2008, Andrew Aksyonoff using config file '/Users/pat/Code/ruby/ts_test_2.2.0/config/ development.sphinx.conf'... Started successfully (pid 6220).
    110. rake thinking_sphinx:stop
    111. (in /Users/pat/Code/ruby/ts_test_2.2.0) Sphinx 0.9.8-release (r1371) Copyright (c) 2001-2008, Andrew Aksyonoff using config file '/Users/pat/Code/ruby/ts_test_2.2.0/config/ development.sphinx.conf'... stop: succesfully sent SIGTERM to pid 6220 Stopped search daemon (pid 6220).
    112. rake ts:in rake ts:start rake ts:stop
    113. Can we search yet?
    114. Person.search
    115. Person.search \"Pat Allan\"
    116. Focus that search on a specific field
    117. Person.search(\"Ruby\", :conditions => {:name => \"Pat Allan\"} )
    118. Conditions are not exactly like ActiveRecord
    119. Strings only.
    120. And conditions should always be a Hash.
    121. Filtering on Attributes
    122. Person.search(\"Ruby\", :with => {:admin => true} )
    123. Filtering by Ranges
    124. Person.search(\"Ruby\", :with => { :created_at => 1.week.ago..Time.now } )
    125. Filtering by Arrays
    126. Person.search(\"Ruby\", :with => {:tag_ids => [1, 2, 3]} )
    127. Matching all Array values
    128. Person.search(\"Ruby\", :with_all => {:tag_ids => [1, 2, 3]} )
    129. Excluding Filter Values
    130. Person.search(\"Ruby\", :without => {:admin => true} )
    131. Match Modes
    132. :all
    133. (The Default)
    134. All words must exist in a document
    135. :any
    136. Person.search(\"Pat Allan\", :match_mode => :any )
    137. One of the words must exist in a document
    138. :phrase
    139. The exact query must exist in the document
    140. :boolean
    141. Use boolean logic in search queries
    142. :extended
    143. Can match on specific fields, and apply boolean logic
    144. Weighting
    145. Person.search(\"Pat Allan\", :field_weights => { \"name\" => 100, \"profile\" => 50, \"location\" => 30 } )
    146. define_index do # ... set_property :field_weights => { \"name\" => 100, \"profile\" => 50, \"location\" => 30 } end
    147. Sorting
    148. :relevance
    149. The Default
    150. By Attributes
    151. Or Fields flagged as :sortable
    152. :attr_asc
    153. Person.search \"Pat Allan\", :order => :created_at
    154. :attr_desc
    155. Person.search \"Pat Allan\", :order => :created_at, :sort_mode => :desc
    156. :extended
    157. Person.search \"Pat Allan\", :order => \"last_name ASC, first_name ASC\"
    158. :expr
    159. Person.search \"Pat Allan\", :sort_mode => :expr, :sort_by => \"@weight * ranking\"
    160. :time_segments
    161. Person.search \"Pat Allan\", :sort_mode => :time_segments, :sort_by => \"created_at\"
    162. • Last Hour • Last Day • Last Week • Last Month • Last 3 Months • Everything Else
    163. Sorted by Segment, then Relevance
    164. Multi-Model Searches
    165. ThinkingSphinx::Search.search( \"Pat Allan\" )
    166. Pagination
    167. Always On
    168. Person.search \"Pat Allan\", :page => params[:page]
    169. You can request really big pages though.
    170. Person.search \"Pat Allan\", :per_page => 1000, :page => params[:page]
    171. If you want massive pages...
    172. Person.search \"Pat Allan\", :page => params[:page], :per_page => 1_000_000, :max_matches => 1_000_000
    173. # config/sphinx.yml development: max_matches: 1000000 test: max_matches: 1000000 production: max_matches: 1000000
    174. Don’t forget to restart Sphinx
    175. Installing Thinking Sphinx
    176. Using Git?
    177. script/plugin install git://github.com/freelancing-god/ thinking-sphinx.git
    178. git clone git://github.com/ freelancing-god/thinking-sphinx.git
    179. Want the gem instead?
    180. sudo gem install freelancing-god-thinking-sphinx --source http://gems.github.com
    181. # inside config/environment.rb config.gem( \"freelancing-god-thinking-sphinx\", :version => \"1.1.6\", :lib => \"thinking_sphinx\" )
    182. # at the end of your Rakefile: require 'thinking_sphinx/tasks'
    183. Installing Sphinx
    184. Windows?
    185. Installer. Piece of Cake.
    186. *nix?
    187. apt, yum, etc. Should be painless.
    188. OS X?
    189. Well... it can be a little complicated
    190. ./configure make sudo make install
    191. With PostgreSQL?
    192. ./configure --with-pgsql=/usr/local/ include/postgresql make sudo make install
    193. pg_config --pkgincludedir
    194. Or Use MacPorts
    195. ... if you’re using it for MySQL or PostgreSQL as well
    196. Anyone still stuck?
    197. Examples in the Sample App
    198. git://github.com/freelancing- god/sphinx-tute.git
    199. Set up Database
    200. Hack as we go
    201. Questions?
    202. Afternoon Break?
    203. Part Two: Beyond the Basics
    204. Delta Indexes
    205. Why?
    206. No single- document updates
    207. Can only process a full index.
    208. Delta Index holds the changes
    209. rake thinking_sphinx:index
    210. What does this mean?
    211. We can track changes as they happen
    212. ... at the cost of a little extra overhead.
    213. How?
    214. 1. Add a boolean column ‘delta’ to your model.
    215. 2. Tell your index to use a delta.
    216. define_index do # ... set_property :delta => true end
    217. 3. Stop, Re-index, Restart.
    218. rake thinking_sphinx:stop rake thinking_sphinx:index rake thinking_sphinx:start
    219. ... that’s the default approach
    220. Datetime Deltas
    221. define_index do # ... set_property :delta => :datetime, :threshold => 1.day end
    222. Use existing updated_at column
    223. rake thinking_sphinx:index:delta
    224. Please Note: Not entirely reliable
    225. Delayed Deltas
    226. create_table :delayed_jobs do |t| # ... end
    227. define_index do # ... set_property :delta => :delayed end
    228. rake thinking_sphinx:delayed_delta
    229. Uses delayed_job
    230. Removes overheard from your web stack
    231. Give it a Shot
    232. Facets
    233. Search Summaries
    234. define_index do # ... indexes location, country, :facet => true has admin, :facet => true end
    235. Person.facets(\"Ruby\")
    236. { :country => { \"Australia\" => 15, \"USA\" => 11, \"Germany\" => 8, \"Canada\" => 12 }, :location => { \"Melbourne\" => 3, # ... }
    237. @facets.for(:country => \"Australia\")
    238. Give it a Shot
    239. Extended Configuration
    240. config/sphinx.yml
    241. Just like database.yml
    242. Global Sphinx settings
    243. Common Example: Partial Matching
    244. Rai* == Rails
    245. development: enable_star: 1 min_infix_len: 1 test: enable_star: 1 min_infix_len: 1 production: enable_star: 1 min_infix_len: 1
    246. Stop, Re-index, Restart.
    247. Deployment
    248. Using Capistrano?
    249. # config/deploy.rb load 'vendor/plugins/thinking-sphinx/lib/ thinking_sphinx/deploy/capistrano'
    250. cap thinking_sphinx:configure cap thinking_sphinx:index cap thinking_sphinx:install:sphinx cap thinking_sphinx:install:ts cap thinking_sphinx:rebuild cap thinking_sphinx:restart cap thinking_sphinx:shared_sphinx_folder cap thinking_sphinx:start cap thinking_sphinx:stop
    251. Installing Sphinx
    252. Set Sphinx index location
    253. Sphinx does not like symlinks
    254. # config/sphinx.yml production: searchd_file_path: \"/var/www/ sphinx-tute/shared/db/sphinx/ production/\"
    255. # config/deploy.rb after \"deploy:setup\", \"thinking_sphinx:shared_sphinx_folder\"
    256. Regular indexing via Cron
    257. Geo-Searching
    258. Need latitude and longitude attributes
    259. As floats
    260. In Radians
    261. define_index do # ... has latitude, longitude end
    262. Stop, Re-index, Restart.
    263. Bar.search( \"Las Vegas\", :geo => [@lat, @lng], :sort => \"@geodist ASC\" )
    264. @bars.each_with_geodist do |bar, distance| # distance is in metres end
    265. Using custom column names?
    266. define_index do # ... set_property( :latitude_attr => :updown, :longitude_attr => :leftright ) end
    267. Geo-searching across multiple models?
    268. ThinkingSphinx::Search.search( \"pancakes\", :geo => [@lat, @lng] :latitude_attr => :lat, :longitude_attr => :lng )
    269. Give it a Shot
    270. Are we done yet?
    271. http://ts.freelancing-gods.com
    272. http://groups.google.com/ group/thinking-sphinx
    273. http://peepcode.com/products/ thinking-sphinx-pdf
    274. Questions?
    275. Thank you.

    + freelancing_godfreelancing_god, 6 months ago

    custom

    3041 views, 8 favs, 3 embeds more stats

    An introduction to using the Sphinx search service more

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 3041
      • 3021 on SlideShare
      • 20 from embeds
    • Comments 0
    • Favorites 8
    • Downloads 33
    Most viewed embeds
    • 10 views on http://nhruby.org
    • 8 views on http://ammaryousuf.com
    • 2 views on http://www.nhruby.org

    more

    All embeds
    • 10 views on http://nhruby.org
    • 8 views on http://ammaryousuf.com
    • 2 views on http://www.nhruby.org

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories