Using Thinking Sphinx with rails

Thinking Sphinx presented at Ruby Fun Day (http://www.rubyonrails.in/events/3)

Presentation Transcript

  • Sphinx is a free, open-source SQL full-text search engine. Its name is an acronym for SQL Phrase Index. It was developed by Andrew Aksyonoff.
    • database search
      • Using SQL directly: like "%text%"
      • impractical for large text fields.
      • no relevance ranking.
    • full text search
      • matches the query against all words in every document.
      • moves processing load out of DB.
      • relevance ranking.
      • other advanced features.
    • 2 step process
      • indexing
        • scan text and build a list of search terms.
      • searching
        • search the index to get references to data.
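The two steps above can be sketched with a toy inverted index in plain Ruby. This is only an illustration of the indexing/searching split; the `ToyIndex` class and its methods are made up for this example and say nothing about how Sphinx stores its indexes internally.

```ruby
# Toy inverted index: maps each term to the ids of the documents containing it.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Indexing step: scan the text and record which document each term came from.
  def add(doc_id, text)
    tokenize(text).uniq.each { |term| @postings[term] << doc_id }
  end

  # Searching step: look the terms up in the index and return references (ids)
  # to the documents that contain every query term.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = ToyIndex.new
index.add(1, "Thinking Sphinx with Rails")
index.add(2, "Sphinx search daemon")
index.add(3, "Rails plugins")

p index.search("sphinx")        # ids of documents mentioning "sphinx"
p index.search("sphinx rails")  # ids of documents mentioning both terms
```

The point of the split is that the searching step never touches the original text; it only returns ids, which the application then uses to load the records.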
    • High indexing speed.
      • up to 10 MB/sec on modern CPUs.
    • High search speed.
      • avg query is under 0.1 sec on 2-4 GB text collections.
    • High scalability.
      • up to 100 GB of text, up to 100M documents on a single CPU.
    • Supports distributed searching.
      • can be extended to multiple servers.
    • Supports phrase proximity ranking.
      • providing good relevance.
    • Supports stopwords.
      • excludes common words such as a, an, the, with, in
    • Supports different search modes
      • "match all", "match phrase" and "match any"
    • Supports relevance modification on the fly.
    • Key Sphinx features are its speed and phrase proximity ranking.
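As a rough illustration of the stopword and search-mode bullets above, here is a plain-Ruby sketch. The method names are invented for this example, and the logic is a simplification: it drops stopwords entirely before matching, which is an assumption and not necessarily how Sphinx treats word positions.

```ruby
STOPWORDS = %w[a an the with in].freeze

# Lowercase, split into words, and drop stopwords.
def terms(text)
  text.downcase.scan(/[a-z0-9]+/) - STOPWORDS
end

# "match all": every query term occurs somewhere in the document.
def match_all?(doc, query)
  (terms(query) - terms(doc)).empty?
end

# "match any": at least one query term occurs in the document.
def match_any?(doc, query)
  !(terms(query) & terms(doc)).empty?
end

# "match phrase": the query terms occur next to each other, in order.
def match_phrase?(doc, query)
  wanted = terms(query)
  return false if wanted.empty?
  terms(doc).each_cons(wanted.size).any? { |window| window == wanted }
end

doc = "Using Thinking Sphinx with Rails"
p match_all?(doc, "rails sphinx")      # both words present, order ignored
p match_phrase?(doc, "rails sphinx")   # same words, but not adjacent in this order
p match_any?(doc, "sphinx lucene")     # one of the two words is present
```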
    • boardreader.com
      • Indexing over 2 billion documents, the BoardReader forum search engine is the biggest Sphinx installation at present.
    • mininova.org
      • Mininova, popular BitTorrent search engine, serves 3-5 million searches daily.
    • thepiratebay.org
      • The Pirate Bay and (forthcoming) SuprNova moved to Sphinx recently.
    • netlog.com
      • NetLog, a large social network site with over 35 million registered users, uses Sphinx for pretty much every kind of search imaginable: people, photo, blog, event, music, and video searches. 12 million daily queries against 100+ GB indexes are handled by just 2 quad-core search boxes.
    • Sphinx can be downloaded from http://www.sphinxsearch.com/
    • Its distribution contains the following programs:
    • indexer
      • utility to create fulltext indices
    • searchd
      • daemon to search through fulltext indices
    • search
      • test utility to query fulltext indices from command line
    • sphinxapi
      • set of API libraries for Ruby, Python, Perl, Java.
    • Configuration
      • settings for indexer and searchd
    • Indexes, Fields, Attributes.
    • Each index has a document id, some fields, and some attributes.
      • The id has to be unique; generally it’s the primary key.
      • The fields contain the text that is to be searched.
      • The attributes contain the data used for sorting, filtering and grouping.
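The id/fields/attributes split can be pictured with a small plain-Ruby model. The `Doc` struct and the sample data are made up for illustration; the only thing taken from the slide is the division of labour: fields are searched, attributes sort and filter.

```ruby
# A toy document: a unique id, text fields to search, attributes to sort/filter.
Doc = Struct.new(:id, :fields, :attributes)

docs = [
  Doc.new(1, { name: "pizza hut" },    { age: 10, created_at: 3 }),
  Doc.new(2, { name: "pizza corner" }, { age: 20, created_at: 1 }),
  Doc.new(3, { name: "burger place" }, { age: 10, created_at: 2 })
]

# Fields are matched against the query text...
matches = docs.select { |d| d.fields.values.any? { |text| text.include?("pizza") } }

# ...while attributes filter and sort the matches.
filtered = matches.select { |d| d.attributes[:age] == 10 }
sorted   = matches.sort_by { |d| d.attributes[:created_at] }

p matches.map(&:id)
p filtered.map(&:id)
p sorted.map(&:id)
```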
    • thinking_sphinx
      • Pat Allan
      • also developed Riddle, the underlying API for Sphinx.
      • git://github.com/freelancing-god/thinking-sphinx.git
    • ultrasphinx
      • Evan Weaver
      • svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk
    • Can be installed simply by
      • ruby script/plugin install <path_to_plugin>
    • No need to write the sphinx configuration file, plugins take care of this.
    • field aliasing
      • indexes full_name, :as => :name
    • field merging
      • [first_name, last_name], :as => :name
    • field weighting
      • set_property :field_weights => {:last_name => 2, :first_name => 1}
      • User.search "aaa", :field_weights => { :first_name => 1, :last_name => 2}
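To get a feel for what field weights do, here is a deliberately simplified plain-Ruby scoring sketch using the weights from the slide. The `weighted_score` helper and the sample users are hypothetical, and Sphinx's real ranking takes more into account (phrase proximity, for one), so treat this as an intuition aid only.

```ruby
FIELD_WEIGHTS = { first_name: 1, last_name: 2 }.freeze

# Score a record by adding up the weight of every field that contains the term.
def weighted_score(record, term, weights = FIELD_WEIGHTS)
  weights.sum do |field, weight|
    record.fetch(field, "").downcase.include?(term.downcase) ? weight : 0
  end
end

users = [
  { first_name: "Aaa",  last_name: "Smith" },
  { first_name: "John", last_name: "Aaa" }
]

# Highest score first: a match in last_name outranks a match in first_name.
ranked = users.sort_by { |user| -weighted_score(user, "aaa") }
p ranked.map { |user| user[:last_name] }
```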
    • index computed value
      • indexes "age > 15", :as => :minor
    • sorting (using attributes and fields)
      • :sortable => true
      • has created_at
      • User.search("user", :order => :first_name, :sort_mode => :desc)
      • User.search("user", :order => "created_at DESC")
    • filtering (using attributes and fields)
      • User.search :conditions => {:name => "aaa"}
      • User.search :with => {:age => 10}
      • User.search :without => {:age => 10}
    • add custom SQL conditions to index
      • where "first_name = 'aaa'"
    • drop-in compatibility with will_paginate
      • User.search "aaa", :page => (params[:page] || 1)
    • geodistance
      • has :latit
      • has :longit
      • set_property :latitude_attr => :latit, :longitude_attr => :longit
      • Address.search "pizza hut", :geo => [1.234, 4.567], :order => "@geodist asc"
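For intuition about what a geodistance value represents, here is a plain-Ruby great-circle (haversine) sketch. It is not taken from Sphinx and Sphinx's own formula may differ; the helper names and the Earth-radius constant are assumptions. As far as I know, Sphinx expects latitude and longitude in radians, so the sketch takes radians too and includes a degree converter.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in metres (approximation)

# Great-circle distance in metres between two points given in radians.
def geodistance(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Convert degrees (as shown on most maps) to radians.
def to_radians(degrees)
  degrees * Math::PI / 180
end

# A quarter of the way around the equator.
p geodistance(0.0, 0.0, 0.0, to_radians(90))
```

Sorting records by this value in ascending order gives the "nearest first" behaviour that the `@geodist asc` ordering asks for.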
    • delta index support
      • set_property :delta => true
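The idea behind a delta index can be sketched in plain Ruby: keep a small index of recently changed records next to the large main index, search both, and fold the small one into the big one on a full reindex. The `DeltaIndexedStore` class below is hypothetical and only models the concept; it is not how the plugin implements `:delta => true`.

```ruby
class DeltaIndexedStore
  def initialize
    @main = {}   # id => text, rebuilt only by a full reindex
    @delta = {}  # id => text, records changed since the last full reindex
  end

  # Cheap update: only the small delta index is touched.
  def record_changed(id, text)
    @delta[id] = text
  end

  # Expensive full reindex: fold the delta into the main index and empty it.
  def full_reindex
    @main.merge!(@delta)
    @delta.clear
  end

  # Search both indexes; a delta entry overrides the stale main entry.
  def search(term)
    @main.merge(@delta).select { |_id, text| text.include?(term) }.keys.sort
  end

  def delta_size
    @delta.size
  end
end

store = DeltaIndexedStore.new
store.record_changed(1, "sphinx intro")
store.full_reindex
store.record_changed(2, "sphinx delta")  # visible to searches before the next full reindex
p store.search("sphinx")
```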
    • searching across multiple models
      • indexes posts.name
      • indexes posts.comments.name
    • comprehensive rake tasks
      • rake ts:conf
      • rake ts:in
      • rake ts:start, rake ts:restart, rake ts:stop
    • multiple deployment environments
      • rake ts:config RAILS_ENV=production
    • one-to-one
      • user has_one blog
      • indexes blog.name
    • one-to-many
      • blog has_many posts
      • indexes posts.name
    • many-to-many (through)
      • posts has_many comments through records
      • comments has_many posts through records
      • indexes comments.name
    • deeply nested
      • blog has_many posts
      • posts has_many comments
      • indexes posts.comments.name
    • STI
      • User.search("user", :with => {:class_crc => Teacher.to_crc32})
    • polymorphic
      • user has_one phone
      • company has_one phone
      • indexes phone.name
      • where "callable_type = 'User'"
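The STI example above filters on a `class_crc` attribute. A plausible reading, stated here as an assumption, is that each record carries a CRC32 checksum of its class name as a numeric attribute, since Sphinx attributes are used for filtering. The `class_crc` helper below is a hypothetical stand-in for the plugin's `to_crc32`, built on Ruby's standard `Zlib.crc32`.

```ruby
require "zlib"

# Hypothetical stand-in for the plugin's to_crc32: checksum of the class name.
def class_crc(klass)
  Zlib.crc32(klass.name)
end

class User; end
class Teacher < User; end

# An integer that can be stored and compared as a numeric attribute.
p class_crc(Teacher)

# The checksum depends only on the name, so it is stable across runs.
p class_crc(Teacher) == Zlib.crc32("Teacher")
```

Filtering on the subclass checksum then narrows a search on the parent model to one subclass.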
    • You can run the index task while Sphinx is running, and it’ll reload the indexes automatically.
    • As of version 0.9.9, your configuration will automatically be reloaded.
    • Keep in mind that if a column name clashes with an existing Ruby method, such as id or name, you need to use the symbol version of the column name.
    • Sphinx connects to DB directly, so don’t expect that any of the model methods can be indexed.
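The tip about `id` and `name` needing the symbol version makes sense if the index definition block turns unknown method names into column references. The toy builder below demonstrates that mechanism in plain Ruby. `ToyBuilder` is entirely hypothetical, and resolving columns through `method_missing` is an assumption about how such a DSL could work, not a copy of the plugin's code.

```ruby
class ToyBuilder
  Column = Struct.new(:column_name)

  class << self
    attr_reader :indexed

    # Accepts column references or symbols; anything else is rejected.
    def indexes(*columns)
      @indexed ||= []
      columns.each do |column|
        column = Column.new(column) if column.is_a?(Symbol)
        raise ArgumentError, "not a column: #{column.inspect}" unless column.is_a?(Column)
        @indexed << column.column_name
      end
    end

    # Unknown bare words become column references.
    def method_missing(name, *args)
      args.empty? ? Column.new(name) : super
    end

    def respond_to_missing?(_name, _include_private = false)
      true
    end
  end
end

ToyBuilder.instance_eval do
  indexes first_name  # unknown method, so it becomes a column reference
  indexes :name       # symbol version, always safe
end

begin
  # Here `name` is an existing method (Class#name), so method_missing never
  # fires and the builder receives the string "ToyBuilder" instead of a column.
  ToyBuilder.instance_eval { indexes name }
rescue ArgumentError => e
  puts e.message
end
```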
    • You can extract commands for indexing and starting search daemon into scripts for fast access.
      • indexer --config config/development.sphinx.conf --all
      • searchd --config config/development.sphinx.conf
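One way to wrap the two commands above in a script is a small Ruby helper that builds them per environment. The helper names are made up; the config path pattern simply generalises `config/development.sphinx.conf` to other environments, which is an assumption. The sketch only prints the commands and leaves running them to the reader.

```ruby
# Path of the generated Sphinx config for a given Rails environment
# (assumes the same naming pattern as the development file).
def sphinx_config(env)
  "config/#{env}.sphinx.conf"
end

# Command that rebuilds all indexes.
def indexer_command(env)
  ["indexer", "--config", sphinx_config(env), "--all"]
end

# Command that starts the search daemon.
def searchd_command(env)
  ["searchd", "--config", sphinx_config(env)]
end

env = ENV.fetch("RAILS_ENV", "development")
puts indexer_command(env).join(" ")
puts searchd_command(env).join(" ")
```

Each command is returned as an array, so it can be handed to `system(*indexer_command(env))` without shell quoting concerns.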
    • You can ignore this warning:
      • distributed index 'model_name' can not be directly indexed; skipping.
    • ultrasphinx has almost all thinking_sphinx features, plus some additional ones:
      • excerpt highlighting
      • spellcheck fields*
      • faceting on text, date, and numeric fields*
      • *will be demonstrated in the next presentation
    • sphinx
      • http://www.sphinxsearch.com/
    • ultrasphinx
      • http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/files/README.html
    • thinking_sphinx
      • http://ts.freelancing-gods.com/
      • http://groups.google.com/group/thinking-sphinx/