Ruby Day Kraków: Full Text Search with Ferret

  1. Ruby Day Kraków: Full Text Search with Ferret. Agnieszka Figiel, 25th November 2006.
  2. Agenda
     - full text search implementation options
     - tools for Ruby
     - Ferret and acts_as_ferret
     - searching with Ferret
     - overview of index options
     - multi search
     - more like this
  3. Full Text Search
     A search of a document collection that examines all of the words in every stored document as it tries to match search words supplied by the user.
     Index:
     - tokenize all documents
     - filter out stop words
     - apply stemming
     - apply a term weighting scheme
     Search:
     - use the index to find all documents matching a query
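The indexing and search steps listed above can be sketched in plain Ruby. This is a minimal illustration, not how Ferret works internally: the `TinyIndex` class, the stop word list, the crude "strip a trailing s" stemmer, and the tf-idf style weighting are all assumptions chosen to keep the example short.

```ruby
# Minimal full text index in plain Ruby (illustrative only, not Ferret's implementation).
STOP_WORDS = %w[a an and the of in is to with].freeze

# Tokenize: lowercase the text and keep runs of letters and digits.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Very crude stemming (an assumption for this sketch): strip a trailing "s" from longer words.
def stem(token)
  token.length > 3 ? token.sub(/s\z/, "") : token
end

# Full analysis chain: tokenize, drop stop words, stem.
def analyze(text)
  tokenize(text).reject { |t| STOP_WORDS.include?(t) }.map { |t| stem(t) }
end

class TinyIndex
  def initialize
    # term => { doc_id => term frequency }
    @postings = Hash.new { |h, term| h[term] = Hash.new(0) }
    @doc_count = 0
  end

  def add(doc_id, text)
    @doc_count += 1
    analyze(text).each { |term| @postings[term][doc_id] += 1 }
  end

  # Term weighting: score = sum over query terms of tf * idf.
  # Returns document ids, best match first (ties broken by id).
  def search(query)
    scores = Hash.new(0.0)
    analyze(query).each do |term|
      next unless @postings.key?(term)
      docs = @postings[term]
      idf = Math.log(@doc_count.to_f / docs.size) + 1.0
      docs.each { |doc_id, tf| scores[doc_id] += tf * idf }
    end
    scores.sort_by { |doc_id, score| [-score, doc_id] }.map(&:first)
  end
end

index = TinyIndex.new
index.add(1, "Searching blogs with Ferret")
index.add(2, "Ferret is inspired by Lucene")
index.add(3, "Rails plugins and gems")

p index.search("blogs")         # matches document 1 through the stemmed term "blog"
p index.search("lucene ferret") # document 2 matches both terms and ranks first
```

The query goes through the same analysis chain as the documents, which is why "blogs" finds a document indexed under "blog".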
  4. Database Full Text Index
     - MySQL
     - PostgreSQL
     - MS SQL
     - Oracle
     - DB2
  5. Search Systems
     - Google, Yahoo
     - Swish-e (C, Perl API available)
     - Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python, PHP, Common Lisp, Ruby)
     - Nutch (Lucene + crawler)
     - Lucene-WS (Lucene via REST)
     - SOLR (Lucene via XML/HTTP and JSON)
  6. Ruby Search Systems
     - Hyper Estraier
     - Ferret
  7. Ferret
     http://rubyforge.org/projects/ferret
     A text search engine library written for Ruby. It is inspired by the Apache Lucene Java project.
  8. acts_as_ferret
     http://projects.jkraemer.net/acts_as_ferret/wiki
     A plugin for Ruby on Rails which builds on Ferret:
     - search across the contents of any Rails model class
     - each model has its own index on disk
     - search multiple models
     - support for Rails Single Table Inheritance
     - index attributes or virtual attributes of a model
     - indexing can be customized by overriding the to_doc method
     - find similar items ("more like this")
  9. Installation
     ferret gem:
         gem install ferret
     acts_as_ferret:
         script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
  10. Example
      YASB (Yet Another Searchable Blog)
          class Post < ActiveRecord::Base
            has_many :comments
          end

          class Comment < ActiveRecord::Base
            belongs_to :post
          end
  11. Basic post search
      Let’s add a basic search on the Post model:
          class Post < ActiveRecord::Base
            has_many :comments
            acts_as_ferret
          end
      Search posts:
          Post.find_by_contents(search_term)
      After the first search is run, an index is created for the Post model. ALL fields are indexed if no additional options are given, including arrays of child objects (STI).
  12. Limit indexed fields
      To limit the fields that are indexed for a given model, we can list them explicitly:
          acts_as_ferret :fields => [ 'title', 'body' ]
      NOTE: after any change to index settings, the index needs to be rebuilt:
          Post.rebuild_index
  13. Index options
      There are numerous options for customising Ferret’s indexing. Example:
          acts_as_ferret(
            :fields => {
              :title => { :boost => 2 },
              :body  => { :boost => 1 }
            },
            :store_class_name => true
          )
      This adds a boost (importance) factor of 2 to the title field and 1 to the body field. The class name is stored for searches across multiple classes.
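To make the effect of a boost concrete, here is a deliberately simplified scoring model in plain Ruby. It assumes that each occurrence of a term contributes the boost of the field it appears in; Ferret's real scoring is more involved, and the `boosted_score` helper and the sample posts are hypothetical.

```ruby
# Simplified model of field boosts (an assumption, not Ferret's scoring formula):
# each occurrence of the term in a field contributes that field's boost.
BOOSTS = { title: 2.0, body: 1.0 }.freeze

def boosted_score(doc, term)
  BOOSTS.sum do |field, boost|
    occurrences = doc.fetch(field, "").downcase.scan(/\w+/).count(term.downcase)
    occurrences * boost
  end
end

posts = [
  { id: 1, title: "Ferret tips",  body: "Indexing and searching." },
  { id: 2, title: "Weekly notes", body: "Today I tried ferret." }
]

# Sort by descending score: the title match outranks the body match.
ranked = posts.sort_by { |post| -boosted_score(post, "ferret") }.map { |post| post[:id] }
p ranked
```

With equal term counts, the post that mentions the term in its title ranks above the one that mentions it only in the body, which is the behaviour the boost of 2 on title is meant to produce.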
  14. Index options: store
      Value         Description
      :no           Don’t store the field.
      :yes          Store the field in its original format. Use this value if you want to highlight matches or print match excerpts a la Google search.
      :compressed   Store the field in compressed format.
  15. Index options: index
      Value                     Description
      :no                       Do not make this field searchable.
      :yes                      Make this field searchable and tokenize its contents.
      :untokenized              Make this field searchable but do not tokenize its contents. Use this value for fields you wish to sort by.
      :omit_norms               Same as :yes except omit the norms file. The norms file can be omitted if you don’t boost any fields and you don’t need scoring based on field length.
      :untokenized_omit_norms   Same as :untokenized except omit the norms file.
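The difference between a tokenized and an untokenized field can be pictured with two small helpers. This is a sketch in plain Ruby, not Ferret code: `tokenized_terms` and `untokenized_terms` are hypothetical names, and the lowercasing tokenizer is an assumption.

```ruby
# Illustrative only: which terms end up in the index for the two modes.
def tokenized_terms(value)
  value.downcase.scan(/\w+/) # one term per word
end

def untokenized_terms(value)
  [value] # the whole value is kept as a single term
end

title = "Full Text Search"
p tokenized_terms(title)
p untokenized_terms(title)

# Sorting needs exactly one comparable term per document,
# which is what the untokenized form provides.
titles = ["Ruby Day", "Full Text Search", "Ferret Basics"]
sorted = titles.sort_by { |t| untokenized_terms(t).first }
p sorted
```

A tokenized title can be found by any of its words but has no single value to sort on, which is why the table recommends :untokenized for sort fields.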
  16. Index options: term vector
      Value                     Description
      :no                       Don’t store term-vectors.
      :yes                      Store term-vectors without storing positions or offsets.
      :with_positions           Store term-vectors with positions.
      :with_offsets             Store term-vectors with offsets.
      :with_positions_offsets   Store term-vectors with positions and offsets.
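As a rough picture of what a term vector with positions and offsets holds, the following plain-Ruby sketch builds one for a single field value. The `term_vector` helper and its hash layout are assumptions for illustration and do not mirror Ferret's internal structures.

```ruby
# Build a term vector for one field value. For each term it records
# positions (index of the token in the field) and offsets
# (start and end character index of each occurrence).
def term_vector(text)
  vector = Hash.new { |h, term| h[term] = { positions: [], offsets: [] } }
  position = 0
  text.scan(/\w+/) do |token|
    match = Regexp.last_match
    entry = vector[token.downcase]
    entry[:positions] << position
    entry[:offsets] << [match.begin(0), match.end(0)]
    position += 1
  end
  vector
end

tv = term_vector("Ferret finds what ferret indexes")
p tv["ferret"]
```

Positions are what phrase-style matching needs, while offsets point back into the original text, which is useful for highlighting matches.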
  17. Index options: boost
      Value   Description
      Float   The boost property sets the default boost for a field. This boost value will be used for all instances of the field in the index unless otherwise specified when you create the field. All values should be positive.
  18. Search the comments
      Searching a model and its related models can be achieved with virtual attributes.
      A getter for all comment messages, defined in the Post class:
          def post_comments
            comments.collect { |c| c.message }.join(' ')
          end
      Add it to Ferret’s field list like a normal field:
          acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
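The virtual attribute idea can be tried without Rails. In this sketch, `Post` and `Comment` are plain-Ruby stand-ins for the ActiveRecord models, and `to_index_document` is a hypothetical helper that gathers field values the way an indexer might; none of it is acts_as_ferret code.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (hypothetical, for illustration).
Comment = Struct.new(:message)

class Post
  attr_reader :title, :body, :comments

  def initialize(title, body, comments = [])
    @title = title
    @body = body
    @comments = comments
  end

  # Virtual attribute: not a stored column, computed from the related comments.
  def post_comments
    comments.collect { |c| c.message }.join(' ')
  end

  # Hypothetical helper: read each listed field by calling the method of that name.
  def to_index_document(fields)
    fields.each_with_object({}) { |field, doc| doc[field] = public_send(field) }
  end
end

post = Post.new("Ferret", "Searching made easy",
                [Comment.new("Great talk"), Comment.new("Thanks")])
doc = post.to_index_document(%w[title body post_comments])
p doc
```

Because fields are read by method name, a computed getter is indistinguishable from a real column from the indexer's point of view.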
  19. Search in multiple models
      To search both comments and posts (multi search) we need to:
      - create an index for both models
      - set the :store_class_name flag for each of them
      After rebuilding the indices for Post and Comment we can run a multi search on both:
          Post.multi_search(params[:search], [Comment])
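Why the class name has to be stored becomes clearer with a toy merge of two result lists. This is a simplified plain-Ruby model, not the plugin's implementation: the `Hit` struct, the `merge_hits` helper, and the sample scores are all made up for illustration.

```ruby
# Illustrative only: a hit carries the class name stored in the index,
# so a combined result list can be mapped back to the right model.
Hit = Struct.new(:class_name, :id, :score)

# Merge hits from several per-model result lists, best score first.
def merge_hits(*result_sets)
  result_sets.flatten(1).sort_by { |hit| -hit.score }
end

post_hits    = [Hit.new("Post", 7, 0.9), Hit.new("Post", 3, 0.4)]
comment_hits = [Hit.new("Comment", 12, 0.7)]

merged = merge_hits(post_hits, comment_hits)
p merged.map { |hit| [hit.class_name, hit.id] }

# Group ids per class so each model can load its own records.
grouped_ids = merged.group_by(&:class_name).transform_values { |hits| hits.map(&:id) }
p grouped_ids
```

Without the stored class name, an id of 7 in the merged list could not be told apart as a post or a comment.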
  20. More like this
      We would like a feature that finds the posts most similar to a chosen one. That’s pretty simple:
          post.more_like_this(
            :field_names   => ['title', 'body', 'post_comments'],
            :min_term_freq => 2,
            :min_doc_freq  => 3
          )
      The options passed here tell the search engine two things:
      - consider only terms that appear more than once in the source document
      - consider only terms that appear in at least 3 documents
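The two thresholds can be illustrated with a small term selection routine in plain Ruby. It is a sketch of the filtering idea only, not the plugin's code: `interesting_terms`, the whitespace-level tokenizer, and the sample documents are assumptions.

```ruby
# Illustrative term selection for a "more like this" query (not the plugin's code).
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Keep terms that occur at least min_term_freq times in the source document
# and appear in at least min_doc_freq documents of the collection.
def interesting_terms(source, collection, min_term_freq:, min_doc_freq:)
  term_freq = tokenize(source).each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  doc_freq = Hash.new(0)
  collection.each { |doc| tokenize(doc).uniq.each { |term| doc_freq[term] += 1 } }
  term_freq.select { |term, tf| tf >= min_term_freq && doc_freq[term] >= min_doc_freq }
           .keys.sort
end

docs = [
  "ferret index ferret search",
  "ferret search tips",
  "search with ferret and rails",
  "rails rails deployment"
]

terms = interesting_terms(docs[0], docs, min_term_freq: 2, min_doc_freq: 3)
p terms
```

The surviving terms would then form the query used to find similar documents. Raising either threshold narrows the query to terms that are both characteristic of the source document and common enough in the collection to match something.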
  21. Links
      Products:
      - Swish-e: http://swish-e.org/index.html
      - Lucene: http://lucene.apache.org/java/docs/index.html
      - Nutch: http://lucene.apache.org/nutch/
      - Lucene-WS: http://lucene-ws.sourceforge.net/
      - SOLR: http://incubator.apache.org/solr/
      - Hyper Estraier: http://hyperestraier.sourceforge.net/
      - Ferret: http://rubyforge.org/projects/ferret
      - acts_as_ferret: http://projects.jkraemer.net/acts_as_ferret/
      Reading:
      - tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
      - tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
      - aaf and Unicode by Albert Ramstedt: http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support
  22. Thank you! Good luck using Ferret!
