Ruby Day Kraków: Full Text Search with Ferret

Transcript

  1. Ruby Day Kraków: Full Text Search with Ferret
     Agnieszka Figiel, 25th November 2006
  2. Agenda
       • full text search
       • implementation options
       • tools for Ruby
       • Ferret and acts_as_ferret
       • searching with Ferret
       • overview of index options
       • multi search
       • more like this
  3. Full Text Search
     A search of a document collection that examines all of the words in every stored document as it tries to match the search words supplied by the user.
       • index
           • tokenize all documents
           • filter out stop words
           • apply stemming
           • apply a term weighting scheme
       • search
           • use the index to find all documents matching a query
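The index and search steps above can be sketched in a few lines of plain Ruby. This is only a toy, not how Ferret is implemented: the stop word list, the crude suffix-stripping `stem`, and the use of raw term frequency as the weighting scheme are all assumptions made for illustration.

```ruby
# Toy in-memory inverted index; every name here is hypothetical.
STOP_WORDS = %w[a an and the of in is to with].freeze

# Very crude stemmer: strips a few common English suffixes.
def stem(word)
  word.sub(/(ing|ed|es|s)\z/, "")
end

# Tokenize (lowercase, split on non-alphanumerics), drop stop words, stem.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/)
      .reject { |token| STOP_WORDS.include?(token) }
      .map { |token| stem(token) }
end

# Build term => { doc_id => term frequency }.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = Hash.new(0) }
  docs.each_with_index do |text, id|
    analyze(text).each { |term| index[term][id] += 1 }
  end
  index
end

# Ids of documents containing every query term, highest total frequency first.
def search(index, query)
  terms = analyze(query)
  return [] if terms.empty?

  ids = terms.map { |term| index.fetch(term, {}).keys }.reduce(:&)
  ids.sort_by { |id| [-terms.sum { |term| index[term][id] }, id] }
end

docs = [
  "Searching blogs with Ferret",
  "Rails plugins and gems",
  "Ferret indexes the posts; Ferret searches the index"
]
index = build_index(docs)
p search(index, "ferret search") # prints [2, 0]
```

Document 2 ranks first because "ferret" occurs twice in it. A real engine would use a more careful analyzer and weighting scheme.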
  4. Database Full Text Index
       • MySQL
       • PostgreSQL
       • MS SQL
       • Oracle
       • DB2
  5. Search Systems
       • Google, Yahoo
       • Swish-e (C, Perl API available)
       • Lucene (Java, ports for C, C++, .NET, Delphi, Perl, Python, PHP, Common Lisp, Ruby)
       • Nutch (Lucene + crawler)
       • Lucene-WS (Lucene via REST)
       • SOLR (Lucene via XML/HTTP and JSON)
  6. Ruby Search Systems
       • Hyper Estraier
       • Ferret
  7. Ferret
     http://rubyforge.org/projects/ferret
     A text search engine library written for Ruby, inspired by the Apache Lucene Java project.
  8. acts_as_ferret
     http://projects.jkraemer.net/acts_as_ferret/wiki
     A plugin for Ruby on Rails which builds on Ferret:
       • search across the contents of any Rails model class
       • each model has its own index on disk
       • search multiple models
       • support for Rails Single Table Inheritance
       • index attributes or virtual attributes of a model
       • indexing can be customized by overriding the to_doc method
       • find similar items ('more like this')
  9. Installation
     ferret gem:
         gem install ferret
     acts_as_ferret:
         script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret
  10. Example: YASB (Yet Another Searchable Blog)

          class Post < ActiveRecord::Base
            has_many :comments
          end

          class Comment < ActiveRecord::Base
            belongs_to :post
          end
  11. Basic post search
      Let's add a basic search on the Post model:

          class Post < ActiveRecord::Base
            has_many :comments
            acts_as_ferret
          end

      Search posts:

          Post.find_by_contents(search_term)

      After running the first search an index will be created for the Post model. ALL fields are indexed if no additional options are given, including arrays of child objects (STI).
  12. Limit indexed fields
      To limit the fields that are indexed for a given model, list them explicitly:

          acts_as_ferret :fields => [ 'title', 'body' ]

      NOTE: after any change to index settings, the index needs to be rebuilt:

          Post.rebuild_index
  13. Index options
      There are numerous options for customising Ferret's indexing. Example:

          acts_as_ferret(
            :fields => {
              :title => { :boost => 2 },
              :body  => { :boost => 1 }
            },
            :store_class_name => true
          )

      This adds a boost (importance) factor of 2 to the title field and 1 to the body field. The class name is stored so that searches across multiple classes can tell the results apart.
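To see what a boost does to ranking, here is a minimal sketch in plain Ruby. It is not Ferret's scoring formula (which also takes things such as field length into account); the `score` helper and the sample posts are made up for illustration, and only the boost values 2 and 1 come from the slide.

```ruby
# Hypothetical scoring: count query-term hits per field, weighted by boost.
BOOSTS = { title: 2.0, body: 1.0 }.freeze

def score(doc, term, boosts = BOOSTS)
  boosts.sum do |field, boost|
    hits = doc.fetch(field, "").downcase.scan(/\w+/).count(term.downcase)
    hits * boost
  end
end

posts = [
  { title: "Release notes", body: "We now use ferret for search." },
  { title: "Ferret tips",   body: "Indexing and searching." }
]

# Highest score first: a hit in the title outweighs a hit in the body.
ranked = posts.sort_by { |post| -score(post, "ferret") }
puts ranked.map { |post| post[:title] }
```

Both posts mention the term once, but the one with the match in the boosted title field is listed first.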
  14. Index options: store

      Value        Description
      :no          Don't store the field.
      :yes         Store the field in its original format. Use this value if you want to highlight matches or print match excerpts à la Google search.
      :compressed  Store the field in compressed format.
  15. Index options: index

      Value                    Description
      :no                      Do not make this field searchable.
      :yes                     Make this field searchable and tokenize its contents.
      :untokenized             Make this field searchable but do not tokenize its contents. Use this value for fields you wish to sort by.
      :omit_norms              Same as :yes except omit the norms file. The norms file can be omitted if you don't boost any fields and you don't need scoring based on field length.
      :untokenized_omit_norms  Same as :untokenized except omit the norms file.
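The difference between the tokenized and untokenized settings is easiest to see by looking at which terms a field value would contribute to an index. The helper below is a hypothetical plain-Ruby illustration, not Ferret code; lowercasing and splitting on word characters are assumptions about what an analyzer might do.

```ruby
# Hypothetical helper: the terms a field value contributes under each mode.
def index_terms(value, mode)
  case mode
  when :yes         then value.downcase.scan(/\w+/) # one term per word
  when :untokenized then [value]                    # the whole value as one term
  when :no          then []                         # field is not searchable
  else raise ArgumentError, "unknown mode: #{mode}"
  end
end

p index_terms("Full Text Search", :yes)         # prints ["full", "text", "search"]
p index_terms("Full Text Search", :untokenized) # prints ["Full Text Search"]
p index_terms("Full Text Search", :no)          # prints []
```

Because an untokenized field keeps its value as a single term, it gives each document one well-defined sort key, which is why the slide recommends it for fields you sort by.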
  16. Index options: term_vector

      Value                    Description
      :no                      Don't store term-vectors.
      :yes                     Store term-vectors without storing positions or offsets.
      :with_positions          Store term-vectors with positions.
      :with_offsets            Store term-vectors with offsets.
      :with_positions_offsets  Store term-vectors with positions and offsets.
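As a rough picture of what "positions" and "offsets" mean, the sketch below builds a term vector for a single string in plain Ruby. The data layout and the names `term_vector`, `start_offset` and `end_offset` are assumptions for illustration and say nothing about Ferret's on-disk format; offsets here are character offsets into the original text.

```ruby
# Hypothetical term vector: term => list of occurrences, each with the
# token position and the start/end character offsets in the source text.
def term_vector(text)
  vector = Hash.new { |hash, term| hash[term] = [] }
  position = 0
  text.scan(/\w+/) do |word|
    match = Regexp.last_match
    vector[word.downcase] << {
      position: position,
      start_offset: match.begin(0),
      end_offset: match.end(0)
    }
    position += 1
  end
  vector
end

text = "Ferret finds what Ferret indexes"
vector = term_vector(text)
vector["ferret"].each do |occurrence|
  # Offsets let us cut the exact match out of the original text.
  puts text[occurrence[:start_offset]...occurrence[:end_offset]]
end
```

Positions are what phrase matching needs (is this term right after that one?), while offsets are what highlighting needs (where in the stored text is the match?).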
  17. Index options: boost

      Value  Description
      Float  The boost property sets the default boost for a field. This boost value will be used for all instances of the field in the index unless otherwise specified when you create the field. All values should be positive.
  18. Search the comments
      Searching a model together with its related models can be achieved with virtual attributes. A getter that returns all comment messages, defined in the Post class:

          def post_comments
            comments.collect { |c| c.message }.join(' ')
          end

      Add it to Ferret's field list like a normal field:

          acts_as_ferret :fields => [ 'title', 'body', 'post_comments' ]
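The virtual attribute idea can be tried outside Rails. In the sketch below, plain Ruby objects stand in for the ActiveRecord models, and `indexable_fields` is a hypothetical helper (not part of acts_as_ferret) that simply shows the getter being read like any other field.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (no Rails needed here).
Comment = Struct.new(:message)

class Post
  attr_reader :title, :body, :comments

  def initialize(title, body, comments = [])
    @title = title
    @body = body
    @comments = comments
  end

  # Virtual attribute: all comment messages joined into one string.
  def post_comments
    comments.map(&:message).join(" ")
  end

  # Hypothetical helper: read each listed field by calling its getter.
  def indexable_fields(names = %w[title body post_comments])
    names.to_h { |name| [name, public_send(name)] }
  end
end

post = Post.new("Hello", "First post", [Comment.new("Nice!"), Comment.new("Thanks")])
puts post.indexable_fields["post_comments"] # prints Nice! Thanks
```

From the indexer's point of view there is no difference between a database column and a method that returns a string, which is what makes this trick work.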
  19. Search in multiple models
      To search both comments and posts at once (multi search) we need to:
        • create an index for both models
        • set the store_class_name flag for each of them
      After rebuilding the indices for Post and Comment we can run a multi search on both:

          Post.multi_search(params[:search], [Comment])
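Why the class name has to be stored becomes clear once hits from two indexes end up in one list. The following is a plain-Ruby sketch of that merge step only; the `Hit` struct, `merge_hits`, and the scores are invented for illustration and do not reflect how acts_as_ferret implements multi_search.

```ruby
# Hypothetical hit record: the stored class name tells results apart.
Hit = Struct.new(:class_name, :id, :score)

# Merge hits from several per-model result lists, best score first.
def merge_hits(*result_lists)
  result_lists.flatten.sort_by { |hit| -hit.score }
end

post_hits    = [Hit.new("Post", 1, 0.9), Hit.new("Post", 7, 0.4)]
comment_hits = [Hit.new("Comment", 3, 0.7)]

merge_hits(post_hits, comment_hits).each do |hit|
  puts "#{hit.class_name}##{hit.id} (#{hit.score})"
end
```

Without the class name, a hit with id 3 could not be traced back to the right table, since a Post and a Comment can share the same id.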
  20. More like this
      We would like a feature that finds the posts most similar to a chosen one. That's pretty simple:

          post.more_like_this({ :field_names => ['title', 'body', 'post_comments'],
                                :min_term_freq => 2,
                                :min_doc_freq => 3 })

      The two frequency options tell the search engine two things:
        • take into consideration only terms that appear more than once in the source document
        • take into consideration only terms that appear in at least 3 documents
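The effect of the two frequency thresholds can be sketched in plain Ruby. This shows only the term selection step under simplified assumptions (whitespace-style tokenizing, no stemming), and `interesting_terms` is a hypothetical helper, not the plugin's implementation.

```ruby
# Hypothetical term selection for a "more like this" query.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Keep terms that occur at least min_term_freq times in the source document
# and in at least min_doc_freq documents of the whole collection.
def interesting_terms(source, collection, min_term_freq: 2, min_doc_freq: 3)
  doc_freq = Hash.new(0)
  collection.each do |doc|
    tokenize(doc).uniq.each { |term| doc_freq[term] += 1 }
  end

  tokenize(source).tally.select do |term, freq|
    freq >= min_term_freq && doc_freq[term] >= min_doc_freq
  end.keys
end

collection = [
  "ferret search ferret index",
  "ferret and rails",
  "ferret plugin for rails search",
  "unrelated post"
]
p interesting_terms(collection[0], collection) # prints ["ferret"]
```

Only "ferret" survives: it appears twice in the source document and in three documents overall. The surviving terms would then be used to query the index for similar documents.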
  21. Links
      Products:
        • Swish-e: http://swish-e.org/index.html
        • Lucene: http://lucene.apache.org/java/docs/index.html
        • Nutch: http://lucene.apache.org/nutch/
        • Lucene-WS: http://lucene-ws.sourceforge.net/
        • SOLR: http://incubator.apache.org/solr/
        • Hyper Estraier: http://hyperestraier.sourceforge.net/
        • Ferret: http://rubyforge.org/projects/ferret
        • acts_as_ferret: http://projects.jkraemer.net/acts_as_ferret/
      Reading:
        • tutorial by Roman Mackovcak: http://blog.zmok.net/articles/2006/10/18/full-text-search-in-ruby-on-rails-3-ferret
        • tutorial by Seth Fitzsimmons: http://mojodna.net/searchable/ruby/railsconf.pdf
        • aaf and Unicode by Albert Ramstedt: http://albert.delamednoll.se/articles/2005/12/20/the-ferret-plugin-with-simple-unicode-support
  22. Thank you! Good luck using Ferret!
