Ferret

Introduction to Ferret, the Ruby Full-Text Search Engine

Published in: Technology

  1. 1. Ferret: A Ruby Search Engine • Brian Sam-Bodden
  2. 2. Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
  3. 3. Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App
  4. 4. Agenda • Ferret in Rails • Resources
  5. 5. What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Apache Lucene Search Engine • Ported to Ruby by David Balmain
  6. 6. What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
  11. 11. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
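
One way to picture these four concepts is with plain Ruby data structures. The sketch below is not Ferret code; the sample documents and the `terms_of` helper are made up for illustration.

```ruby
# An index is a sequence of documents; here, an Array.
# A document is a sequence of named fields; here, a Hash.
# A field is a named sequence of terms; here, an Array of Strings.
index = [
  { :title => %w[ferret basics], :body => %w[ferret is a search library] },
  { :title => %w[ruby notes],    :body => %w[ruby is a language] }
]

# A term is a text string keyed by its field name, so "ferret" in :title
# and "ferret" in :body are two different terms.
def terms_of(document)
  document.flat_map { |field, words| words.map { |word| [field, word] } }
end

first_doc_terms = terms_of(index.first)
puts first_doc_terms.include?([:title, 'ferret'])  # the text "ferret" in the title field
puts first_doc_terms.include?([:body, 'ferret'])   # the same text in the body field
```
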
  17. 17. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
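
To make the four properties concrete, here is a toy pure-Ruby sketch of what each one keeps for a single field. It does not use Ferret and says nothing about Ferret's actual storage format; the `ToyField` class and its method names are invented for this illustration.

```ruby
# Toy model of one field across many documents (not Ferret's implementation).
class ToyField
  attr_reader :stored, :inverted, :vectors

  def initialize
    @stored   = {}                                   # doc id => original text
    @inverted = Hash.new { |h, term| h[term] = [] }  # term => doc ids
    @vectors  = {}                                   # doc id => term => positions
  end

  def add(doc_id, text)
    @stored[doc_id] = text                           # Stored
    tokens = text.downcase.scan(/\w+/)               # Tokenized
    @vectors[doc_id] = Hash.new { |h, term| h[term] = [] }
    tokens.each_with_index do |term, position|
      # Indexed: remember which documents contain the term
      @inverted[term] << doc_id unless @inverted[term].include?(doc_id)
      # Vectored: remember where the term occurs (frequency = positions.size)
      @vectors[doc_id][term] << position
    end
  end
end

field = ToyField.new
field.add(0, 'Ferret is fast')
field.add(1, 'Fast search, fast indexing')

p field.inverted['fast']    # ids of documents containing "fast"
p field.vectors[1]['fast']  # positions of "fast" in document 1
p field.stored[1]           # original text of document 1
```
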
  18. 18. It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For non-plain-text sources such as PDF, Word, or Excel you need to: • Extract • Analyze
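
The two steps can be sketched in plain Ruby. Extraction is stubbed out here with a hypothetical `extract_text` that only strips HTML-style tags; real PDF, Word, or Excel extraction needs a format-specific tool. The `analyze` step and its stop-word list are likewise illustrative, not Ferret's own analyzers.

```ruby
STOP_WORDS = %w[a an and is of the to].freeze  # tiny illustrative list

# Extract: reduce a source document to plain text.
# Stand-in that only strips HTML-style tags.
def extract_text(source)
  source.gsub(/<[^>]*>/, ' ')
end

# Analyze: turn plain text into normalized tokens
# (lowercase, split on non-alphanumerics, drop stop words).
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

tokens = analyze(extract_text('<h1>Ferret</h1> is the <b>Ruby</b> search engine'))
p tokens
```
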
  20. 20. Installing Ferret • gem install ferret
  24. 24. Installing Ferret • Pick the latest version for your platform
  29. 29. The Recipe 1. Create some Documents 2. Create an Index 3. Add Documents to the Index 4. Perform some Queries
  31. 31. Example Documents • Create some Documents • "Any String is a Document"
  33. 33. Example Documents • Create some Documents • ["This", "is also", "a document"]
  43. 43. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory ➡ index = Ferret::I.new()
  44. 44. Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}
  50. 50. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • The search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
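
Put together, steps 3 and 4 of the recipe might look like the sketch below. It relies on the `<<` and `search_each` calls shown on the slides, plus document lookup with `index[doc_id]`, which Ferret's Index is assumed to provide. The helper takes the index as an argument rather than requiring the gem; the name `index_and_search` and the sample field names are made up.

```ruby
# Steps 3 and 4 of the recipe against any index object that responds to
# <<, search_each and [] (for example Ferret::I.new, if the gem is loaded).
def index_and_search(index, documents, query)
  documents.each { |doc| index << doc }          # 3. add documents to the index

  hits = []
  index.search_each(query) do |doc_id, score|    # 4. perform a query
    hits << { :title => index[doc_id][:title], :score => score }
  end
  hits
end

# 1. Create some documents (hashes of field name => text).
DOCUMENTS = [
  { :title => 'Ferret', :body => 'A Ruby full-text search engine' },
  { :title => 'Lucene', :body => 'A Java full-text search engine' }
].freeze
```

With the gem installed, `index_and_search(Ferret::I.new, DOCUMENTS, 'body:ruby')` should return a single hit for the Ferret document. Pass `Ferret::I.new(:path => '/somepath')` instead to keep the index on disk.
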
  51. 51. Playing with Ferret in irb
  59. 59. Ferret Query Language • Ferret's own query language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wildcard • Field • Fuzzy • Boolean
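
For orientation, here is one example query string per type. The syntax is recalled from Ferret's query parser documentation and is worth checking against your installed version; the field names (`title`, `date`) are made up.

```ruby
# Example FQL strings, one per query type listed on the slide.
# Field names are hypothetical.
FQL_EXAMPLES = {
  :term     => 'ferret',
  :phrase   => '"full text search"',
  :field    => 'title:ferret',
  :range    => 'date:[20070101 20071231]',
  :wildcard => 'title:fer*',
  :fuzzy    => 'title:feret~',
  :boolean  => 'ruby AND search'
}.freeze

FQL_EXAMPLES.each do |type, query|
  puts format('%-9s %s', type, query)
end
```

Any of these strings can be passed as the `query` argument of `search` or `search_each`.
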
  60. 60. Index.explain • The explain method of Index describes how a document scored against a query • Very useful for debugging • and for learning how Ferret works
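
A small helper can collect an explanation for every hit of a query. This is a sketch that assumes the index offers `search_each` and `explain(query, doc_id)`, and that the returned explanation responds to `to_s`; the helper name `explain_hits` is made up.

```ruby
# Collect a score explanation for every hit of a query.
# Works with any index offering search_each and explain(query, doc_id),
# which is what Ferret::Index::Index is assumed to provide.
def explain_hits(index, query)
  explanations = {}
  index.search_each(query) do |doc_id, _score|
    explanations[doc_id] = index.explain(query, doc_id).to_s
  end
  explanations
end
```

With the gem loaded and an index at hand, `explain_hits(index, 'title:ferret').each_value { |text| puts text }` would print one explanation per matching document.
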
  62. 62. Ferret in your App • [Diagram: the Application gathers data (Database, Web, File System, Manual Input), indexes documents into the Ferret Index, gets the User's query, searches the index, and presents search results to the User]
  63. 63. Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer
  64. 64. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  66. 66. Ferret in Rails • A simple model has two searchable fields, title and body:
  67. 67. Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
  69. 69. Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
  70. 70. Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
  71. 71. Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret
  72. 72. In-Print Resources
  73. 73. Thanks!
