Ferret

An overview of Ferret, a search engine library for Ruby.

Published in: Technology

  1. Ferret: A Ruby Search Engine, by Brian Sam-Bodden
  2. Agenda
      • What is Ferret?
      • Concepts
      • Fields
      • Indexing
      • Installing Ferret
  3. Agenda (continued)
      • The Recipe
      • Documents
      • Ferret::Index::Index
      • FQL
      • Ferret in your App
  4. Agenda (continued)
      • Ferret in Rails
      • Resources
  5. What is Ferret?
      • Information Retrieval (IR) library
      • Full-featured text search engine
      • Inspired by the Apache Lucene search engine
      • Ported to Ruby by David Balmain
  6. What is Ferret?
      • Initially a 100% pure Ruby port
      • Since 0.9, many core functions are implemented in C
      • Fast! Now faster than Lucene ;-)
  11. Concepts
      • Index: a sequence of documents
      • Document: a sequence of fields
      • Field: a named sequence of terms
      • Term: a text string, keyed by field name
  17. Fields of a Document in an Index
      • Fields are individually searchable units that can be:
      • Stored: the original text of the field is stored
      • Indexed: inverted, to rapidly find all Documents containing any of the Terms
      • Tokenized: the text is split into individual Terms, which are then indexed
      • Vectored: the frequency and location of Terms are stored
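
To make the four field properties concrete, here is a toy model in plain Ruby. It is a sketch for illustration only and says nothing about how Ferret is implemented internally; the `ToyIndex` class and its method names are hypothetical.

```ruby
# Toy model of stored / indexed / tokenized / vectored fields.
# ToyIndex is a hypothetical teaching aid, NOT Ferret's implementation.
class ToyIndex
  def initialize
    @stored   = []  # doc id -> original field values ("stored")
    @postings = {}  # [field, term] -> doc ids ("indexed", i.e. inverted)
    @vectors  = []  # doc id -> {[field, term] => positions} ("vectored")
  end

  # Add a document given as a Hash of field name => text.
  def <<(doc)
    doc_id = @stored.size
    @stored << doc
    vector = {}
    doc.each do |field, text|
      tokenize(text).each_with_index do |term, pos|
        key = [field, term] # a term is keyed by its field name
        ids = (@postings[key] ||= [])
        ids << doc_id unless ids.include?(doc_id)
        (vector[key] ||= []) << pos
      end
    end
    @vectors << vector
    self
  end

  # Ids of all documents whose field contains the term.
  def search(field, term)
    @postings.fetch([field, term.downcase], [])
  end

  # The stored (original) document.
  def [](doc_id)
    @stored[doc_id]
  end

  # Token positions of the term inside one document's field.
  def positions(doc_id, field, term)
    @vectors[doc_id].fetch([field, term.downcase], [])
  end

  private

  # "Tokenized": lowercase the text and split it into word terms.
  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

index = ToyIndex.new
index << {:title => "Ferret", :body => "Ferret is a search engine"}
index << {:title => "Lucene", :body => "Lucene inspired Ferret"}

p index.search(:body, "ferret")        # => [0, 1]
p index[1][:title]                     # => "Lucene"
p index.positions(1, :body, "Ferret")  # => [2]
```

The inverted `@postings` table is what makes lookup by term fast: a search reads one hash entry instead of scanning every document.
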
  18. It’s all about Indexing
      • Indexing is the processing of a source document into plain-text tokens that Ferret can manipulate
      • For non-plain-text sources such as PDF, Word, or Excel you first need to:
      • Extract the text
      • Analyze it
  20. Installing Ferret
      • gem install ferret
  24. Installing Ferret
      • Pick the latest version for your platform
  29. The Recipe
      1. Create some Documents
      2. Create an Index
      3. Add Documents to the Index
      4. Perform some Queries
  33. Example Documents (Create some Documents)
      • "Any String is a Document"
      • ["This", "is also", "a document"]
  43. Ferret::Index::Index (Create an Index)
      • Indexes are encapsulated by the class Ferret::Index::Index
      • Use the alias Ferret::I for convenience
      • An Index can be persistent: index = Ferret::I.new(:path => '/somepath')
      • Or completely in memory: index = Ferret::I.new()
  44. Ferret::Index::Index (Adding Documents to the Index)
      • Index provides the add_document method
      • It also provides the << alias
      • Adding documents is then as easy as:
      • index << "This is a document"
      • index << {:first => "Bob", :last => "Smith"}
  50. Ferret::Index::Index (Perform some Queries)
      • Index provides the search and search_each methods
      • The search method takes a query and an optional set of parameters: search(query, options = {})
      • The search_each method yields each hit to a block: search_each(query, options = {}) {|doc, score| ... }
  51. Playing with Ferret in irb
  53. Ferret Query Language
      • Ferret's own query language, FQL, is a powerful way to specify search queries
      • FQL supports many query types, including:
      • Term
      • Range
      • Phrase
      • Wildcard
      • Field
      • Fuzzy
      • Boolean
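
As a rough guide, one FQL string per query type is listed below. The field names (`title`, `body`, `date`) and search terms are hypothetical; only the query syntax is the point. The script itself is plain Ruby and merely prints the strings, each of which could be passed as the query argument of `search` or `search_each`.

```ruby
# One example FQL string per query type.
# Field names and terms are hypothetical; only the syntax matters.
FQL_EXAMPLES = {
  :term    => 'ruby',                              # a single term, any field
  :field   => 'title:ruby',                        # restrict the term to one field
  :phrase  => '"search engine"',                   # terms in this exact order
  :range   => 'date:[20070101 20071231]',          # inclusive range of values
  :wild    => 'title:ferr*',                       # * matches any run of characters
  :fuzzy   => 'body:feret~',                       # terms similar to "feret"
  :boolean => 'title:ruby AND (search OR index)'   # combine clauses
}

FQL_EXAMPLES.each do |type, query|
  puts "#{type.to_s.ljust(8)} #{query}"
end
```
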
  54. Index.explain
      • The explain method of Index describes how a document scores against a query
      • Very useful for debugging and for learning how Ferret works
  56. Ferret in your App
      • [Diagram: an Application sits between its data sources, the User, and the Ferret Index. Sources: Database, Web, File System, Manual Input. Steps: Gather Data, Index Documents, Get User’s Query, Search Index, Present Search Results.]
  57. Ferret in Rails
      • Acts As Ferret is an ActiveRecord extension
      • Available as a plugin
      • Provides a simplified interface to Ferret
      • Maintained by Jens Krämer
  59. Ferret in Rails
      • Adding an index to an ActiveRecord model is as simple as:
  60. Ferret in Rails
      • A simple model has two searchable fields, title and body:
  61. Ferret in Rails
      • After a quick rake db:migrate we now have some data to play with
      • Fire up the Rails console and let’s see what acts_as_ferret can do for our models
  63. Want more?
      • Ferret is improving constantly
      • Acts As Ferret seems to catch up quickly
      • Real-life usage seems to require some good engineering on your part:
        • Background indexing
        • Hot swapping of indexes?
  64. Want more?
      • We only covered the simplest constructs in Ferret
      • Ferret’s API provides enough flexibility for the most demanding search needs
  65. Online Resources
      • http://ferret.davebalmain.com
      • http://lucene.apache.org
      • http://lucenebook.com
      • http://projects.jkraemer.net/acts_as_ferret
  66. In-Print Resources
  67. Thanks!