Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene Search ...
What is Ferret?

• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Fas...
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text...
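
The four concepts nest inside one another, which a few lines of plain Ruby can make concrete. This is only a toy model of the vocabulary, not Ferret's internal representation; the names `Term`, `terms_of`, and `SAMPLE_INDEX`, and the naive whitespace splitting, are assumptions made for the sketch.

```ruby
# Toy model of the vocabulary above (not Ferret's internals).
# A Term is a text string keyed by the name of the field it came from.
Term = Struct.new(:field, :text)

# A document is a set of named fields; here, a Hash of field name => text.
# Each field is a named sequence of terms (naively split on whitespace).
def terms_of(document)
  document.flat_map do |field, text|
    text.downcase.split.map { |word| Term.new(field, word) }
  end
end

# An index is a sequence of documents.
SAMPLE_INDEX = [
  { title: "Ferret", body: "A Ruby search engine" },
  { title: "Lucene", body: "A Java search engine" }
].freeze

SAMPLE_INDEX.each_with_index do |doc, id|
  puts "doc #{id}: " + terms_of(doc).map { |t| "#{t.field}:#{t.text}" }.join(" ")
end
```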
Fields of a Document in
           an Index
• Fields are individually searchable units
  that are:
  • Stored: The origina...
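
The four field properties (stored, indexed, tokenized, vectored) can be illustrated with a small pure-Ruby structure. This is a sketch under simplifying assumptions, not how Ferret stores anything: the class `ToyFieldIndex` and its methods are hypothetical, and it handles a single unnamed field per document.

```ruby
# Toy illustration of the four field properties (not Ferret's implementation).
class ToyFieldIndex
  attr_reader :stored, :postings, :vectors

  def initialize
    @stored   = []                                   # Stored: original text per document
    @postings = Hash.new { |h, term| h[term] = [] }  # Indexed: term => ids of documents containing it
    @vectors  = []                                   # Vectored: per document, term => positions
  end

  # Adds one field value and returns the id assigned to its document.
  def add(text)
    doc_id = @stored.size
    @stored << text
    tokens = text.downcase.scan(/\w+/)               # Tokenized: split into individual terms
    vector = Hash.new { |h, term| h[term] = [] }
    tokens.each_with_index { |term, pos| vector[term] << pos }
    @vectors << vector
    vector.each_key { |term| @postings[term] << doc_id }
    doc_id
  end

  # Looks a term up in the inverted index.
  def docs_containing(term)
    @postings.fetch(term.downcase, [])
  end
end

index = ToyFieldIndex.new
index.add("Ferret is a Ruby search engine")
index.add("Lucene is a Java search engine")
p index.docs_containing("search")   # ids of both documents
p index.docs_containing("ruby")     # id of the first document only
p index.vectors[0]["ferret"]        # positions of "ferret" in document 0
```

The inverted `postings` hash is what makes lookup fast: finding every document that contains a term is a single hash access rather than a scan of all stored text.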
It’s all about Indexing

• Indexing is the processing of a source
  document into plain text tokens that Ferret
  can mani...
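
The extract-then-analyze pipeline can be sketched in plain Ruby. The helper names `extract` and `analyze`, the stop-word list, and the HTML-only extractor are all assumptions for illustration; PDF, Word, or Excel sources would need a format-specific extractor, and Ferret's own analyzers behave differently in detail.

```ruby
# Toy indexing pipeline: extract plain text, then analyze it into tokens.
# The stop-word list and the HTML-only extractor are assumptions for this sketch.
STOP_WORDS = %w[a an and is of the to].freeze

# Extract: reduce a source document to plain text. Real PDF, Word or Excel
# sources need a format-specific extractor; this one only strips HTML-like tags.
def extract(source)
  source.gsub(/<[^>]*>/, " ")
end

# Analyze: lowercase, split into word tokens, drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

p analyze(extract("<h1>Ferret</h1> <p>is a Ruby Search Engine</p>"))
# prints ["ferret", "ruby", "search", "engine"]
```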
Installing Ferret

gem install ferret

Pick the latest version
for your platform
The Recipe

1. Create some Documents

2. Create an Index

3. Adding Documents to the Index

4. Perform some Queries
Example Documents
 Create some Documents

 "Any String is a Document"

 ["This", "is also", "a document"]
Ferret::Index::Index
              Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use ...
Ferret::Index::Index
     Adding Documents to the Index

• Index provides the add_document
  method

• It also provides th...
Ferret::Index::Index
            Perform some Queries

• Index provides the search and
  search_each methods

• search met...
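
To show the shape of the add-then-search flow without requiring the gem, here is a pure-Ruby stand-in. It is not Ferret: `TinyIndex` is a hypothetical class that only mirrors the method names from the slides (`add_document`, `<<`, `search_each`), queries are single words, the score is a raw term count, and the `:limit` option is part of the sketch rather than a statement about Ferret's options.

```ruby
# Pure-Ruby stand-in that mimics the shape of the calls named above.
# It is NOT Ferret: queries are single words and the score is a raw term count.
class TinyIndex
  def initialize
    @docs = []
  end

  # Mirrors add_document and its << alias; returns self so calls can be chained.
  def add_document(doc)
    @docs << doc
    self
  end
  alias_method :<<, :add_document

  # Yields (doc_id, score) for each matching document, best score first,
  # and returns the total number of matching documents.
  def search_each(query, options = {})
    limit = options.fetch(:limit, 10)
    hits = @docs.each_with_index.map do |doc, id|
      [id, doc.downcase.scan(/\w+/).count(query.downcase)]
    end
    hits = hits.select { |_, score| score > 0 }.sort_by { |id, score| [-score, id] }
    hits.first(limit).each { |id, score| yield id, score }
    hits.size
  end

  # Returns the stored document for an id.
  def [](doc_id)
    @docs[doc_id]
  end
end

index = TinyIndex.new
index << "This is a document" << "Ruby, Ruby, Ruby" << "Ferret is written in Ruby and C"
index.search_each("ruby") do |doc_id, score|
  puts "#{doc_id} (score #{score}): #{index[doc_id]}"
end
```

The block form keeps result handling next to the query and avoids building an intermediate result list, which is the convenience an iterator-style `search_each` offers.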
Playing with Ferret in irb
Ferret Query Language

• Ferret's own Query Language, FQL, is a
  powerful way to specify search queries

• FQL supports many...
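
For orientation, here is one illustrative query string per query type, collected in a Ruby hash. The strings are written from memory of Ferret's query parser syntax and should be treated as assumptions to verify against the installed version; the field names `title` and `date` and the constant `FQL_EXAMPLES` are hypothetical.

```ruby
# Illustrative FQL strings, one per query type.
# Syntax recalled from Ferret's query parser docs; verify before relying on it.
FQL_EXAMPLES = {
  term:     'ferret',                    # a single term
  field:    'title:ferret',              # a term restricted to one field
  phrase:   '"search engine"',           # terms in sequence
  boolean:  'ruby AND NOT java',         # combining clauses
  wildcard: 'fer*',                      # pattern match on a term
  fuzzy:    'ferrit~',                   # approximate match
  range:    'date:[20060101 20061231]'   # inclusive range on a field
}.freeze

FQL_EXAMPLES.each { |type, query| puts "#{type.to_s.ljust(8)} #{query}" }
```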
Index.explain

• The explain method of Index describes
  how a document scores against a query
 • Very useful for debugging...
Ferret in your App
[Diagram; labels include: Application, Database, Web, ...]
Ferret in Rails

• Acts As Ferret is an ActiveRecord
  extension

• Available as a plugin
• Provides a simplified interface...
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails
• Simple model has two searchable
  fields title and body:
Ferret in Rails

• After a quick rake db:migrate we now
  have some data to play with
• Fire up the Rails Console and let’...
Ferret in Rails
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to requ...
Want more?

• We only covered the simplest
  constructs in Ferret

• Ferret’s API provides enough
  flexibility for the mos...
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkr...
In-Print Resources
Thanks!
Ferret A Ruby Search Engine

  1. 1. Ferret A Ruby Search Engine Brian Sam-Bodden
  2. 2. Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
  3. 3. Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App
  4. 4. Agenda • Ferret in Rails • Resources
  5. 5. What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Apache Lucene Search Engine • Port to Ruby by David Balmain
  6. 6. What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
  7. 7. Concepts
  8. 8. Concepts • Index : Sequence of documents
  9. 9. Concepts • Index : Sequence of documents • Document : Sequence of fields
  10. 10. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms
  11. 11. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
  12. 12. Fields of a Document in an Index
  13. 13. Fields of a Document in an Index • Fields are individually searchable units that are:
  14. 14. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored
  15. 15. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms
  16. 16. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed
  17. 17. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
  18. 18. It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For any non-plaintext sources such as PDF, Word, Excel you need to: • Extract • Analyze
  19. 19. Installing Ferret
  20. 20. Installing Ferret gem install ferret
  21. 21. Installing Ferret
  22. 22. Installing Ferret
  23. 23. Installing Ferret }
  24. 24. Installing Ferret } Pick the latest version for your platform
  25. 25. The Recipe
  26. 26. The Recipe 1. Create some Documents
  27. 27. The Recipe 1. Create some Documents 2. Create an Index
  28. 28. The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index
  29. 29. The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index 4. Perform some Queries
  30. 30. Example Documents Create some Documents
  31. 31. Example Documents Create some Documents “Any String is a Document”
  32. 32. Example Documents Create some Documents
  33. 33. Example Documents Create some Documents [“This”, “is also”, “a document”]
  34. 34. Example Documents Create some Documents
  35. 35. Example Documents Create some Documents
  36. 36. Ferret::Index::Index Create an Index
  37. 37. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class
  38. 38. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index
  39. 39. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience
  40. 40. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent
  41. 41. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath')
  42. 42. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory
  43. 43. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory ➡ index = Ferret::I.new()
  44. 44. Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}
  45. 45. Ferret::Index::Index Perform some Queries
  46. 46. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods
  47. 47. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and an optional set of parameters:
  48. 48. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and an optional set of parameters: ➡ search(query, options = {})
  49. 49. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block
  50. 50. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
  51. 51. Playing with Ferret in irb
  52. 52. Playing with Ferret in irb
  53. 53. Ferret Query Language • Ferret's own Query Language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wildcard • Field • Fuzzy • Boolean
  54. 54. Index.explain • The explain method of Index describes how a document scores against a query • Very useful for debugging • and for learning how Ferret works
  55. 55. Index.explain
  56. 56. Ferret in your App [Diagram labels: Application; data sources Database, Web, File System, Manual Input; steps Gather Data, Index Documents, Get User's Query, Search Index, Present Search Results; Ferret Index; User]
  57. 57. Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer
  58. 58. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  59. 59. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
  60. 60. Ferret in Rails • Simple model has two searchable fields title and body:
  61. 61. Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
  62. 62. Ferret in Rails
  63. 63. Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
  64. 64. Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
  65. 65. Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret
  66. 66. In-Print Resources
  67. 67. Thanks!
