Ferret


    Ferret - Presentation Transcript

    1. Ferret A Ruby Search Engine Brian Sam-Bodden
    2. Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
    3. Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App
    4. Agenda • Ferret in Rails • Resources
    5. What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Lucene Search Engine • Ported to Ruby by David Balmain
    6. What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
    11. Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
    17. Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
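
The four field properties above can be illustrated in plain Ruby. This is a minimal sketch and not Ferret's internal data layout: the `index_field` helper and the sample strings are hypothetical, and splitting on whitespace stands in for real tokenization.

```ruby
# Build the structures for one field across all documents.
# Hypothetical helper; a document id is simply the position in `values`.
def index_field(values)
  postings = Hash.new { |hash, term| hash[term] = [] }   # Indexed: term => document ids
  vectors  = []                                          # Vectored: per document, term => positions

  values.each_with_index do |text, doc_id|
    positions = Hash.new { |hash, term| hash[term] = [] }
    # Tokenized: break the field value into individual terms.
    text.split.each_with_index do |term, position|
      postings[term] << doc_id unless postings[term].include?(doc_id)
      positions[term] << position
    end
    vectors << positions
  end

  # Stored: the original text is kept so it can be shown in results.
  { stored: values.dup, postings: postings, vectors: vectors }
end

field = index_field(["ruby search engine", "search with ruby and more ruby"])
field[:postings]["ruby"]     # ids of the documents that contain "ruby"
field[:vectors][1]["ruby"]   # positions of "ruby" in document 1; the list size is its frequency
```

The inverted `postings` hash is what makes lookup by term fast: a query for one term reads one list instead of scanning every document.
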
    18. It’s all about Indexing • Indexing is the processing of a source document into plain text tokens that Ferret can manipulate • For any non-plaintext sources such as PDF, Word, Excel you need to: • Extract • Analyze
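
The extract and analyze steps can be sketched as two small functions. This is an illustration under stated assumptions rather than Ferret's analyzer: `extract`, `analyze`, and the `STOP_WORDS` list are hypothetical names, and stripping HTML tags stands in for a real PDF, Word, or Excel extractor.

```ruby
# Hypothetical stop-word list; real analyzers ship with longer ones.
STOP_WORDS = %w[a an and is of the to].freeze

# Extract: reduce a source to plain text. Stripping tags from HTML is a
# stand-in here; binary formats would need a dedicated extractor.
def extract(html)
  html.gsub(/<[^>]*>/, " ")
end

# Analyze: turn plain text into normalized tokens
# (lowercase, alphanumeric runs only, stop words dropped).
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

tokens = analyze(extract("<h1>Ferret</h1><p>A Ruby search engine</p>"))
```

Keeping the two steps separate means a new source format only needs a new extractor; the analysis stays the same.
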
    20. Installing Ferret gem install ferret
    24. Installing Ferret • Pick the latest version for your platform
    29. The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index 4. Perform some Queries
    31. Example Documents Create some Documents "Any String is a Document"
    33. Example Documents Create some Documents ["This", "is also", "a document"]
    43. Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in memory ➡ index = Ferret::I.new()
    44. Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}
    50. Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • The search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc_id, score| ... }
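
The recipe steps can be put together using only the calls named on the slides (`Ferret::I.new`, `<<`, and `search_each`). This is a hedged sketch: it assumes the ferret gem is installed and returns nil when it is not. The `recipe_hits` helper and the query string `first:bob` are made up for illustration, and the block is assumed to receive a document id and a score.

```ruby
begin
  require "ferret"
rescue LoadError
  # Ferret is a native gem; without it the helper below simply returns nil.
end

# Hypothetical helper: runs the recipe and returns [doc_id, score] pairs,
# or nil when Ferret is not available.
def recipe_hits(query)
  return nil unless defined?(Ferret)

  index = Ferret::I.new                               # create an in-memory index
  index << "This is a document"                       # add a plain string document
  index << { :first => "Bob", :last => "Smith" }      # add a document with named fields

  hits = []
  index.search_each(query) { |doc_id, score| hits << [doc_id, score] }
  hits
end

hits = recipe_hits("first:bob")
```

Passing `:path` to `Ferret::I.new`, as on the earlier slide, would persist the same index to disk instead of keeping it in memory.
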
    51. Playing with Ferret in irb
    59. Ferret Query Language • Ferret's own query language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wildcard • Field • Fuzzy • Boolean
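
To make a few of these query types concrete, the sketch below matches them against a small list of terms in plain Ruby. It illustrates what term, wildcard, fuzzy, and boolean matching mean and is not Ferret's parser or matching logic; every name in it (`TERMS`, `term_query`, `wildcard_query`, `fuzzy_query`, `edit_distance`) is hypothetical, and Ferret's own fuzzy matching rules may differ from a fixed edit distance.

```ruby
# Hypothetical sample: the terms of one field.
TERMS = %w[ferret ruby rails search engine].freeze

# Levenshtein edit distance, used by the fuzzy match below.
def edit_distance(a, b)
  row = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    previous_diagonal = row[0]
    row[0] = i
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current = [row[j] + 1, row[j - 1] + 1, previous_diagonal + cost].min
      previous_diagonal = row[j]
      row[j] = current
    end
  end
  row.last
end

# Term query: exact match.
def term_query(terms, word)
  terms.select { |term| term == word }
end

# Wildcard query: "*" matches any run of characters, "?" exactly one.
def wildcard_query(terms, pattern)
  source = Regexp.escape(pattern).gsub('\*', ".*").gsub('\?', ".")
  regexp = Regexp.new("\\A#{source}\\z")
  terms.select { |term| term.match?(regexp) }
end

# Fuzzy query: terms within a small edit distance of the word.
def fuzzy_query(terms, word, max_distance = 1)
  terms.select { |term| edit_distance(term, word) <= max_distance }
end

# Boolean query: combine result sets, here "r* AND NOT rails".
r_but_not_rails = wildcard_query(TERMS, "r*") - term_query(TERMS, "rails")
```

Boolean queries fall out of set operations on the individual results: `&` for AND, `|` for OR, and `-` for NOT.
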
    60. Index.explain • The explain method of Index describes how a document scores against a query • Very useful for debugging • and for learning how Ferret works
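
The idea behind an explanation is that a score can be broken down into the factors that produced it. The sketch below does this for a simple TF-IDF style score; it reproduces neither Ferret's scoring formula nor the output format of its explain method, and the `explain_score` helper and sample documents are hypothetical.

```ruby
# Hypothetical helper: score one document for one term and return the
# breakdown, so the final number can be traced back to its factors.
def explain_score(term, doc_tokens, all_docs)
  tf  = doc_tokens.count(term)                              # term frequency in this document
  df  = all_docs.count { |tokens| tokens.include?(term) }   # number of documents containing the term
  idf = Math.log(all_docs.size.to_f / (1 + df)) + 1.0       # rarer terms weigh more
  { term: term, tf: tf, df: df, idf: idf, score: tf * idf }
end

docs = [%w[ruby search engine], %w[ruby ruby web framework], %w[java search]]
breakdown = explain_score("ruby", docs[1], docs)
```

Reading such a breakdown shows at a glance whether a surprising ranking comes from how often the term occurs in the document or from how rare it is across the index.
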
    62. Ferret in your App [Diagram labels: Application; Database, Web, File System, Manual Input; User; Gather Data; Index Documents; Ferret Index; Get User's Query; Search Index; Present Search Results]
    63. Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer
    64. Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
    66. Ferret in Rails • A simple model has two searchable fields, title and body:
    67. Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
    69. Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
    70. Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
    71. Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret
    72. In-Print Resources
    73. Thanks!

    Brian Sam-Bodden, 2 years ago

    Introduction to Ferret, the Ruby Full-Text Search E...

    © All Rights Reserved
