Ferret
 

Gives an overview of Ferret.


    Ferret Presentation Transcript

    • Ferret: A Ruby Search Engine, by Brian Sam-Bodden
    • Agenda • What is Ferret? • Concepts • Fields • Indexing • Installing Ferret
    • Agenda • The Recipe • Documents • Ferret::Index::Index • FQL • Ferret in your App
    • Agenda • Ferret in Rails • Resources
    • What is Ferret? • Information Retrieval (IR) Library • Full-featured Text Search Engine • Inspired by the Apache Lucene Search Engine • Ported to Ruby by David Balmain
    • What is Ferret? • Initially a 100% pure Ruby port • Since 0.9 many core functions are implemented in C • Fast! Now Faster than Lucene ;-)
    • Concepts • Index : Sequence of documents • Document : Sequence of fields • Field : Named sequence of terms • Term : A text string, keyed by field name
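The containment hierarchy above can be pictured with plain Ruby data structures. This is only a rough sketch and not Ferret's internal representation: the `INDEX` constant, the `terms_for` and `field_terms` helpers, and the simple word-splitting rule are all hypothetical names and assumptions chosen for illustration.

```ruby
# Toy model of the concepts (NOT how Ferret stores anything internally).
# Index    : a sequence of documents
# Document : a set of named fields
# Field    : a named sequence of terms
# Term     : a text string, keyed by the name of the field it came from

# Hypothetical helper: split a field's text into lowercase terms.
def terms_for(text)
  text.downcase.scan(/\w+/)
end

# Hypothetical helper: list every [field name, term] pair of a document.
def field_terms(document)
  document.flat_map do |field_name, text|
    terms_for(text).map { |term| [field_name, term] }
  end
end

# The "index" here is just an array of documents, each a hash of fields.
INDEX = [
  { :title => "Ferret overview", :body => "Ferret is a search library" },
  { :title => "Lucene notes",    :body => "Lucene inspired Ferret" }
]

INDEX.each_with_index do |document, doc_id|
  field_terms(document).each do |field_name, term|
    puts "doc #{doc_id}  #{field_name}:#{term}"
  end
end
```

Note that the same string ("ferret") counts as two different terms when it appears in `:title` and in `:body`, because a term is keyed by its field name.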
    • Fields of a Document in an Index • Fields are individually searchable units that are: • Stored: The original Terms of the fields are stored • Indexed: Inverted to rapidly find all Documents containing any of the Terms • Tokenized: Individual Terms extracted are indexed • Vectored: Frequency and location of Terms are stored
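The four field properties can be illustrated with a toy inverted index in plain Ruby. This is a sketch under simplifying assumptions (one unnamed field per document, whitespace tokenization), and `build_index` is a hypothetical helper, not part of Ferret.

```ruby
# Toy illustration of Stored / Indexed / Tokenized / Vectored.
# NOT Ferret's implementation; each document is a single string field.
def build_index(docs)
  stored   = {}                                         # Stored: original text by document id
  inverted = Hash.new { |hash, term| hash[term] = [] }  # Indexed: term -> ids of documents containing it
  vectors  = {}                                         # Vectored: per document, term -> positions

  docs.each_with_index do |text, doc_id|
    stored[doc_id]  = text
    vectors[doc_id] = Hash.new { |hash, term| hash[term] = [] }

    # Tokenized: break the text into individual terms before indexing them.
    text.split.each_with_index do |term, position|
      inverted[term] << doc_id unless inverted[term].include?(doc_id)
      vectors[doc_id][term] << position
    end
  end

  { :stored => stored, :inverted => inverted, :vectors => vectors }
end

toy = build_index(["the quick brown fox", "the lazy dog", "quick quick dog"])

p toy[:inverted]["quick"]           # => [0, 2]  (documents containing the term)
p toy[:vectors][2]["quick"]         # => [0, 1]  (locations within document 2)
p toy[:vectors][2]["quick"].length  # => 2       (frequency within document 2)
p toy[:stored][1]                   # => "the lazy dog"
```

The inverted map is what makes lookups fast: finding all documents for a term is a single hash access rather than a scan over every document.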
    • It’s all about Indexing • Indexing is the process of turning a source document into plain-text tokens that Ferret can manipulate • For any non-plaintext source such as PDF, Word, or Excel, you need to: • Extract • Analyze
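The extract and analyze steps can be sketched in plain Ruby. Both helpers below are hypothetical stand-ins and not part of Ferret: a crude tag stripper plays the role of a real PDF, Word, or Excel extractor, and the stop-word list is an arbitrary assumption.

```ruby
# Sketch of the two steps: extract plain text, then analyze it into tokens.
# Both helpers are hypothetical stand-ins, not Ferret APIs.

STOP_WORDS = %w[a an and is of the to].freeze

# Extract: turn a non-plaintext source into plain text.
# Here markup tags are stripped; a real extractor would parse the file format.
def extract(markup)
  markup.gsub(/<[^>]*>/, " ").squeeze(" ").strip
end

# Analyze: lowercase the text, split it into word tokens, drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

source = "<h1>Ferret</h1><p>Ferret is a search engine for Ruby.</p>"
text   = extract(source)
tokens = analyze(text)

puts text   # Ferret Ferret is a search engine for Ruby.
p tokens    # => ["ferret", "ferret", "search", "engine", "for", "ruby"]
```

Only the tokens produced by the analysis step would be handed to the index; the markup never reaches it.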
    • Installing Ferret • gem install ferret • Pick the latest version for your platform
    • The Recipe 1. Create some Documents 2. Create an Index 3. Adding Documents to the Index 4. Perform some Queries
    • Example Documents Create some Documents “Any String is a Document”
    • Example Documents Create some Documents ["This", "is also", "a document"]
    • Ferret::Index::Index Create an Index • Indexes are encapsulated by the class ➡ Ferret::Index::Index • Use the alias Ferret::I for convenience • Index can be persistent ➡ index = Ferret::I.new(:path => '/somepath') • Or, completely in Memory ➡ index = Ferret::I.new()
    • Ferret::Index::Index Adding Documents to the Index • Index provides the add_document method • It also provides the << alias • Adding documents is then as easy as: ➡ index << "This is a document" ➡ index << {:first => "Bob", :last => "Smith"}
    • Ferret::Index::Index Perform some Queries • Index provides the search and search_each methods • The search method takes a query and an optional set of parameters: ➡ search(query, options = {}) • The search_each method provides an iterator block ➡ search_each(query, options = {}) {|doc, score| ... }
    • Playing with Ferret in irb
    • Ferret Query Language • Ferret's own Query Language, FQL, is a powerful way to specify search queries • FQL supports many query types, including: • Term • Range • Phrase • Wild • Field • Fuzz • Boolean
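To make the query types above more concrete, here is one example query string per type. The syntax is given as recalled from Ferret's QueryParser documentation and should be treated as an assumption to verify against your installed version; the `FQL_EXAMPLES` constant and the field names (`title`, `date`) are hypothetical.

```ruby
# One illustrative FQL string per query type named on the slide.
# Syntax is an assumption based on Ferret's QueryParser docs; verify before use.
FQL_EXAMPLES = {
  :term    => 'ferret',                        # a single term
  :range   => 'date:[20050725 20050905]',      # inclusive range on a field
  :phrase  => '"quick brown fox"',             # terms in this exact order
  :wild    => 'title:fer*',                    # wildcard match
  :field   => 'title:ferret',                  # restrict the term to one field
  :fuzz    => 'color~',                        # fuzzy (approximate) match
  :boolean => '(ruby OR rails) AND NOT java'   # boolean combination
}

FQL_EXAMPLES.each do |type, query|
  puts "#{type.to_s.ljust(8)} #{query}"
end
```

Each of these strings would be passed as the query argument to search or search_each.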
    • Index.explain • The explain method of Index describes how a document scores against a query • Very useful for debugging • and for learning how Ferret works
    • Ferret in your App • [Diagram: the application gathers data from a database, the web, the file system, and manual input, then indexes the documents into a Ferret index; it gets the user's query, searches the index, and presents the search results to the user]
    • Ferret in Rails • Acts As Ferret is an ActiveRecord extension • Available as a plugin • Provides a simplified interface to Ferret • Maintained by Jens Krämer
    • Ferret in Rails • Adding an index to an ActiveRecord model is as simple as:
    • Ferret in Rails • A simple model has two searchable fields, title and body:
    • Ferret in Rails • After a quick rake db:migrate we now have some data to play with • Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
    • Want more? • Ferret is improving constantly • Acts As Ferret seems to catch up quickly • Real-life usage seems to require some good engineering on your part • Background indexing • Hot swap of indexes?
    • Want more? • We only covered the simplest constructs in Ferret • Ferret’s API provides enough flexibility for the most demanding searching needs
    • Online Resources • http://ferret.davebalmain.com • http://lucene.apache.org • http://lucenebook.com • http://projects.jkraemer.net/acts_as_ferret
    • In-Print Resources
    • Thanks!