Introducing Xapian
Presentation Transcript

  • Introducing Xapian
    Justin Finkelstein | @ilithium
    PHP London, November 2011
  • Background and Alternatives
    ReportBuyer.com:
    • 235,000 reports
    • 1.3 GB of text
    • Hierarchical categories
    • MySQL FullText
    Search alternatives:
    • Sphinx
    • Lucene, etc.
  • Benefits
    Easy to install and portable
    Fast searching
    Accurate
    Powerful
  • Drawbacks
    Not a database
    Single writer, many readers
    Limited to 4.2 billion documents
    OS file size limit
  • Installation
    Binaries for Windows
    Vendor packages & PPA
    Source code
    Bindings:
    • PHP
    • C#
    • Java
    • Lua
    • Perl
    • Python, etc
  • Indexing
    Databases
    Documents:
    • Document IDs – must be unique
    • Terms & Stemmers
    • Term Generator
    • Values
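The indexing concepts on this slide (unique document IDs, terms, stemming, a term generator, and stored values) can be illustrated with a toy inverted index. This is a pure-Python sketch of the idea and does not use the Xapian bindings. `toy_stem`, `generate_terms`, and `ToyIndex` are hypothetical names, and the suffix stripping is a crude stand-in for a real stemmer.

```python
import re
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def generate_terms(text):
    """Play the role of a term generator: split text into lowercase words, then stem."""
    return [toy_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.values = {}                  # document ID -> stored values (e.g. price)

    def replace_document(self, doc_id, text, values):
        # Drop any earlier version first, so each document ID stays unique.
        for ids in self.postings.values():
            ids.discard(doc_id)
        for term in generate_terms(text):
            self.postings[term].add(doc_id)
        self.values[doc_id] = values

    def search(self, word):
        # Stem the query word the same way the indexed text was stemmed.
        return sorted(self.postings.get(toy_stem(word.lower()), set()))

index = ToyIndex()
index.replace_document(1, "Market reports", {"price": 250.0})
index.replace_document(2, "Reporting on markets", {"price": 95.0})
hits = index.search("reports")
```

Because both the indexed text and the query word pass through the same stemmer, "reports" and "reporting" reduce to the same term and both documents match.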
  • Querying the Database
    Simple Queries:
    • Phrases: “php development”
    • Logical operators: OR, AND, NOT, MAYBE
    • Ranges: alpha..omega
    • NEAR: “shop NEAR pub”
    • Wildcards (“report*”)
    • Synonyms
    Query Parser makes it easy:
    “data management” AND NOT “real estate” AND NEAR data
  • Relevance and Sorting
    BM25 Probabilistic Relevancy
    Sort by rank/relevance
    Sort by values
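To give a feel for what BM25 relevance means, here is a minimal sketch of the textbook Okapi BM25 formula in pure Python. It is not Xapian's own weighting code, which exposes additional parameters. The function name `bm25_score` is hypothetical, and the `k1` and `b` defaults are commonly quoted textbook values chosen here as an assumption.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Textbook BM25 score of one document (a list of terms) against a query.

    corpus is a list of documents, each a list of terms; k1 and b are
    assumed tuning values, not Xapian's defaults.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)          # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Length normalisation: long documents need more occurrences to score well.
        norm = tf + k1 * (1.0 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score

corpus = [
    ["data", "management", "report"],
    ["real", "estate", "report"],
    ["data", "data", "analysis"],
]
scores = [bm25_score(["data"], doc, corpus) for doc in corpus]
```

With these three equal-length documents, the one mentioning "data" twice ranks above the one mentioning it once, and the document without the term scores zero.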
  • Getting Started
    Know your data set
    What are users looking for?
    How will they refine their search?
  • Report Buyer Product Data
    item_guid
    title
    subtitle
    summary
    table of contents
    price
    category
    publication date
    availability
    product url
  • Searching on Report Buyer
    Search by:
    • Product code
    • Category
    • Title
    • Price
    Search text of:
    • Title
    • Subtitle
    • Summary
    • Table of Contents
    Refine by:
    • Price
    • Availability
  • Mapping to Xapian
    Full text with weighting:
    • name
    • subtitle
    • summary
    • table of contents
    Text with prefixes:
    • title
    • product code
    • category
    Values:
    • price
    • availability
    • publication date
    Facets:
    • Category
    • Availability
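The mapping above can be written down as plain data before any indexing code exists. The sketch below is pure Python and does not call the Xapian bindings. The weights, the value slot numbers, the dictionary keys, and the `map_product` helper are all hypothetical. The prefixes follow a commonly used convention (S for title, Q for a unique identifier, X for user-defined prefixes), which is an assumption about how this site was set up. It also assumes that "name" on the slide refers to the title field, and that facet fields are stored as values so they can be counted.

```python
# Hypothetical weights: how strongly each full-text field counts towards relevance.
FULL_TEXT_WEIGHTS = {"title": 5, "subtitle": 3, "summary": 1, "table_of_contents": 1}
# Assumed prefixes: S = title, Q = unique ID, XCAT = user-defined category prefix.
PREFIXES = {"title": "S", "product_code": "Q", "category": "XCAT"}
# Hypothetical value slots used for sorting, range filters, and facets.
VALUE_SLOTS = {"price": 0, "availability": 1, "publication_date": 2, "category": 3}
FACET_FIELDS = ("category", "availability")

def map_product(product):
    """Return (weighted_text, prefixed_terms, values) for one product record."""
    weighted_text = [
        (product[field], weight)
        for field, weight in FULL_TEXT_WEIGHTS.items()
        if product.get(field)
    ]
    prefixed_terms = []
    for field, prefix in PREFIXES.items():
        value = product.get(field)
        if not value:
            continue
        if field == "title":
            # Title is searchable word by word under its prefix.
            prefixed_terms.extend(prefix + word for word in value.lower().split())
        else:
            # Product code and category are matched as single exact terms.
            prefixed_terms.append(prefix + value.lower())
    values = {slot: product[field] for field, slot in VALUE_SLOTS.items() if field in product}
    return weighted_text, prefixed_terms, values

product = {
    "product_code": "RB001",
    "title": "Energy Outlook",
    "summary": "Annual review of energy markets",
    "price": 250.0,
    "category": "Energy",
    "availability": "in stock",
    "publication_date": "2011-11-01",
}
weighted_text, prefixed_terms, values = map_product(product)
```

Keeping the mapping in data like this makes it easy to check that every facet field also has a value slot before writing the indexer.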
  • Demo Walk-throughs
    Indexing the data
    Query parser
    Sorting
    MatchSpies
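The demos themselves are not part of this transcript. As a rough illustration of what the sorting and match spy steps achieve, here is a pure-Python sketch over a hypothetical result set: sorting by relevance or by a stored value, and tallying facet counts across the matches. It does not use the Xapian bindings, and all names and sample data are made up for the example.

```python
from collections import Counter

# Hypothetical matches: document ID, relevance score, and stored values.
matches = [
    {"docid": 1, "score": 2.4, "price": 250.0, "category": "Energy"},
    {"docid": 2, "score": 3.1, "price": 400.0, "category": "Finance"},
    {"docid": 3, "score": 1.7, "price": 95.0, "category": "Energy"},
]

def sort_by_relevance(matches):
    """Highest-scoring documents first."""
    return sorted(matches, key=lambda m: m["score"], reverse=True)

def sort_by_value(matches, field, reverse=False):
    """Order by a stored value such as price, ignoring relevance."""
    return sorted(matches, key=lambda m: m[field], reverse=reverse)

def count_facet(matches, field):
    """Tally how many matches carry each value of a facet field."""
    return Counter(m[field] for m in matches)

by_relevance = [m["docid"] for m in sort_by_relevance(matches)]
by_price = [m["docid"] for m in sort_by_value(matches, "price")]
category_counts = count_facet(matches, "category")
```

The facet counts are what a search page would show next to each category so users can refine their results.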
  • The End
    http://readthedocs.org/docs/getting-started-with-xapian/
    www.redwiredesign.com
    blog.ilithium.com