Introducing Xapian


  1. Introducing Xapian
     Justin Finkelstein | @ilithium
     PHP London, November 2011
  2. Background and Alternatives
     ReportBuyer.com:
       • 235,000 reports
       • 1.3 GB of text
       • Hierarchical categories
       • MySQL FullText
     Search alternatives:
       • Sphinx
       • Lucene, etc.
  3. Benefits
     Easy to install and portable
     Fast searching
     Accurate
     Powerful
  4. Drawbacks
     Not a database
     Single writer, many readers
     Limited to 4.2 billion documents
     OS file size limit
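
The 4.2 billion ceiling is consistent with document IDs being stored as unsigned 32-bit integers. The slide does not say so, so treat the bit width and the reserved zero below as assumptions; the sketch only checks the arithmetic.

```python
# Assumption: document IDs are unsigned 32-bit integers and 0 is reserved
# as "no document", so usable IDs run from 1 to 2**32 - 1.
DOCID_BITS = 32
MAX_DOCID = 2**DOCID_BITS - 1

print(f"{MAX_DOCID:,}")  # 4,294,967,295
```
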
  5. Installation
     Binaries for Windows
     Vendor packages & PPA
     Source code
     Bindings:
       • PHP
       • C#
       • Java
       • Lua
       • Perl
       • Python, etc.
  6. Indexing
     Databases
     Documents:
       • Document IDs (must be unique)
       • Terms & Stemmers
       • Term Generator
       • Values
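
To make the indexing vocabulary concrete, here is a toy model in plain Python. It does not use the Xapian bindings: `ToyDocument`, `toy_stem`, and `index_text` are hypothetical names, and the suffix stripping is a crude stand-in for a real stemmer. It only sketches how a document ends up holding stemmed terms with positions, plus values in numbered slots, under a unique ID.

```python
from dataclasses import dataclass, field
import re


def toy_stem(word: str) -> str:
    """Crude suffix stripping for illustration only (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


@dataclass
class ToyDocument:
    """Hypothetical stand-in for an indexed document."""
    data: str = ""                              # payload shown in results
    terms: dict = field(default_factory=dict)   # term -> list of positions
    values: dict = field(default_factory=dict)  # slot number -> value

    def add_posting(self, term: str, position: int) -> None:
        self.terms.setdefault(term, []).append(position)


def index_text(doc: ToyDocument, text: str, prefix: str = "") -> None:
    """Toy term generator: tokenise, lowercase, stem, record positions."""
    # Continue numbering after any positions already recorded.
    start = 1 + max((p for ps in doc.terms.values() for p in ps), default=0)
    for offset, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        doc.add_posting(prefix + toy_stem(word), start + offset)


# A toy "database": unique document ID -> document.
database = {}
doc = ToyDocument(data="Global Reports on Data Management")
index_text(doc, doc.data)
doc.values[0] = "1250.00"  # made-up price, kept for sorting or filtering
database[1] = doc

print(sorted(doc.terms))
```

Keeping values apart from terms mirrors the split on the slide: terms are what a search matches against, while values are looked up per document for sorting and refining.
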
  7. Querying the Database
     Simple queries:
       • Phrases: “php development”
       • Logical operators: OR, AND, NOT, MAYBE
       • Ranges: alpha..omega
       • NEAR: “shop NEAR pub”
       • Wildcards: “report*”
       • Synonyms
     The Query Parser makes it easy:
       “data management” AND NOT “real estate” AND NEAR data
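
The logical operators and wildcards above can be pictured as set operations over posting lists. The sketch below is a plain-Python illustration with made-up postings, not the Xapian query parser; phrase and NEAR matching also need term positions and are left out.

```python
# Hypothetical postings: term -> set of IDs of documents containing it.
postings = {
    "report": {1, 2, 3},
    "reporting": {4},
    "data": {1, 3, 4},
    "estate": {3},
}
all_terms = sorted(postings)


def docs(term):
    """Return the IDs of documents containing the term."""
    return postings.get(term, set())


def wildcard(prefix):
    """Expand 'report*' to every indexed term starting with 'report'."""
    matched = set()
    for term in all_terms:
        if term.startswith(prefix):
            matched |= postings[term]
    return matched


both = docs("report") & docs("data")       # report AND data
either = docs("report") | docs("estate")   # report OR estate
without = docs("data") - docs("estate")    # data NOT estate
expanded = wildcard("report")              # report*

print(both, either, without, expanded)
```

MAYBE does not fit this picture as neatly: as far as the match set goes it behaves like its left-hand side alone, and the right-hand side only influences ranking.
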
  8. Relevance and Sorting
     BM25 Probabilistic Relevancy
     Sort by rank/relevance
     Sort by values
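
For readers new to BM25, here is a minimal sketch of a commonly used form of the Okapi BM25 term score, with a non-negative IDF variant. It is not claimed to match Xapian's own implementation or defaults; the parameter values `k1=1.2` and `b=0.75` and the collection statistics are assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, n_docs_with_term,
                    k1=1.2, b=0.75):
    """Contribution of one query term to one document's BM25 score."""
    # Rare terms get a high inverse document frequency, common terms a low one.
    idf = math.log(1 + (n_docs - n_docs_with_term + 0.5)
                   / (n_docs_with_term + 0.5))
    # Documents longer than average are penalised, controlled by b.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less, controlled by k1.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical statistics, loosely sized after the 235,000 reports.
N = 235_000
rare = bm25_term_score(tf=3, doc_len=500, avg_doc_len=500,
                       n_docs=N, n_docs_with_term=100)
common = bm25_term_score(tf=3, doc_len=500, avg_doc_len=500,
                         n_docs=N, n_docs_with_term=100_000)
long_doc = bm25_term_score(tf=3, doc_len=5000, avg_doc_len=500,
                           n_docs=N, n_docs_with_term=100)

print(rare > common, rare > long_doc)  # True True
```

A document's score for a query is the sum of these per-term contributions, which is what "sort by relevance" orders on.
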
  9. Getting Started
     Know your data set
     What are users looking for?
     How will they refine their search?
  10. Report Buyer Product Data
      item_guid
      title
      subtitle
      summary
      table of contents
      price
      category
      publication date
      availability
      product url
  11. Searching on Report Buyer
      Search by:
        • Product code
        • Category
        • Title
        • Price
      Search text of:
        • Title
        • Subtitle
        • Summary
        • Table of Contents
      Refine by:
        • Price
        • Availability
  12. Mapping to Xapian
      Full text with weighting:
        • name
        • subtitle
        • summary
        • table of contents
      Text with prefixes:
        • title
        • product code
        • category
      Values:
        • price
        • availability
        • publication date
      Facets:
        • Category
        • Availability
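
One way to write this mapping down before indexing is as a small configuration table. The sketch below is plain Python with hypothetical names throughout: the prefixes (`S`, `Q`, `XCAT`), the slot numbers, and the two sample records are illustrative assumptions, not the values used on Report Buyer. The facet count is a rough picture of what a match spy tallies over matching documents.

```python
from collections import Counter

# Hypothetical prefixes and value slots; the slides do not give the real ones.
PREFIXES = {"title": "S", "product_code": "Q", "category": "XCAT"}
VALUE_SLOTS = {"price": 0, "availability": 1, "publication_date": 2}


def prefixed_terms(record):
    """Build one prefixed term per word of each prefixed field."""
    terms = []
    for field_name, prefix in PREFIXES.items():
        for word in str(record.get(field_name, "")).lower().split():
            terms.append(prefix + word)
    return terms


def slot_values(record):
    """Map value fields to their numbered slots."""
    return {slot: record[name]
            for name, slot in VALUE_SLOTS.items() if name in record}


def facet_counts(records, field_name):
    """Count matching records per facet value."""
    return Counter(r[field_name] for r in records if field_name in r)


# Made-up sample records for illustration.
records = [
    {"title": "Data Management", "product_code": "RB001", "category": "IT",
     "price": 1250.0, "availability": "in stock",
     "publication_date": "2011-10-01"},
    {"title": "Real Estate Outlook", "product_code": "RB002",
     "category": "Property", "price": 900.0, "availability": "in stock",
     "publication_date": "2011-09-15"},
]

print(prefixed_terms(records[0]))
print(facet_counts(records, "availability"))
```

Prefixes keep a word in the title distinct from the same word in a category, so a search can be restricted to one field without a separate index.
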
  13. Demo Walk-throughs
      Indexing the data
      Query parser
      Sorting
      MatchSpies
  14. The End
      http://readthedocs.org/docs/getting-started-with-xapian/
      www.redwiredesign.com
      blog.ilithium.com
