Indexing Stuff &&
Things with Sphinx and Perl
Houston Perl Mongers
May 8th, 2014
Hosted by cPanel, Inc.
Brett Estrade <est...
Sphinx
● full text search indexer and daemon
● indexer - builds indexes
● searchd - services search requests
● very easy t...
Sphinx Data Sources
● Directly from MySQL (MariaDB), PostgreSQL
○ Indexing data from arbitrary SQL
○ Excellent for fast re...
Search Interface
● Native protocol (e.g., Sphinx::Search)
● Supports MySQL protocol (4.1)
○ Subset of SQL supported is cal...
indexer data
named index for
searchd
searchd config
Client Example - Sphinx::Search
search term -
empty string
returns “all”
Search Results
Some Common Use Cases
● Rebuild index from database regularly
● Incrementally add to existing index
● Query Sphinx for DB ...
Real Life Examples
1. Indexing MariaDB
2. Filtering on string using CRC32
3. Creating sources w/Sphinx::XML::Pipe2
4. Dyna...
Indexing MariaBD ~2.25 Million Rows
● Use case - saving eBay auction data in DB
● Providing search interface to it
● Demo ...
How to Filter on Strings
● Requires CRC32 hashing (strings to ints)
● When indexing, use MySQL’s CRC32 function
● Use Perl...
And inside of client, use Perl’s String::CRC32 to encode to the same integer
Transforming Things to XMLPipe2
● XMLPipe2 is Sphinx’s generic data format
● Extract/Transform scripts -> XMLPipe2
● use S...
Sample XMLPipe2 File
Sample XMLPipe2 Source Conf Entry
Example XMLPipe2 Use Case
● Monitor ephemera,e.g. active eBay listings
● Don’t want to use a database
● Many data partitio...
Dynamic Indexing of XMLPipe2 Stuff
● Fact - Sphinx partitions data by indexes
● Problem - each index uses its own data fil...
Sphinx’s --config to the Rescue!
● Config files are typically static, right?
● Sphinx can handle executables via --config
...
Sphinx::Config::Builder
● Module I created specifically for this case
○ uploaded to CPAN
● Why? No Sphinx config builders ...
Solution
● Expects XML2Pipe data files to already exist
● Iterate over array of indexes to build
● Creates “source” entrie...
Demo
Tip of the Iceberg
● Sphinx has TONs of options and modes
● Tons of areas of application
● Many clients, Simple interface
...
Thank You!
● http://sphinxsearch.com/
● cpan://Sphinx::Search
● cpan://Sphinx::Config::Builder
● http://houston.pm.org
Upcoming SlideShare
Loading in...5
×

Sphinx && Perl Houston Perl Mongers - May 8th, 2014

307

Published on

Slides from a talk on Spinx and Perl for the Houston Perl Mongers group - May 8th, 2014.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
307
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Sphinx && Perl Houston Perl Mongers - May 8th, 2014

  1. 1. Indexing Stuff && Things with Sphinx and Perl Houston Perl Mongers May 8th, 2014 Hosted by cPanel, Inc. Brett Estrade <estrabd@gmail.com>
  2. 2. Sphinx ● full text search indexer and daemon ● indexer - builds indexes ● searchd - services search requests ● very easy to install and configure
  3. 3. Sphinx Data Sources ● Directly from MySQL (MariaDB), PostgreSQL ○ Indexing data from arbitrary SQL ○ Excellent for fast reading of expensive JOINs ● XMLPipe2 ○ General intermediate data understood by Sphinx
  4. 4. Search Interface ● Native protocol (e.g., Sphinx::Search) ● Supports MySQL protocol (4.1) ○ Subset of SQL supported is called SphinxQL
  5. 5. indexer data
  6. 6. named index for searchd
  7. 7. searchd config
  8. 8. Client Example - Sphinx::Search search term - empty string returns “all”
  9. 9. Search Results
  10. 10. Some Common Use Cases ● Rebuild index from database regularly ● Incrementally add to existing index ● Query Sphinx for DB primary keys, make DB call for related rows ● Query Sphinx for wanted data (no DB at all) == my use case
  11. 11. Real Life Examples 1. Indexing MariaDB 2. Filtering on string using CRC32 3. Creating sources w/Sphinx::XML::Pipe2 4. Dynamic config w/Sphinx::Config::Builder
  12. 12. Indexing MariaBD ~2.25 Million Rows ● Use case - saving eBay auction data in DB ● Providing search interface to it ● Demo run of indexer
  13. 13. How to Filter on Strings ● Requires CRC32 hashing (strings to ints) ● When indexing, use MySQL’s CRC32 function ● Use Perl’s String::CRC32 to encode string, ○ then set filter
  14. 14. And inside of client, use Perl’s String::CRC32 to encode to the same integer
  15. 15. Transforming Things to XMLPipe2 ● XMLPipe2 is Sphinx’s generic data format ● Extract/Transform scripts -> XMLPipe2 ● use Sphinx::XML::Pipe2; #’nuff said
  16. 16. Sample XMLPipe2 File
  17. 17. Sample XMLPipe2 Source Conf Entry
  18. 18. Example XMLPipe2 Use Case ● Monitor ephemera,e.g. active eBay listings ● Don’t want to use a database ● Many data partitions (i.e., indexes) ○ e.g., by store, by category, etc ○ > 250 (yikes!) ● Data partitions change over time (slowly)
  19. 19. Dynamic Indexing of XMLPipe2 Stuff ● Fact - Sphinx partitions data by indexes ● Problem - each index uses its own data file ○ data as XMLPipe2 ● Challenge - how to manage a changing set of indexes?
  20. 20. Sphinx’s --config to the Rescue! ● Config files are typically static, right? ● Sphinx can handle executables via --config ● indexer --config ./generate-config.pl --all
  21. 21. Sphinx::Config::Builder ● Module I created specifically for this case ○ uploaded to CPAN ● Why? No Sphinx config builders were a fit ● Module is low level and does what I need ○ i.e., dynamically builds a XMLPipe2 specific config ● A+ 100 Passing ○ http://cpantesters.org/distro/S/Sphinx-Config-Builder.html
  22. 22. Solution ● Expects XML2Pipe data files to already exist ● Iterate over array of indexes to build ● Creates “source” entries for XMLPipe2 data ● Creates “index” entries for each “source”
  23. 23. Demo
  24. 24. Tip of the Iceberg ● Sphinx has TONs of options and modes ● Tons of areas of application ● Many clients, Simple interface ● Super easy to install and maintain
  25. 25. Thank You! ● http://sphinxsearch.com/ ● cpan://Sphinx::Search ● cpan://Sphinx::Config::Builder ● http://houston.pm.org
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×