Using Sphinx for Search in PHP

Mike Lively
Slickdeals, LLC

What is Sphinx?
• A full-text search engine
• Quickly get high quality (relevant) results
• Designed to integrate well with SQL RDBMS
• Can work with any data source
• Can be queried using either an API or SQL

How do I know anything about Sphinx?
• Manager of Software Architecture for Slickdeals.net
• Alexa top 150 site (in the US)
• Have been working on improving our Sphinx search engine for the last 2 months or so
• Over 7 million searches a month go directly through the interface; many more happen indirectly

When should I use Sphinx?
• Site / Product / Document searches
• Auto-suggest / Auto-Correct functionality
• Finding relevant and related items

Simple Architecture
• Often, search is offloaded straight to the database
• Search goes to the backend, which performs queries on the database
• Obviously very easy to implement

Simple Architecture
• Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%'
• Anything else forces a full table scan, and with MyISAM the table is locked against writes while the scan runs.
• MySQL is not a great or flexible full text engine
• It can sometimes be adequate

Sphinx Architecture
• searchd is responsible for receiving requests from clients and executing the searches against the Sphinx index.
• indexer is responsible for getting data into the Sphinx index.
• This separation allows indexing and searching to be scaled separately.

Sphinx Architecture
• searchd has a binary protocol for which clients are available in multiple languages.
• searchd also speaks MySQL’s wire protocol (the one in use since MySQL 4.1), so standard MySQL clients can connect to it
• searchd is a daemon that runs on your search servers

Sphinx Architecture
• indexer is a command-line program that you can execute to build any number of indexes.
• Can handle index rotation for live indexing

Not So Quick Side Note
MySQL IS SLOWWWWWWWWWWWWW
(at text matches)

Still Not Quick Side Note
Indexes won’t help you…

Quicker Side Note
Full Text Search isn’t so bad
IF….

Sphinx Concepts
• Sphinx indexes “documents”
• Each document has a unique, unsigned, non-zero integer ID (either 32-bit or 64-bit)
• Each document has one or more fields
• Each document has zero or more attributes

Indexes / Sources
• Sphinx indexes are created from one or more sources.
• A source can be a database, or an XML or TSV stream.
• You can use multiple sources
• This is useful for maintaining updated indexes
• Also used to implement a Sphinx cluster

Sphinx Fields
• Fields are what the full-text index is built from.
• When searching you can search against any number of fields.
• You can assign different relevancy weights to different fields.
• The original value of a field is never stored by Sphinx.
• An index should always have at least one field.

Sphinx Attributes
• Data that helps further describe the item being indexed
• Can be returned as a part of the search
• Useful for filtering and sorting results
• These are not a part of the full text index.
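
The split between fields and attributes can be illustrated with a toy model. This is a minimal Python sketch and not how Sphinx is implemented: the class name, the `views` attribute, and the sample titles are all invented. Field text is tokenized into a keyword lookup and then thrown away, while attributes are stored as-is so they can be returned, filtered on, and sorted by.

```python
from collections import defaultdict


class ToyIndex:
    """Illustrative stand-in for a full-text index; not how Sphinx works internally."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> ids of documents containing it
        self.attributes = {}              # document id -> stored attribute values

    def add(self, doc_id, fields, attributes):
        if not isinstance(doc_id, int) or doc_id <= 0:
            raise ValueError("document id must be a non-zero unsigned integer")
        if doc_id in self.attributes:
            raise ValueError("document id must be unique")
        # Fields are tokenized into keywords; the original text is not kept.
        for text in fields.values():
            for word in text.lower().split():
                self.postings[word].add(doc_id)
        # Attributes are stored unchanged, outside the keyword lookup.
        self.attributes[doc_id] = dict(attributes)

    def search(self, keyword, min_views=0):
        """Match on a field keyword, filter and sort on the `views` attribute."""
        ids = self.postings.get(keyword.lower(), set())
        hits = [i for i in ids if self.attributes[i]["views"] >= min_views]
        return sorted(hits, key=lambda i: self.attributes[i]["views"], reverse=True)


if __name__ == "__main__":
    index = ToyIndex()
    index.add(1, {"title": "Unit testing in PHP"}, {"views": 120})
    index.add(2, {"title": "Testing search engines"}, {"views": 900})
    index.add(3, {"title": "Search basics"}, {"views": 50})
    print(index.search("testing"))                # [2, 1]
    print(index.search("search", min_views=100))  # [2]
```

Note that the search returns ids and attributes only; the titles are gone once indexed, which matches the point that field values are not stored.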

MySQL Full Text Search
• You can get away with MyISAM tables or, as of MySQL 5.6, InnoDB.
• You don’t care about morphology (think plurals)
• You don’t need anything but the most basic of search operators

Creating An Index
• We are going to add an index that sources a MySQL database.
• The data being sourced is a list of the titles of Wikipedia posts.

Indexer Configuration
• We are going to be peeking into a Sphinx configuration file now.
• You can rebuild the config file by concatenating each section into a single file.
• On my VM this file is located at /usr/local/etc/sphinx.conf

Source Definition
Defines the connection information

Connection information
• Ideally, you should create a separate account for Sphinx
• You can also connect via Unix socket
• I didn’t specify it here, but you can also add a port.

Source Definition
The query that pulls data to populate the index

Source Index
• The index query MUST return the id field as the first column
• Remember, the id needs to be a unique, unsigned integer of 64 bits or fewer
• The query must be on a single line, unless you escape newlines with backslashes.
• Notice that we converted the timestamp into a Unix timestamp. That is important: Sphinx stores timestamp attributes as integers.

Source Definition
How data is stored in the index

Source Fields
• The first column in the query is always the ID.
• You specify any columns that are attributes.
• Remember, attributes are stored in the index as values that you can filter and sort by.
• Any column besides the id that is not specified as an attribute is assumed to be a full-text field (here, title)
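
The configuration shown on these slides is not part of this text. As a rough idea of what such a source definition can look like, the sketch below uses Python only to assemble and print the text of a `source` block. The directive names (`type`, `sql_host`, `sql_user`, `sql_pass`, `sql_db`, `sql_query`, `sql_attr_timestamp`) are standard sphinx.conf settings; the source name, credentials, table name, and column names are assumptions made up for illustration.

```python
def render_block(kind, name, settings):
    """Render one sphinx.conf block from (directive, value) pairs."""
    width = max(len(key) for key, _ in settings)
    lines = [f"{kind} {name}", "{"]
    lines += [f"    {key.ljust(width)} = {value}" for key, value in settings]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical values throughout: adjust to your own database and schema.
WIKI_SOURCE = [
    ("type", "mysql"),
    ("sql_host", "localhost"),
    ("sql_user", "sphinx"),  # ideally a separate account just for Sphinx
    ("sql_pass", "secret"),
    ("sql_db", "wikipedia"),
    # The id comes first; the timestamp is converted to a Unix timestamp.
    ("sql_query",
     "SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated_at FROM articles"),
    # Columns declared here become attributes; the rest (title) are full-text fields.
    ("sql_attr_timestamp", "updated_at"),
]

if __name__ == "__main__":
    print(render_block("source", "wiki_titles", WIKI_SOURCE))
```

The printed text is what would go into the configuration file; `sql_port` or `sql_sock` pairs could be added to the list in the same way.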

Index Definition
• An index includes one or more sources.
• Each source gets its own “source” line
• Multiple sources must all define the same fields and attributes.
• The ids need to be unique across sources

Index Definition
• path is not a directory path; it’s the index file name (including its directory) with no extension.
• docinfo dictates whether attributes are stored in the index or outside of the index.
• dict is not really important now. It used to be either crc or keywords; now crc is deprecated.
• min_word_len is the minimum length of words to index
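
The slide's configuration is not reproduced in this text. As a hedged sketch of an index definition using the four settings above, the Python below assembles and prints an `index` block. `source`, `path`, `docinfo`, `dict`, and `min_word_len` are real sphinx.conf directives as of Sphinx 2.x; the index name, source name, file location, and the chosen values are assumptions.

```python
def render_block(kind, name, settings):
    """Render one sphinx.conf block from (directive, value) pairs."""
    width = max(len(key) for key, _ in settings)
    lines = [f"{kind} {name}", "{"]
    lines += [f"    {key.ljust(width)} = {value}" for key, value in settings]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical names and values; a list of pairs allows repeated directives.
WIKI_INDEX = [
    ("source", "wiki_titles"),  # add one ("source", ...) pair per extra source
    ("path", "/var/lib/sphinx/wiki_titles"),  # file name without an extension
    ("docinfo", "extern"),      # keep attributes outside the full-text data
    ("dict", "keywords"),
    ("min_word_len", "2"),      # ignore one-letter words
]

if __name__ == "__main__":
    print(render_block("index", "wiki_titles", WIKI_INDEX))
```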

Rest of the Index Configuration

It’s time to build the index
indexer <index name>
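
For scripted builds, the command line can be assembled programmatically. The following Python sketch only builds the argument list; `--config`, `--rotate`, and `--all` are existing `indexer` options, while the function name and the index name `wiki_titles` are assumptions for illustration.

```python
import shlex


def indexer_command(index_names, config_path=None, rotate=False):
    """Build the argv for an `indexer` run; pass it to subprocess.run() to execute."""
    argv = ["indexer"]
    if config_path:
        argv += ["--config", config_path]
    if rotate:
        # Build new index files next to the live ones, then tell searchd to swap.
        argv.append("--rotate")
    # With no names given, fall back to rebuilding every configured index.
    argv += list(index_names) if index_names else ["--all"]
    return argv


if __name__ == "__main__":
    print(shlex.join(indexer_command(["wiki_titles"], "/usr/local/etc/sphinx.conf")))
    print(shlex.join(indexer_command(["wiki_titles"], rotate=True)))
```

The second form is the one to use once searchd is already serving the index, since it relies on index rotation rather than overwriting files in use.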

Searching the Index
• searchd is the daemon that searches the index
• Binary protocol
OR
• MySQL compatible too!

searchd config
Included in the same config file as the rest

MySQL Compatible
• Tables == Indexes
• SHOW TABLES… shows indexes.
• SELECT * FROM <index> works too.

Querying Indexes
• Default limit of 20 rows
• Notice the text fields are not returned…
• They would be if we also stored them as attributes (sql_field_string)

Querying Indexes
• The magic function in SphinxQL is match()
• match() performs a full text search against the entire index…usually
• The ‘@field’ operator can isolate which field is searched on.

Querying Indexes
• You can query against attributes
• You can sort results
• You can use the weight() function to determine relevancy.
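
The queries from these slides are not in the extracted text, so here is a minimal Python sketch of how such a statement could be put together. It only builds the SQL string and parameter list; sending it requires a MySQL client library connected to searchd's MySQL-protocol port (commonly set up with `listen = 9306:mysql41`). The index name, the `updated_at` attribute, the function names, and the exact set of escaped characters are assumptions.

```python
import re

# Characters treated as operators by the extended query syntax (assumed set).
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def escape_match(text):
    """Backslash-escape operator characters so user input is matched literally."""
    return _SPECIAL.sub(r"\\\1", text)


def build_search(index, text, field=None, min_timestamp=None, limit=20):
    """Return (sql, params) for a relevance-sorted SphinxQL search.

    `index` must be a trusted identifier; only `text` is treated as user input.
    """
    query = escape_match(text)
    if field:
        query = f"@{field} {query}"  # restrict matching to one field
    sql = f"SELECT id, WEIGHT() AS relevance FROM {index} WHERE MATCH(%s)"
    params = [query]
    if min_timestamp is not None:
        sql += " AND updated_at >= %s"  # filter on an attribute (hypothetical name)
        params.append(min_timestamp)
    sql += f" ORDER BY relevance DESC LIMIT {int(limit)}"
    return sql, params


if __name__ == "__main__":
    sql, params = build_search("wiki_titles", "unit testing", field="title", limit=100)
    print(sql)
    print(params)
```

The `%s` placeholders follow the parameter style used by common Python MySQL drivers; the default `limit` of 20 mirrors the default mentioned earlier.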

Querying Indexes
• The 25387283 title was more relevant because it matched on the term “testing”

Getting PHP into the mix
• All we need? PDO.
• We will build a basic search page
• Accepts a query, displays up to 100 matching results by relevancy with the matching keywords highlighted.

Adding the fancy yellow highlighting
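
The code for this slide is not in the extracted text, and the talk used PHP. As one possible approach, sketched here in Python, the application can wrap matching keywords in a styled element after escaping the title for HTML. The function name and CSS class are invented, and this is not necessarily how the talk did it; Sphinx can also build highlighted excerpts on the server side with `CALL SNIPPETS`.

```python
import html
import re


def highlight(text, keywords, css_class="hit"):
    """HTML-escape `text` and wrap whole-word keyword matches in a <span>."""
    words = [re.escape(word) for word in keywords if word]
    if not words:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches and the match itself separately,
        # so the <span> markup is the only raw HTML in the result.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(
            f'<span class="{css_class}">{html.escape(match.group(0))}</span>'
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight("Unit Testing <basics>", ["testing"]))
```

A CSS rule such as a yellow background on the `hit` class gives the highlighting effect. Because this matches whole words literally, it will miss variants that the index matched through stemming.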

Cool things we would talk about if I had like…3 more hours
• Auto-suggest, Auto-correct
• More on lemmatization and stemming
• Distributed Sphinx Clustering
• Delta indexes
• Real Time Indexes
• The plethora of operators you can use
• Ranged Queries
• ………

Additional Information
• The Sphinx documentation is actually pretty great
• http://sphinxsearch.com/docs/
• Slides are already on Slideshare
• Will link them to the meetup shortly
