- Welcome - Beer and pizza sponsored by Brightbox - Please consider talking!
-- So why not just use the DB?
- Building SQL in code makes it easy to introduce mistakes - in with image - Someone has already handled the hard stuff (stop-word removal, stemming, that sort of thing) - DBs are traditionally the hardest element of a stack to scale, so let's not put more stuff there. One of the main points. - Luckily there are a bunch of alternatives, next slide
…and Sphinx is one - Standalone - runs as a separate process - written in C++, small memory footprint - stable - high indexing speed (up to 10 MB/sec on modern CPUs) - high search speed (avg query is under 0.1 sec on 2-4 GB text collections) - high scalability (up to 100 GB of text, up to 100 M documents on a single CPU) - supports distributed searching (since v.0.9.6)
- Most importantly… read - Don’t let anyone try to convince you otherwise with shady propaganda
Installing Sphinx is easy
- indexer, builds indexes - searchd, where the magic happens
- Not much use to us unless we can use it with our applications, we have two choices - Both widely used at EY - differences
All you need is…
>> query = "The cat sat on the mat"
=> "The cat sat on the mat"
>> where = "(email like '%#{ query.split(/\s+/).map{|term| term.downcase }.join("%') OR (email like '%") }')"
=> "(email like '%the%') OR (email like '%cat%') OR (email like '%sat%') OR (email like '%on%') OR (email like '%the%') OR (email like '%mat')"
>> execute("select * from users where #{where}")
=> fail
PHP
Congratulations, you are now a l33t programmer!
^
Job done!
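For anyone who does stay with LIKE queries, here is a minimal Ruby sketch of a less fragile way to build the same condition. It keeps the search terms out of the SQL string by emitting ? placeholders and returning the values separately. like_conditions and escape_like are hypothetical helper names, not part of the talk or of any library, and the backslash escaping assumes a database whose LIKE uses backslash as its default escape character (MySQL and PostgreSQL do).

```ruby
# Hypothetical helpers for illustration only (not from the talk).

# Escape LIKE wildcards so a literal %, _ or \ in a term is matched as-is.
# Assumes backslash is the database's LIKE escape character.
def escape_like(term)
  term.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

# Returns [sql_fragment, bind1, bind2, ...].
# `column` is interpolated directly, so it must be a trusted name,
# never user input.
def like_conditions(column, query)
  terms = query.downcase.split(/\s+/).reject(&:empty?).uniq
  return ["1 = 0"] if terms.empty? # no terms: match nothing

  # One "column LIKE ?" per term; the values travel separately as binds.
  clause = terms.map { "#{column} LIKE ?" }.join(" OR ")
  binds  = terms.map { |term| "%#{escape_like(term)}%" }
  [clause, *binds]
end

conditions = like_conditions("email", "The cat sat on the mat")
# conditions.first holds the SQL fragment, the rest are the bind values.
```

This only removes the quoting mistakes. A LIKE with a leading wildcard typically still scans every row for every term, so the scaling problem remains.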
Why not use the DB?
• Building up SQL queries in code sucks
• Full text indexing in DBs isn’t great either
• DBs are hard to scale
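To make the "hard stuff" a little more concrete, here is a toy Ruby sketch of what stop-word removal and stemming do to text before it is indexed. It is only an illustration under made-up assumptions: the stop-word list, the suffix list, and the names naive_stem and index_terms are all hypothetical, and a real engine uses a proper stemming algorithm rather than suffix chopping.

```ruby
# Toy illustration only. Both lists below are invented for the example.
STOP_WORDS = %w[the on a an and of].freeze
SUFFIXES   = %w[ing ed s].freeze

# Chop the first matching suffix, but leave very short words alone.
def naive_stem(word)
  suffix = SUFFIXES.find do |candidate|
    word.end_with?(candidate) && word.length > candidate.length + 2
  end
  suffix ? word[0...-suffix.length] : word
end

# Lowercase, tokenize, drop stop words, then stem what is left.
def index_terms(text)
  text.downcase
      .scan(/[a-z0-9]+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .map { |word| naive_stem(word) }
end

index_terms("The cats sat on the mats") # => ["cat", "sat", "mat"]
```

Even this toy version shows why the same processing has to be applied to both the documents and the query, which is the kind of detail a dedicated search engine already takes care of.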
Sphinx is…
• Sphinx is a full-text search engine
• Open source (GPL version 2)
• Standalone
• Proven stable
• Performs well
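As a rough feel for "performs well", the figures quoted in the notes (up to 10 MB/sec indexing, up to 100 GB of text) can be turned into a back-of-envelope indexing time. This is only a sketch: indexing_hours is a hypothetical helper, and it assumes the peak rate is sustained for the whole run, which real workloads rarely manage.

```ruby
# Back-of-envelope estimate, assuming a constant indexing rate.
MB_PER_GB = 1024

# Hours needed to index `collection_gb` gigabytes at `rate_mb_per_sec`.
def indexing_hours(collection_gb, rate_mb_per_sec)
  seconds = collection_gb * MB_PER_GB / rate_mb_per_sec.to_f
  seconds / 3600.0
end

indexing_hours(100, 10).round(1) # => 2.8
```

So a full rebuild of the largest collection mentioned would take a few hours at best on those numbers.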
Sphinx
Much better than Solr and Ferret
Maybe
Installing Sphinx
sudo port install sphinx
Out of the box
• indexer - utility which creates fulltext indexes
• searchd - daemon which enables external software to search fulltext indexes
Amongst other things
Using with your app
Two Ruby on Rails APIs
• Ultra Sphinx
• Thinking Sphinx
Installing Thinking Sphinx
cd rails_app && script/plugin install git://github.com/freelancing-god/thinking-sphinx.git