Comparing open source search engines

Lightning talk given at Cambridge Geek Night 6

Published in: Technology

Transcript

  • 1. Comparing open source search engines. Richard Boulton, @rboulton, richard@cnav.co.uk
  • 2. Search Engine? Document-oriented database; inverted index; ranking / weighting algorithm
  • 3. Lucene. Java; Apache License; low-level: Java API
  • 4. Lucene Family. Solr: “REST-like” XML/JSON API; ElasticSearch: REST API; … and many commercial engines
  • 5. Xapian. C++; GPLv2; low-level: C++ API; Python/Ruby/PHP/Perl/Java bindings
  • 6. Xapian. C++; GPLv2; low-level: C++ API; Python/Ruby/PHP/Perl/Java bindings. Partiality Risk
  • 7. Xapian Family. Omega: indexer + CGI interface; Flax: REST API; Xappy: Python wrapper
  • 8. Sphinx. C++; GPLv2; SQL-like API
  • 9. Others. Riak Search; Terrier; MySQL Fulltext; PostgreSQL FTS; Redis; Whoosh
  • 10. Logos
  • 11. Document model. Lucene, Xapian: list of terms. Solr, Sphinx: fields in a predefined fixed schema. Flax, Xappy: fields, with associated modifiable schema. ElasticSearch: fields, document types, free schema.
  • 12. Updates. Lucene, Xapian + families: dynamic updates; use batches for fastest updates. Sphinx: no updates to existing indexes (“Realtime indexing” in beta with SQL API).
  • 13. Data structures. Lucene: hash-based segments, hierarchical merge. Xapian: B-tree, transactional.
  • 14. Scaling / replication. All engines allow searches across databases (allows sharding). All engines allow replication (allows spreading load and high availability). Had difficulty with Sphinx. ElasticSearch does it completely transparently.
  • 15. Commercial Support. Lucene: Lucid Imagination, Sematext, …; Xapian: Oligarchy Ltd, Flax, me; Sphinx: Sphinx Technologies Inc
  • 16. Community. Lucene / Solr community – revolting (they say). Xapian – quieter, but steadily growing. Sphinx – popular amongst relational database users (apparently).
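
The three ingredients named on slide 2 (a document store, an inverted index, and a ranking / weighting algorithm) can be illustrated with a toy sketch in Python. This is not how Lucene, Xapian, or Sphinx implement them. The class name `TinyIndex`, its methods, the whitespace tokenizer, and the TF-IDF-style weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Hypothetical toy engine: document store + inverted index + ranking."""

    def __init__(self):
        self.docs = {}                     # document store: doc_id -> text
        self.postings = defaultdict(dict)  # inverted index: term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        # Assumes each doc_id is added once; updating a document would
        # also require removing its old postings, which is omitted here.
        self.docs[doc_id] = text
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        # Weighting: sum of (term frequency * inverse document frequency)
        # over the query terms. Real engines use more refined schemes.
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for term in query.lower().split():
            matches = self.postings.get(term, {})
            if not matches:
                continue
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Best score first; ties broken by doc_id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = TinyIndex()
index.add(1, "xapian is a search library")
index.add(2, "sphinx is a search server with a sql api")
index.add(3, "lucene search search java")

print(index.search("sql search"))  # [2, 3, 1]
```

The rare term "sql" carries more weight than the common term "search", so document 2 ranks first even though document 3 mentions "search" twice.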
