Comparing open source search engines

Lightning talk given at Cambridge Geek Night 6

Published in: Technology

  1. Comparing open source search engines
      ● Richard Boulton
      ● @rboulton
      ● richard@cnav.co.uk
  2. Search Engine?
      ● Document-oriented database
      ● Inverted index
      ● Ranking / weighting algorithm
  3. Lucene
      ● Java
      ● Apache License
      ● Low-level: Java API
  4. Lucene Family
      ● Solr: “REST-like” XML/JSON API
      ● ElasticSearch: REST API
      ● … and many commercial engines
  5. Xapian
      ● C++
      ● GPLv2
      ● Low-level: C++ API
      ● Python/Ruby/PHP/Perl/Java bindings
  6. Xapian
      ● C++
      ● GPLv2
      ● Low-level: C++ API
      ● Python/Ruby/PHP/Perl/Java bindings
      ● Partiality Risk
  7. Xapian Family
      ● Omega: Indexer + CGI interface
      ● Flax: REST API
      ● Xappy: Python wrapper
  8. Sphinx
      ● C++
      ● GPLv2
      ● SQL-like API
  9. Others
      ● Riak Search
      ● Terrier
      ● MySQL Fulltext
      ● PostgreSQL FTS
      ● Redis
      ● Whoosh
  10. Logos
  11. Document model
      ● Lucene, Xapian: List of terms
      ● Solr, Sphinx: Fields in a predefined, fixed schema
      ● Flax, Xappy: Fields, with an associated modifiable schema
      ● ElasticSearch: Fields, document types, free schema
  12. Updates
      ● Lucene, Xapian + families: Dynamic updates; use batches for fastest updates
      ● Sphinx: No updates to existing indexes (“Realtime indexing” in beta with SQL API)
  13. Data structures
      ● Lucene: Hash-based segments, hierarchical merge
      ● Xapian: B-tree, transactional
  14. Scaling / replication
      ● All engines allow searches across databases
      ● Allows sharding
      ● All engines allow replication
      ● Allows spreading load and high availability
      ● Had difficulty with Sphinx
      ● ElasticSearch does it completely transparently
  15. Commercial Support
      ● Lucene: Lucid Imagination, Sematext, …
      ● Xapian: Oligarchy Ltd, Flax, me
      ● Sphinx: Sphinx Technologies Inc
  16. Community
      ● Lucene / Solr community – revolting (they say)
      ● Xapian – quieter, but steadily growing
      ● Sphinx – popular amongst relational database users (apparently)
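
Slide 2 sums up a search engine as a document-oriented database plus an inverted index plus a ranking / weighting algorithm. A minimal Python sketch of those three pieces might look like the following. Everything in it is an illustrative assumption rather than anything from the talk: the `TinyIndex` class, the whitespace tokenizer, and the simple TF-IDF weighting are hypothetical and far cruder than what Lucene, Xapian, or Sphinx actually do.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy search engine: document store + inverted index + TF-IDF ranking.

    Hypothetical illustration only; not the API of any engine in the talk.
    """

    def __init__(self):
        self.docs = {}                      # doc_id -> original text
        self.postings = defaultdict(dict)   # term -> {doc_id: term frequency}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase, split on whitespace, no stemming.
        return text.lower().split()

    def add(self, doc_id, text):
        # Store the document, then record each term's frequency in the index.
        self.docs[doc_id] = text
        for term, tf in Counter(self.tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, limit=10):
        # Score every document containing at least one query term.
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for term in self.tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Rarer terms get a higher weight (a simple IDF variant).
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by document id.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked[:limit]]


index = TinyIndex()
index.add(1, "Lucene is a Java search library")
index.add(2, "Xapian is a C++ search library with Python bindings")
index.add(3, "Sphinx offers an SQL-like API")
results = index.search("python search")
```

Here `results` lists document 2 ahead of document 1, because document 2 matches both query terms and "python" is the rarer of the two. Real engines differ mainly in how they store the postings on disk (slide 13) and in how much schema they layer on top of the raw terms (slide 11).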
