Comparing open source search engines
Richard Boulton
@rboulton
richard@cnav.co.uk
Search Engine?
Document oriented database
Inverted index
Ranking / weighting algorithm
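The toy sketch below shows how the inverted index and the ranking pieces fit together. It uses an in-memory index that maps each term to the documents containing it, and ranks matches with BM25. BM25 is one common weighting scheme; each engine in this talk has its own weighting implementations. The class name TinyIndex, the whitespace tokeniser and the default k1/b values are illustrative assumptions, not any engine's API.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy in-memory inverted index with BM25 ranking (illustration only)."""

    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1 = k1  # term-frequency saturation
        self.b = b    # strength of document-length normalisation
        # Inverted index: term -> {doc_id: occurrences of term in that document}
        self.postings: defaultdict[str, dict[int, int]] = defaultdict(dict)
        # doc_id -> document length in terms, needed for length normalisation
        self.doc_len: dict[int, int] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Naive tokenisation: lower-case and split on whitespace.
        terms = text.lower().split()
        self.doc_len[doc_id] = len(terms)
        for term, freq in Counter(terms).items():
            self.postings[term][doc_id] = freq

    def search(self, query: str, k: int = 10) -> list[tuple[int, float]]:
        if not self.doc_len:
            return []
        n_docs = len(self.doc_len)
        avgdl = sum(self.doc_len.values()) / n_docs
        scores: defaultdict[int, float] = defaultdict(float)
        for term in set(query.lower().split()):
            docs = self.postings.get(term)
            if not docs:
                continue  # term not in the index: contributes nothing
            # Rarer terms get a higher weight (this IDF form is never negative).
            idf = math.log((n_docs - len(docs) + 0.5) / (len(docs) + 0.5) + 1)
            for doc_id, tf in docs.items():
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / avgdl
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        # Highest score first.
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, suppose you add "lucene is a search library written in java" and "xapian is a search library written in c++". A query for "java search" then ranks the first document above the second, because only the first matches both terms.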
Lucene
Java
Apache License
Low-level: Java API
Lucene Family
Solr: “REST-like” XML/JSON API
ElasticSearch: REST API
… and many commercial engines
Xapian
C++
GPLv2
Low-level: C++ API
Python/Ruby/PHP/Perl/Java bindings
Partiality
Risk
Xapian Family
Omega: Indexer + CGI interface
Flax: REST API
Xappy: Python wrapper
Sphinx
C++
GPLv2
SQL-like API
Others
Riak Search
Terrier
MySQL Fulltext
PostgreSQL FTS
Redis
Whoosh
Logos
Document model
Lucene, Xapian:
List of terms
Solr, Sphinx:
Fields in a predefined fixed schema.
Flax, Xappy:
Fields, with associated modifiable schema.
ElasticSearch:
Fields, document types, free schema.
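The rough sketch below contrasts these document models. A fielded document is flattened into the plain list of terms that the low-level core stores, using a hypothetical "FIELD:" term-prefix convention. Three hypothetical schema classes stand in for fixed, modifiable and free schemas. None of this is the actual API of the engines named above.

```python
class FixedSchema:
    """Fields must be declared up front; unknown fields are rejected."""

    def __init__(self, fields: list[str]) -> None:
        self.fields = set(fields)

    def check(self, doc: dict[str, str]) -> None:
        unknown = set(doc) - self.fields
        if unknown:
            raise ValueError(f"fields not in schema: {sorted(unknown)}")


class ModifiableSchema(FixedSchema):
    """Fields are declared, but the schema can be changed after creation."""

    def add_field(self, name: str) -> None:
        self.fields.add(name)


class FreeSchema(FixedSchema):
    """No declarations: any new field is added to the schema automatically."""

    def __init__(self) -> None:
        super().__init__([])

    def check(self, doc: dict[str, str]) -> None:
        self.fields.update(doc)


def to_terms(doc: dict[str, str], schema: FixedSchema) -> list[str]:
    """Flatten a fielded document into a plain list of terms."""
    schema.check(doc)
    # Hypothetical convention: prefix each word with its upper-cased field name.
    return [
        f"{name.upper()}:{word}"
        for name, value in doc.items()
        for word in value.lower().split()
    ]
```

Here the schema only decides which fields are accepted. The core still sees nothing but terms, which is the sense in which the field layer sits on top of a "list of terms" model.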
Updates
Lucene, Xapian + families:
Dynamic updates
Use batches for fastest updates
Sphinx:
No updates to existing indexes
(“Realtime indexing” in beta with SQL API)
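Batches help because a commit typically makes changes durable by flushing buffers and syncing to disk. Batching pays that cost once per batch rather than once per document. The minimal sketch below assumes a hypothetical BatchingWriter class, with a plain dict standing in for the on-disk index. Its method names are illustrative, not any engine's API.

```python
from typing import Any

_DELETED = object()  # sentinel marking a pending deletion


class BatchingWriter:
    """Queues updates and applies them to the index one batch at a time."""

    def __init__(self, index: dict[Any, Any], batch_size: int = 1000) -> None:
        self.index = index                 # stand-in for the on-disk index
        self.batch_size = batch_size
        self.pending: dict[Any, Any] = {}  # doc_id -> new document or _DELETED
        self.commits = 0

    def _queue(self, doc_id: Any, doc: Any) -> None:
        # A later change to the same doc_id replaces the earlier pending one.
        self.pending[doc_id] = doc
        if len(self.pending) >= self.batch_size:
            self.commit()

    def replace_document(self, doc_id: Any, doc: Any) -> None:
        self._queue(doc_id, doc)

    def delete_document(self, doc_id: Any) -> None:
        self._queue(doc_id, _DELETED)

    def commit(self) -> None:
        if not self.pending:
            return
        for doc_id, doc in self.pending.items():
            if doc is _DELETED:
                self.index.pop(doc_id, None)
            else:
                self.index[doc_id] = doc
        self.pending.clear()
        # In a real engine the expensive flush/sync would happen here,
        # once per batch instead of once per document.
        self.commits += 1
```

With batch_size=100, adding 250 documents and then calling commit() once at the end costs three commits instead of 250.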
Data structures
Lucene:
Hash based segments
Hierarchical merge
Xapian:
B-tree, transactional
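The simplified sketch below illustrates hierarchical merging. It assumes immutable segments and a hypothetical merge_factor parameter. Each flush writes a small level-0 segment. Whenever merge_factor segments share a level, they are merged into one segment at the next level up. This only shows the general idea, not Lucene's actual merge policy, and it does not model the B-tree approach.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare segments by identity, not contents
class Segment:
    level: int
    docs: list[int]  # document ids stored in this immutable segment


class SegmentedIndex:
    """Toy index built from immutable segments that are merged hierarchically."""

    def __init__(self, merge_factor: int = 10) -> None:
        self.merge_factor = merge_factor
        self.segments: list[Segment] = []

    def flush(self, docs: list[int]) -> None:
        # New documents never modify existing segments: they go into a new
        # small segment at level 0.
        self.segments.append(Segment(0, list(docs)))
        self._merge()

    def _merge(self) -> None:
        level = 0
        while level <= max(s.level for s in self.segments):
            same = [s for s in self.segments if s.level == level]
            if len(same) < self.merge_factor:
                level += 1
                continue
            # Replace merge_factor segments at this level with one at level + 1.
            group = same[: self.merge_factor]
            merged = Segment(level + 1, [d for s in group for d in s.docs])
            self.segments = [s for s in self.segments if s not in group]
            self.segments.append(merged)
```

With a merge factor of 3, nine single-document flushes end up as one level-2 segment. Each document has been copied only twice along the way, so the total merge work grows roughly with the number of levels rather than with the number of updates.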
Scaling / replication
● All engines allow searches across databases
● Allows sharding
● All engines allow replication
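● Allows spreading load and high availability
● Had difficulty with Sphinx
● ElasticSearch does it completely transparently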
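Searching across several databases or shards is usually done as a scatter-gather. The query goes to every shard, each shard returns its best k hits, and the global best k are kept. Asking each shard for only k hits is enough, because a document outside its own shard's top k cannot be in the global top k. The sketch assumes scores from different shards are directly comparable. In practice that generally means combining term statistics such as document frequencies across shards. make_shard and its term-counting score are hypothetical stand-ins.

```python
import heapq
from typing import Callable

# A shard takes (query, k) and returns its own best k hits as (score, doc_id).
Shard = Callable[[str, int], list[tuple[float, str]]]


def make_shard(docs: dict[str, str]) -> Shard:
    """Hypothetical shard: scores a document by counting query-term occurrences."""

    def search(query: str, k: int) -> list[tuple[float, str]]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append((score, doc_id))
        return heapq.nlargest(k, hits)

    return search


def search_shards(shards: list[Shard], query: str, k: int) -> list[tuple[float, str]]:
    """Scatter the query to every shard, then gather and keep the global top k."""
    candidates: list[tuple[float, str]] = []
    for shard in shards:
        # Each shard only needs to return its own top k.
        candidates.extend(shard(query, k))
    # Ties on score are broken by doc_id, since hits are plain tuples.
    return heapq.nlargest(k, candidates)
```

Replication fits the same shape: any replica of a shard can answer that shard's part of the query, which is what lets load be spread and failures tolerated.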
Commercial Support
Lucene: Lucid Imagination, Sematext, …
Xapian: Oligarchy Ltd, Flax, me
Sphinx: Sphinx Technologies Inc
Community
● Lucene / Solr community – revolting (they say)
● Xapian – quieter, but steadily growing
● Sphinx – popular amongst relational database users (apparently)