Comparing open source search engines
Richard Boulton
@rboulton
richard@cnav.co.uk
Search Engine?
Document oriented database
Inverted index
Ranking / weighting algorithm
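The three ingredients on this slide can be made concrete with a minimal sketch. It uses Python and only its standard library, not any engine's API. A toy inverted index maps each term to the documents that contain it, and matches are ranked with BM25, one widely used weighting scheme. The naive whitespace tokenizer and the parameter values `k1=1.2`, `b=0.75` are commonly used values assumed here for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    # Real engines also strip punctuation, stem, etc.
    return text.lower().split()


class InvertedIndex:
    """Toy inverted index: maps each term to {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: tf}
        self.doc_lengths = {}              # doc_id -> number of terms

    def add(self, doc_id, text):
        terms = tokenize(text)
        self.doc_lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k1=1.2, b=0.75):
        """Rank matching documents with BM25; returns [(doc_id, score)], best first."""
        n_docs = len(self.doc_lengths)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_lengths.values()) / n_docs
        scores = defaultdict(float)
        for term in tokenize(query):
            postings = self.postings.get(term, {})  # .get does not create an entry
            if not postings:
                continue
            df = len(postings)  # number of documents containing the term
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for doc_id, tf in postings.items():
                # Length normalisation: long documents need more occurrences to score well.
                norm = k1 * (1 - b + b * self.doc_lengths[doc_id] / avg_len)
                scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = InvertedIndex()
index.add(1, "Xapian is a search engine library written in C++")
index.add(2, "Lucene is a search engine library written in Java")
index.add(3, "Sphinx has an SQL-like API")
print(index.search("java search"))  # doc 2 (matches both terms) ranks above doc 1
```

Only documents that contain at least one query term are scored, because the index is walked term by term rather than document by document. This is what makes an inverted index fast.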
Lucene
Java
Apache License
Low-level: Java API
Lucene Family
Solr: “REST-like” XML/JSON API
ElasticSearch: REST API
… and many commercial engines
Xapian
C++
GPLv2
Low-level: C++ API
Python/Ruby/PHP/Perl/Java bindings
Partiality
Risk
Xapian Family
Omega: Indexer + CGI interface
Flax: REST API
Xappy: Python wrapper
Sphinx
C++
GPLv2
SQL-like API
Others
Riak Search
Terrier
MySQL Fulltext
PostgreSQL FTS
Redis
Whoosh
Logos
Document model
Lucene, Xapian:
List of terms
Solr, Sphinx:
Fields in a predefined fixed schema.
Flax, Xappy:
Fields, with associated modifiable schema.
ElasticSearch:
Fields, document types, free schema.
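The contrast between a flat list of terms and fielded documents can be sketched as follows. This is a hypothetical illustration, not any engine's API. Fields are flattened into one list of terms by prefixing each term with a field marker, loosely in the spirit of the term prefixes used by Xapian applications. A fixed schema (as in Solr or Sphinx) rejects unknown fields, while `schema=None` stands in for a free schema. The schema, the prefixes and `to_terms` are assumptions.

```python
# Hypothetical fixed schema: field name -> term prefix.
# Prefixes are uppercase and words are lowercased, so prefixed terms
# cannot collide with unprefixed body terms.
FIXED_SCHEMA = {"title": "T", "author": "A", "body": ""}


def to_terms(doc, schema=None):
    """Flatten a fielded document ({field: text}) into a flat list of terms.

    With a fixed schema, unknown fields raise KeyError and each field's terms
    get the schema's prefix. With schema=None ("free schema"), any field is
    accepted and its uppercased name is used as the prefix.
    """
    terms = []
    for field, text in doc.items():
        if schema is None:
            prefix = field.upper() + ":"
        elif field in schema:
            prefix = schema[field]
        else:
            raise KeyError(f"field {field!r} is not in the schema")
        terms.extend(prefix + word for word in text.lower().split())
    return terms
```

For example, `to_terms({"title": "Xapian", "body": "search engine"}, FIXED_SCHEMA)` gives `["Txapian", "search", "engine"]`. The same call with a `tags` field fails under the fixed schema but succeeds with `schema=None`.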
Updates
Lucene, Xapian + families:
Dynamic updates
Use batches for fastest updates
Sphinx:
No updates to existing indexes
(“Realtime indexing” in beta with SQL API)
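The advice to use batches can be illustrated with a hypothetical writer that buffers updates in memory and applies them in one go. `BatchingWriter`, `update`, `commit` and `flush_count` are invented names. `flush_count` only counts how often the expensive write step would run, and that is the cost batching amortises. Sphinx is not modelled here, since per the slide it does not update existing indexes at all.

```python
class BatchingWriter:
    """Hypothetical sketch: buffer document updates and apply them in batches.

    `committed` stands in for the on-disk index; `flush_count` counts how many
    times the expensive write step runs (once per batch, not once per document).
    """

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.pending = {}     # doc_id -> text; a later update replaces an earlier one
        self.committed = {}   # stand-in for the persistent index
        self.flush_count = 0

    def update(self, doc_id, text):
        self.pending[doc_id] = text
        if len(self.pending) >= self.batch_size:
            self.commit()

    def commit(self):
        if not self.pending:
            return
        self.committed.update(self.pending)  # one bulk write for the whole batch
        self.pending.clear()
        self.flush_count += 1
```

With 2,500 documents and `batch_size=1000`, the expensive step runs three times: two full batches plus a final `commit()`. With `batch_size=1` it runs 2,500 times.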
Data structures
Lucene:
Hash-based segments
Hierarchical merge
Xapian:
B-tree, transactional
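The segment-and-merge design listed for Lucene can be sketched roughly as below. This is a simplified, hypothetical model, not Lucene's implementation. Each flush writes a small immutable segment. Whenever `merge_factor` segments share a level, they are merged into one segment at the next level, so a search only has to visit a few segments. The class name, the merge factor of 3 and the set-of-doc-ids postings are assumptions for illustration, and deletions are not modelled.

```python
class SegmentedIndex:
    """Hypothetical sketch of segment-based indexing with hierarchical merging.

    Each flush adds a new level-0 segment. When `merge_factor` segments exist at
    one level, they are merged into a single segment one level up, so the number
    of segments stays logarithmic in the number of flushes.
    """

    def __init__(self, merge_factor=3):
        self.merge_factor = merge_factor
        self.segments = []  # list of (level, {term: set of doc_ids})

    def flush(self, docs):
        """Write a batch of {doc_id: text} as a new level-0 segment, then merge."""
        postings = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings.setdefault(term, set()).add(doc_id)
        self.segments.append((0, postings))
        self._maybe_merge()

    def _maybe_merge(self):
        level = 0
        while True:
            same_level = [seg for seg in self.segments if seg[0] == level]
            if len(same_level) < self.merge_factor:
                return
            merged = {}
            for _, postings in same_level:
                for term, ids in postings.items():
                    merged.setdefault(term, set()).update(ids)
            # Replace the merged segments with one segment at the next level.
            self.segments = [seg for seg in self.segments if seg[0] != level]
            self.segments.append((level + 1, merged))
            level += 1  # the new segment may in turn trigger a merge

    def search(self, term):
        """A query must consult every segment and union the results."""
        hits = set()
        for _, postings in self.segments:
            hits |= postings.get(term, set())
        return hits
```

With `merge_factor=3`, nine flushes collapse into a single level-2 segment. A tenth flush adds one new level-0 segment alongside it.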
Scaling / replication
● All engines allow searches across databases
● Allows sharding
● All engines allow replication
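● Allows spreading load and high availability
● Had difficulty with Sphinx
● ElasticSearch does it completely transparently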
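Searching across databases, which is what makes sharding work, amounts to a scatter/gather step. The query goes to every shard, each shard returns its best hits, and those hits are merged into one ranked list. The sketch below is hypothetical (class and method names are invented). It uses a toy score, the raw count of query terms, because that score needs no collection-wide statistics. With weighting schemes such as BM25, per-shard statistics have to be combined before scores from different shards are comparable.

```python
import heapq
import zlib


class ShardedSearch:
    """Hypothetical sketch: documents are spread over shards by hashing their id,
    and a query is sent to every shard and the per-shard top-k results merged."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]  # each shard: doc_id -> list of terms

    def shard_for(self, doc_id):
        # crc32 is stable across runs, unlike Python's per-process salted str hash().
        return zlib.crc32(str(doc_id).encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        self.shards[self.shard_for(doc_id)][doc_id] = text.lower().split()

    def _search_shard(self, shard, terms, k):
        hits = []
        for doc_id, doc_terms in shard.items():
            score = sum(doc_terms.count(t) for t in terms)  # toy score: raw term counts
            if score:
                hits.append((score, doc_id))
        return heapq.nlargest(k, hits)  # each shard only returns its own top k

    def search(self, query, k=10):
        terms = query.lower().split()
        # "Scatter": a real system would query the shards in parallel.
        per_shard = [self._search_shard(shard, terms, k) for shard in self.shards]
        # "Gather": the global top k is always within the union of per-shard top ks.
        merged = heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
        return [(doc_id, score) for score, doc_id in merged]
```

Replication, the next bullet on the slide, would keep several copies of each shard so that any copy can answer the query. It is not modelled here.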
Commercial Support
Lucene: Lucid Imagination, Sematext, …
Xapian: Oligarchy Ltd, Flax, me
Sphinx: Sphinx Technologies Inc
Community
● Lucene / Solr community – revolting (they say)
● Xapian – quieter, but steadily growing
● Sphinx – popular amongst relational database users (apparently)
Lightning talk given at Cambridge Geek Night 6