Comparing open source search engines

Lightning talk given at Cambridge Geek Night 6

Transcript

  • 1. Comparing open source search engines Richard Boulton @rboulton [email_address]
  • 2. Search Engine? Document-oriented database; inverted index; ranking / weighting algorithm
  • 3. Lucene: Java; Apache License; low-level Java API
  • 4. Lucene Family: Solr (“REST-like” XML/JSON API); ElasticSearch (REST API); … and many commercial engines
  • 5. Xapian: C++; GPLv2; low-level C++ API; Python/Ruby/PHP/Perl/Java bindings
  • 6. Xapian: C++; GPLv2; low-level C++ API; Python/Ruby/PHP/Perl/Java bindings; Partiality Risk
  • 7. Xapian Family: Omega (indexer + CGI interface); Flax (REST API); Xappy (Python wrapper)
  • 8. Sphinx: C++; GPLv2; SQL-like API
  • 9. Others: Riak Search, Terrier, MySQL Fulltext, PostgreSQL FTS, Redis, Whoosh
  • 10. Logos
  • 11. Document model
    • Lucene, Xapian: List of terms
    • Solr, Sphinx: Fields in a predefined fixed schema
    • Flax, Xappy: Fields, with associated modifiable schema
    • ElasticSearch: Fields, document types, free schema
  • 12. Updates
    • Lucene, Xapian + families: Dynamic updates; use batches for fastest updates
    • Sphinx: No updates to existing indexes (“Realtime indexing” in beta with SQL API)
  • 13. Data structures
    • Lucene: Hash based segments; hierarchical merge
    • Xapian: B-tree, transactional
  • 14. Scaling / replication
    • All engines allow searches across databases
      • Allows sharding
    • All engines allow replication
      • Allows spreading load and high availability
    • 15. Had difficulty with Sphinx
    • 16. ElasticSearch does it completely transparently
  • 17. Commercial Support
    • Lucene: Lucid Imagination, Sematext, …
    • Xapian: Oligarchy Ltd, Flax, me
    • Sphinx: Sphinx Technologies Inc
  • 18. Community
    • Lucene / Solr community – revolting (they say)
    • 19. Xapian – quieter, but steadily growing
    • 20. Sphinx – popular amongst relational database users (apparently)
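
Slide 2 describes a search engine as a document store, an inverted index and a ranking algorithm, and slide 14 notes that all the engines can search across several databases. The sketch below is a toy illustration of those ideas, not the API of Lucene, Xapian or Sphinx. The names `TinyIndex` and `search_shards` are hypothetical, the weighting is a simple TF-IDF chosen for brevity, and updates and deletes are not handled.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy document store plus inverted index (hypothetical, for illustration)."""

    def __init__(self):
        self.docs = {}                     # doc_id -> original text
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        # Assumes each doc_id is added once; re-adding would leave stale postings.
        self.docs[doc_id] = text
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return [(score, doc_id)], best first, using a simple TF-IDF weight."""
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for term in query.lower().split():
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        return sorted(((s, d) for d, s in scores.items()),
                      key=lambda pair: (-pair[0], pair[1]))


def search_shards(shards, query, limit=10):
    """Search several indexes and merge the results by score.

    Each shard computes its weights from its own statistics, so scores are
    only roughly comparable across shards in this sketch.
    """
    merged = []
    for shard in shards:
        merged.extend(shard.search(query))
    merged.sort(key=lambda pair: (-pair[0], pair[1]))
    return merged[:limit]


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "xapian is a search engine library")
    index.add(2, "lucene is a java search library")
    index.add(3, "sphinx offers an sql like api")
    for score, doc_id in index.search("xapian search"):
        print(doc_id, round(score, 3), index.docs[doc_id])
```

Splitting documents across several `TinyIndex` instances and querying them with `search_shards` mirrors the sharding idea from slide 14. A real engine would also share term statistics or normalise scores across shards.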