Comparing open source search engines

Lightning talk given at Cambridge Geek Night 6


Published in: Technology


Transcript

  • 1. Comparing open source search engines Richard Boulton @rboulton richard@cnav.co.uk
  • 2. Search Engine? Document oriented database Inverted index Ranking / weighting algorithm
  • 3. Lucene Java Apache License Low-level: Java API
  • 4. Lucene Family Solr: “REST-like” XML/JSON API ElasticSearch: REST API … and many commercial engines
  • 5. Xapian C++ GPLv2 Low-level: C++ API Python/Ruby/PHP/Perl/Java bindings
  • 6. Xapian C++ GPLv2 Low-level: C++ API Python/Ruby/PHP/Perl/Java bindings Partiality Risk
  • 7. Xapian Family Omega: Indexer + CGI interface Flax: REST API Xappy: Python wrapper
  • 8. Sphinx C++ GPLv2 SQL-like API
  • 9. Others Riak Search Terrier MySQL Fulltext PostgreSQL FTS Redis Whoosh
  • 10. Logos
  • 11. Document model. Lucene, Xapian: List of terms. Solr, Sphinx: Fields in a predefined fixed schema. Flax, Xappy: Fields, with associated modifiable schema. ElasticSearch: Fields, document types, free schema.
  • 12. Updates. Lucene, Xapian + families: Dynamic updates; use batches for fastest updates. Sphinx: No updates to existing indexes (“Realtime indexing” in beta with SQL API).
  • 13. Data structures. Lucene: Hash based segments, hierarchical merge. Xapian: B-tree, transactional.
  • 14. Scaling / replication ● All engines allow searches across databases ● Allows sharding ● All engines allow replication ● Allows spreading load and high availability ● Had difficulty with Sphinx ● ElasticSearch does it completely transparently
  • 15. Commercial Support Lucene: Lucid Imagination, Sematext, … Xapian: Oligarchy Ltd, Flax, me Sphinx: Sphinx Technologies Inc
  • 16. Community ● Lucene / Solr community – revolting (they say) ● Xapian – quieter, but steadily growing ● Sphinx – popular amongst relational database users (apparently)
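
Slide 2 sums up a search engine as a document oriented database, an inverted index and a ranking / weighting algorithm. A minimal sketch of how those three parts fit together is given here. The class name `TinyIndex`, the whitespace tokenizer and the simple TF-IDF weighting are assumptions made for illustration; none of the engines in the talk works exactly like this.

```python
import math
from collections import Counter, defaultdict


class TinyIndex:
    """Toy document store plus inverted index (hypothetical, for illustration)."""

    def __init__(self):
        self.docs = {}                     # doc_id -> original text (the "database")
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        """Store the document and record each of its terms in the inverted index."""
        self.docs[doc_id] = text
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return (doc_id, score) pairs, best match first, using a simple TF-IDF sum."""
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for term in query.lower().split():
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Terms that occur in fewer documents get a higher weight.
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Sort by descending score, breaking ties by doc_id.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = TinyIndex()
index.add(1, "xapian is a search engine library")
index.add(2, "lucene is a java search library")
index.add(3, "sphinx offers an sql like api")
results = index.search("java search")
print(results)  # document 2 matches both terms, so it ranks above document 1
```

The point of the inverted index is that a query only touches the posting lists of its own terms, rather than scanning every stored document.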
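
Slide 13 mentions segments with a hierarchical merge. The sketch below is a toy model of that general idea: new data is written as small segments, and whenever enough segments of one size class pile up they are merged into a single segment of the next class. The function names, the `merge_factor` parameter and the dict-based segment layout are all assumptions for illustration, not Lucene's actual merge policy or file format.

```python
def merge(segments):
    """Combine several segments (term -> sorted doc ids) into one."""
    merged = {}
    for seg in segments:
        for term, doc_ids in seg.items():
            merged.setdefault(term, []).extend(doc_ids)
    return {term: sorted(set(ids)) for term, ids in merged.items()}


def add_segment(levels, segment, merge_factor=3):
    """Add a new smallest-class segment, cascading merges up the levels.

    levels[i] holds the segments of size class i. When a level collects
    merge_factor segments, they are merged and promoted to level i + 1.
    """
    level = 0
    while True:
        if len(levels) <= level:
            levels.append([])
        levels[level].append(segment)
        if len(levels[level]) < merge_factor:
            return
        segment = merge(levels[level])
        levels[level] = []
        level += 1


levels = []
for doc_id in range(9):
    # Each new document arrives as its own tiny segment.
    add_segment(levels, {"search": [doc_id]})
print(levels)  # nine small segments have collapsed into one level-2 segment
```

This also suggests why slide 12 recommends batches for the fastest updates: larger initial segments mean fewer merge steps for the same amount of data.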
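
Slide 14 notes that all the engines can search across several databases, which is what makes sharding possible. A rough sketch of the pattern: query each shard independently, then merge the per-shard hits into one ranked list. The helper names and the count-based score are hypothetical; a real engine also has to make scores comparable across shards (for example by sharing term statistics), which this sketch ignores.

```python
import heapq


def search_shard(shard, term):
    """Return (score, doc_id) hits for one shard; the score is just the term count."""
    hits = []
    for doc_id, text in shard.items():
        count = text.lower().split().count(term)
        if count:
            hits.append((count, doc_id))
    return hits


def search_all(shards, term, limit=10):
    """Query every shard and keep the overall top `limit` hits."""
    all_hits = (hit for shard in shards for hit in search_shard(shard, term))
    return heapq.nlargest(limit, all_hits)


# Two shards, each a dict of doc_id -> text (made-up sample data).
shards = [
    {"a1": "search search engine", "a2": "database"},
    {"b1": "search engine", "b2": "search search search"},
]
top = search_all(shards, "search", limit=2)
print(top)  # [(3, 'b2'), (2, 'a1')]
```

In a real deployment the per-shard queries would run in parallel on separate machines, and replication would add several copies of each shard to spread load.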