Comparing open source search engines

Lightning talk given at Cambridge Geek Night 6

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

    Presentation Transcript

    • Comparing open source search engines
      • Richard Boulton
      • @rboulton
      • [email_address]
    • Search Engine?
      • Document oriented database
      • Inverted index
      • Ranking / weighting algorithm
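The three ingredients named on the "Search Engine?" slide can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not how Lucene, Xapian or Sphinx are implemented: the class name `TinySearchEngine`, whitespace tokenisation and the simple TF-IDF style weighting are all illustrative choices, and updates or deletions of existing documents are not handled.

```python
import math
from collections import Counter, defaultdict


class TinySearchEngine:
    """Toy engine: document store + inverted index + TF-IDF style ranking."""

    def __init__(self):
        self.documents = {}                # doc_id -> original text (the "database")
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        """Store the document and record each of its terms in the inverted index."""
        self.documents[doc_id] = text
        for term, count in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = count

    def search(self, query):
        """Return matching doc ids, highest weighted first."""
        scores = Counter()
        n_docs = len(self.documents)
        for term in query.lower().split():
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Terms found in fewer documents contribute a larger weight.
            idf = math.log(1 + n_docs / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        return [doc_id for doc_id, _ in scores.most_common()]


engine = TinySearchEngine()
engine.add(1, "xapian is a search library")
engine.add(2, "lucene is a java search library for search")
engine.add(3, "redis is a key value store")
print(engine.search("search"))
```

Document 2 mentions "search" twice, so it ranks ahead of document 1; document 3 does not match at all.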
    • Lucene
      • Java
      • Apache License
      • Low-level: Java API
    • Lucene Family
      • Solr: “REST-like” XML/JSON API
      • ElasticSearch: REST API
      • … and many commercial engines
    • Xapian
      • C++
      • GPLv2
      • Low-level: C++ API
      • Python/Ruby/PHP/Perl/Java bindings
      • Partiality Risk
    • Xapian Family
      • Omega: Indexer + CGI interface
      • Flax: REST API
      • Xappy: Python wrapper
    • Sphinx
      • C++
      • GPLv2
      • SQL-like API
    • Others
      • Riak Search
      • Terrier
      • MySQL Fulltext
      • PostgreSQL FTS
      • Redis
      • Whoosh
    • Logos
    • Document model
      • Lucene, Xapian: List of terms
      • Solr, Sphinx: Fields in a predefined fixed schema.
      • Flax, Xappy: Fields, with associated modifiable schema.
      • ElasticSearch: Fields, document types, free schema.
    • Updates
      • Lucene, Xapian + families: Dynamic updates
      • Use batches for fastest updates
      • Sphinx: No updates to existing indexes (“Realtime indexing” in beta with SQL API)
    • Data structures
      • Lucene: Hash based segments
      • Hierarchical merge
      • Xapian: B-tree, transactional
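To make the idea of segments with hierarchical merging more concrete, here is a rough Python sketch. It is only an illustration of the general technique: the names `SegmentedIndex`, `merge_segments` and `MERGE_FACTOR` are hypothetical, the segments are plain dictionaries, and no claim is made that Lucene's real merge policy works this way.

```python
MERGE_FACTOR = 2  # hypothetical: merge once this many segments share a level


def merge_segments(segments):
    """Combine several {term: [doc_id, ...]} segments into one larger segment."""
    merged = {}
    for segment in segments:
        for term, doc_ids in segment.items():
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


class SegmentedIndex:
    """Each batch becomes a small immutable segment; full levels are merged upwards."""

    def __init__(self):
        self.levels = []  # levels[i] holds segments that have been merged i times

    def add_segment(self, segment):
        level = 0
        while True:
            if len(self.levels) <= level:
                self.levels.append([])
            self.levels[level].append(segment)
            if len(self.levels[level]) < MERGE_FACTOR:
                return
            # This level is full: merge it and push the result one level up.
            segment = merge_segments(self.levels[level])
            self.levels[level] = []
            level += 1

    def lookup(self, term):
        """A search has to consult every live segment."""
        found = set()
        for level in self.levels:
            for segment in level:
                found.update(segment.get(term, []))
        return sorted(found)


index = SegmentedIndex()
for i in range(4):
    index.add_segment({"search": [i], f"t{i}": [i]})
print([len(level) for level in index.levels])
```

With a merge factor of 2, four single-document batches end up as one segment two levels up, so a lookup touches one segment rather than four.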
    • Scaling / replication
      • All engines allow searches across databases
        • Allows sharding
      • All engines allow replication
        • Allows spreading load and high availability
        • Had difficulty with Sphinx
        • ElasticSearch does it completely transparently
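Searching across several databases, which is what makes sharding possible, can be sketched in a few lines of Python. This is a simplified illustration, not any engine's actual API: `search_shard` and `search_all` are hypothetical names, each shard is just a dictionary of documents, the score is a bare term count, and scores from different shards are assumed to be directly comparable.

```python
import heapq


def search_shard(shard, term):
    """Return (score, doc_id) hits from one shard; the score is the term count."""
    hits = []
    for doc_id, text in shard.items():
        count = text.lower().split().count(term)
        if count:
            hits.append((count, doc_id))
    return hits


def search_all(shards, term, limit=10):
    """Query every shard, then merge the hits into one ranked result list."""
    hits = []
    for shard in shards:
        hits.extend(search_shard(shard, term))
    return [doc_id for _score, doc_id in heapq.nlargest(limit, hits)]


shards = [
    {"a": "search search engine", "b": "database"},
    {"c": "search engine"},
]
print(search_all(shards, "search"))
```

The caller sees a single ranked list and does not need to know which shard each document came from.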
    • Commercial Support
      • Lucene: Lucid Imagination, Sematext, …
      • Xapian: Oligarchy Ltd, Flax, me
      • Sphinx: Sphinx Technologies Inc
    • Community
      • Lucene / Solr community – revolting (they say)
      • Xapian – quieter, but steadily growing
      • Sphinx – popular amongst relational database users (apparently)