ElasticSearch vs. Apache Solr


  1. ElasticSearch vs. Apache Solr (last update: Dec 2011). An experiment to compare ElasticSearch with Apache Solr. From Peter.Karich@pannous.info
  2. Preface <ul><li>Comments and suggestions should go to my blog </li></ul><ul><ul><li>This document is based on my personal opinion and on my experience with Jetslide, where I moved from Solr to ElasticSearch, and with several other projects based on ElasticSearch. </li></ul></ul><ul><ul><li>This document should not be used to show that one of the projects is 'bad'. Keep in mind that both projects are evolving rapidly, and you should always run your own tests to find out whether a (software) product fits your use case / company. </li></ul></ul>
  3. Similarities of ElasticSearch (ES) and Solr <ul><li>Both software systems are: </li></ul><ul><ul><li>powerful search servers based on Lucene 3.5: </li></ul></ul><ul><ul><ul><li>ElasticSearch 0.18.5 </li></ul></ul></ul><ul><ul><ul><li>Apache Solr 3.5 </li></ul></ul></ul><ul><ul><li>free software released under the Apache License 2.0 </li></ul></ul><ul><li>Both systems can use JSON over HTTP for indexing and querying. Both support advanced Lucene queries and more. </li></ul>
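To illustrate "JSON over HTTP" on both sides, here is a minimal Python sketch that only builds the requests without sending them. The localhost ports, the index/type names and the Solr path /update/json (mapped to the JSON update handler in the Solr 3.x example configuration, as far as I know) are assumptions about a default local install; check them against your own setup.

```python
import json
import urllib.parse
import urllib.request

# Assumed default endpoints of a local test install; adjust for your setup.
ES_URL = "http://localhost:9200"
SOLR_URL = "http://localhost:8983/solr"


def es_index_request(index, doc_type, doc_id, doc):
    """ElasticSearch: PUT the JSON document to /<index>/<type>/<id>."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def solr_index_request(docs):
    """Solr: POST a JSON array of documents (each carrying its own id field) to the update handler."""
    return urllib.request.Request(
        f"{SOLR_URL}/update/json?commit=true",  # commit=true makes the docs searchable right away
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def solr_query_url(q):
    """Solr: the query goes into URL parameters; wt=json selects a JSON response."""
    return f"{SOLR_URL}/select?" + urllib.parse.urlencode({"q": q, "wt": "json"})


if __name__ == "__main__":
    req = es_index_request("tweets", "tweet", "1", {"user": "peter"})
    print(req.get_method(), req.full_url)  # PUT http://localhost:9200/tweets/tweet/1
    print(solr_query_url("user:peter"))    # http://localhost:8983/solr/select?q=user%3Apeter&wt=json
```

Passing one of these requests to urllib.request.urlopen() would actually send it. Note the asymmetry: ES addresses a document by its URL (/index/type/id), while Solr expects the id as a field inside the JSON document.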
  4. What ES offers and Solr does not (or not as well) <ul><ul><li>ES is distributed: </li></ul></ul><ul><ul><ul><li>easy sharding ('splits' one index into smaller pieces) </li></ul></ul></ul><ul><ul><ul><li>easy replication ('copies' one index to multiple nodes) </li></ul></ul></ul><ul><ul><ul><li>adding / removing a node is simple; indices are moved accordingly </li></ul></ul></ul><ul><ul><ul><li>easy cloud support for Amazon (S3, discovery via the API) </li></ul></ul></ul><ul><ul><ul><li>support for GigaSpaces, Coherence and Terracotta </li></ul></ul></ul><ul><ul><li>Some Solr features are not supported in 'distributed mode', and there are several ways to distribute Solr, none of them really easy </li></ul></ul><ul><ul><li>ES is realtime and distributed: you just specify the acceptable latency via the API </li></ul></ul>
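Sharding, replication and latency are chosen per index when it is created. The following Python sketch builds such a request body; number_of_shards, number_of_replicas and refresh_interval are ES index settings, but the concrete values are only examples. The routing function is a deliberately simplified toy: it uses CRC32, which is NOT the hash ES actually uses, only to show why a given document id always lands on the same shard.

```python
import zlib


def create_index_body(shards, replicas, refresh_interval="1s"):
    """Settings body for creating an ES index (sent as PUT /<index>)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,            # how many pieces the index is split into
                "number_of_replicas": replicas,        # extra copies of every shard
                "refresh_interval": refresh_interval,  # how soon new docs become searchable
            }
        }
    }


def total_shard_copies(shards, replicas):
    """Each primary shard plus each of its replicas is one copy to place on some node."""
    return shards * (1 + replicas)


def toy_shard_for(doc_id, shards):
    """TOY routing: hash the id and take it modulo the shard count.

    CRC32 is an assumption for illustration only; ES uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % shards


if __name__ == "__main__":
    body = create_index_body(shards=5, replicas=1)
    print(total_shard_copies(5, 1))  # 10 shard copies spread across the cluster
    print(toy_shard_for("doc-42", 5))
```

Because the id-to-shard mapping depends on the shard count, the number of shards is fixed once the index exists, whereas the number of replicas can be changed later.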
  5. ... what ElasticSearch offers (2) <ul><ul><li>The full JSON document can be nested and can be stored within the index in one field called _source (even compressed), which makes re-indexing simple </li></ul></ul><ul><ul><li>Query via JSON (or URL parameters), e.g. with curl or ElasticSearch-Head </li></ul></ul><ul><ul><li>When indexing you specify: </li></ul></ul><ul><ul><ul><li>the index </li></ul></ul></ul><ul><ul><ul><li>the id </li></ul></ul></ul><ul><ul><ul><li>the type </li></ul></ul></ul><ul><ul><li>This makes multi-tenancy easy: creating or deleting an index is a simple API call. Solr's multi-core setup is more complicated. </li></ul></ul>
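Because every document is addressed by index, type and id, an index-per-tenant layout needs little more than a naming convention. The sketch below assumes a hypothetical tenant_<name> naming scheme and a local node on port 9200; the search body uses ES's query_string query, which accepts Lucene query syntax.

```python
import json

ES_URL = "http://localhost:9200"  # assumed local node


def tenant_index(tenant):
    """Hypothetical naming scheme: one index per tenant (ES index names must be lowercase)."""
    return "tenant_" + tenant.lower()


def doc_url(index, doc_type, doc_id):
    """Each document lives at /<index>/<type>/<id>; a GET returns it including _source."""
    return f"{ES_URL}/{index}/{doc_type}/{doc_id}"


def search_url(index):
    """Searching only one tenant's index keeps tenants apart."""
    return f"{ES_URL}/{index}/_search"


def query_string_body(text, size=10):
    """JSON search body using a query_string query."""
    return {"query": {"query_string": {"query": text}}, "size": size}


if __name__ == "__main__":
    index = tenant_index("Acme")
    print(doc_url(index, "tweet", "42"))  # http://localhost:9200/tenant_acme/tweet/42
    print(search_url(index))              # http://localhost:9200/tenant_acme/_search
    print(json.dumps(query_string_body("user:peter")))
```

Removing a tenant is then a single DELETE request on its index URL (e.g. DELETE /tenant_acme), which is the "simple API call" mentioned on the slide.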
  6. ElasticSearch-Head (similar to the Solr Admin page)
  7. ... what ElasticSearch offers (3) <ul><ul><li>ES introduces the concept of a 'gateway' for long-term persistence: </li></ul></ul><ul><ul><ul><li>you define the index storage (in-memory, on the file system, ...) </li></ul></ul></ul><ul><ul><ul><li>if ES crashes, it can recover the index from this gateway (the 'availability' system), even if the index storage was in-memory </li></ul></ul></ul><ul><ul><ul><li>on Amazon you can use S3 as the gateway </li></ul></ul></ul><ul><ul><li>ES supports full scripting while querying, e.g. for statistics and facets </li></ul></ul>
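As an example of scripting while querying, the following Python sketch builds a search body with a statistical facet whose values are computed by a script. It assumes the 0.18-era facet syntax with "script" and "params" entries and MVEL as the default script language; the field name price and the facet name scaled_stats are hypothetical. (Later ES versions replaced facets with aggregations.)

```python
import json


def scripted_stats_query(field, factor):
    """Search body: match all docs and compute statistics over a script-computed value."""
    return {
        "query": {"match_all": {}},
        "facets": {
            "scaled_stats": {  # hypothetical facet name
                "statistical": {
                    # The script runs per matching document; `factor` is bound from params,
                    # so the script text itself stays the same between requests.
                    "script": f"doc['{field}'].value * factor",
                    "params": {"factor": factor},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(scripted_stats_query("price", 1.19), indent=2))
```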
  8. ES lacks ... <ul><ul><li>complete documentation and query examples that always work </li></ul></ul><ul><ul><ul><li>solve this by asking on freenode! </li></ul></ul></ul><ul><ul><li>Only one main contributor. No commercial support </li></ul></ul><ul><ul><ul><li>But a strong community (again: freenode + Google Groups) </li></ul></ul></ul><ul><ul><ul><li>And Shay works full time on ES; the community has developed clients for several different languages </li></ul></ul></ul><ul><ul><ul><li>Shay has a strong background in search: he wrote Compass </li></ul></ul></ul>
  9. ES lacks ... (2) <ul><ul><li>No autowarming queries (see issue 1006) </li></ul></ul><ul><ul><li>No XML support </li></ul></ul><ul><ul><li>ES has no facet pagination. See issue 1044 </li></ul></ul><ul><ul><li>ES has no field collapsing. See issue 256 </li></ul></ul><ul><ul><li>ES has no date math </li></ul></ul><ul><ul><li>ES has no separate, smaller client jar for Java projects </li></ul></ul><ul><ul><li>ES has no spell checking plugin. See issue 911 </li></ul></ul>
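Without date math in ES, one workaround is to compute the dates on the client. This Python sketch emulates Solr's [NOW/DAY-7DAYS TO NOW] (NOW/DAY rounds down to midnight) and puts the bounds into a range filter with explicit ISO-8601 timestamps. The field name published is hypothetical, and now should be given in UTC, since that is what Solr's date math works in.

```python
from datetime import datetime, timedelta


def last_n_days_range(now, days):
    """Client-side stand-in for Solr's [NOW/DAY-<days>DAYS TO NOW]."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)  # like NOW/DAY
    return start_of_day - timedelta(days=days), now


def es_range_filter(field, start, end):
    """Range filter with precomputed, explicit timestamps instead of date math."""
    return {"range": {field: {"from": start.isoformat(), "to": end.isoformat()}}}


if __name__ == "__main__":
    start, end = last_n_days_range(datetime(2011, 12, 15, 13, 45, 10), 7)
    print(es_range_filter("published", start, end))
```

The trade-off: the client's clock now defines "today", so clients and server should agree on the time zone.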
  10. When to use ElasticSearch <ul><li>You should use it when </li></ul><ul><ul><li>your index is big, realtime, or both </li></ul></ul><ul><ul><li>you have several indices or a multi-tenancy requirement </li></ul></ul><ul><ul><li>you want to save administration effort and cost </li></ul></ul><ul><li>And when shouldn't you use ElasticSearch? </li></ul><ul><ul><li>If your company already uses Solr and you don't need massive indexing, your indices are small, or you don't need realtime updates </li></ul></ul><ul><ul><li>If your company does not allow it due to the risks </li></ul></ul>
  11. ElasticSearch Resources <ul><li>Documentation </li></ul><ul><ul><li>Search directly on elasticsearch.org </li></ul></ul><ul><ul><li>groups.google.com/a/elasticsearch.com/group/users </li></ul></ul><ul><ul><li>Freenode channel #elasticsearch ('group chat') </li></ul></ul><ul><ul><li>Twitter: twitter.com/elasticsearch </li></ul></ul><ul><ul><li>Code and issues: github.com/elasticsearch/elasticsearch </li></ul></ul><ul><li>Nice utilities/apps </li></ul><ul><ul><li>ElasticSearch-Head </li></ul></ul><ul><ul><li>ElasticSearch-BigDesk </li></ul></ul><ul><ul><li>Logstash - a log indexer built with ES and JRuby: code.google.com/p/logstash </li></ul></ul>
  12. Solr Resources <ul><ul><li>A slide which points out how hard distributed Solr/Lucene is </li></ul></ul><ul><ul><li>Solr Wiki </li></ul></ul><ul><ul><li>Solr & DistributedSearch </li></ul></ul>