Elasticsearch and Apache Solr are both distributed search engines that provide full text search capabilities and real-time analytics on large volumes of data. The document compares their architectures, data models, query languages, and other features. Key differences include Elasticsearch having a more dynamic schema while Solr relies more on predefined schemas, and Elasticsearch natively supports features like nested objects and parent/child relationships that require additional configuration in Solr.
Battle of the giants: Apache Solr vs ElasticSearch
1. Battle of the Giants
Apache Solr 4.0 vs ElasticSearch 0.20
Rafał Kuć - Sematext International
@kucrafal @sematext sematext.com
7. Who Am I
• "Solr 3.1 Cookbook" author (4.0 inc)
• Sematext consultant & engineer
• Solr.pl co-founder
• Father and husband :)
Copyright 2012 Sematext Int'l. All rights reserved
3. What Will I Talk About?
4. Under the Hood
• ElasticSearch 0.20
  – Apache Lucene 3.6.1
• Apache Solr 4.0
  – Apache Lucene 4.0
5. Architecture
• What we expect
  – Scalability
  – Fault tolerance
  – High availability
  – Features
• What we are also looking for
  – Manageability
  – Installation ease
  – Tools
6. ElasticSearch Cluster Architecture
• Distributed
• Fault tolerant
• Only ElasticSearch nodes
• Single leader
• Automatic leader election
7. SolrCloud Cluster Architecture
• Distributed
• Fault tolerant
• Apache Solr + ZooKeeper ensemble
• Leader per shard
• Automatic leader election
8. Collection vs Index
• Collection - Solr main logical index
• Index - ElasticSearch main logical structure
• Collections and indices can be spread among different nodes in the cluster
9. Multiple Document Types in Index
• ElasticSearch - multiple document types in a single index
• Apache Solr - multiple document types in a single collection - shared schema.xml
10. Shards and Replicas
• Index / Collection can have many shards
• Each shard can have 0 or more replicas
• Replicas are automatically updated
• Replicas can be promoted to leaders when a leader shard goes off-line
11. Index and Query Routing
• Control which shards documents go to
• Control which shards queries go to
• Manual data distribution
12. Querying Without Routing
[Diagram: an application querying a collection / index split into eight shards, Shard 1 through Shard 8]
13. Query With Routing
[Diagram: an application querying a collection / index split into eight shards, Shard 1 through Shard 8]
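The difference between the two diagrams can be sketched in a few lines of Python. This is a toy model rather than how either engine is implemented: the `shard_for` and `shards_to_query` helpers, the CRC32 hash, and the `user-1234` routing value are assumptions made for illustration, and Solr and ElasticSearch each use their own hash function.

```python
import zlib

NUM_SHARDS = 8  # matches the eight shards drawn on the slides


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number (hypothetical hash choice)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shards_to_query(routing_value=None, num_shards=NUM_SHARDS):
    """Return the list of shard numbers a query has to visit."""
    if routing_value is None:
        # No routing: the query fans out to every shard.
        return list(range(num_shards))
    # With routing: only the shard that owns the routing value is asked.
    return [shard_for(routing_value, num_shards)]


print(shards_to_query())             # every shard
print(shards_to_query("user-1234"))  # a single shard
```

The same routing value has to be used at index time and at query time, otherwise the query may land on a shard that does not hold the document.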
14. Routing Docs and Queries in Solr
• Requires some effort
• Defaults to hash based on document identifiers
• Can be turned off using solr.NoOpDistributingUpdateProcessorFactory
<updateRequestProcessorChain>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
</updateRequestProcessorChain>
15. Routing Docs and Queries - ElasticSearch
• routing parameter controls the target shard to which a document/query will be forwarded
• defaults to document identifiers
• can be changed to any value
curl -XPUT 'localhost:9200/sematext/test/1?routing=1234' -d '{
  "title" : "Test routing document"
}'
curl -XGET 'localhost:9200/sematext/test/_search/?q=*&routing=1234'
16. Apache Solr Index Structure
• Field types defined in schema.xml file
• Fields defined in schema.xml file
• Allows automatic value copying
• Allows dynamic fields
• Allows custom similarity definition
17. ElasticSearch Index Structure
• Schema-less
• Analyzers and filters defined with HTTP API
• Fields defined with an HTTP request
• Multi-field support
• Allows nested documents
• Allows parent-child relationship
• Allows structured data
18. Index Structure Manipulation
• Possible to some extent in Solr as well as ElasticSearch
• ElasticSearch allows dynamic mapping updates (not always)
19. Aliasing
• Solr
  – Allows core aliasing
• ElasticSearch
  – Allows index aliasing
  – We can add a filter to an alias
  – We can add index routing
  – We can add search routing
20. Server Configuration
• Solr
  – Static in solrconfig.xml
  – Can be reloaded during runtime with collection/core reload
• ElasticSearch
  – Static in elasticsearch.yml
  – Properties can be changed during runtime (although not all) without reloading
21. ElasticSearch Gateway Module
• Your data time machine
• Stores indices and meta data
• Currently available:
  – Local
  – Shared FS
  – Hadoop
  – S3
22. Discovery
• Apache Solr uses ZooKeeper
• ElasticSearch uses Zen Discovery
23. ElasticSearch Zen Discovery
• Allows automatic node discovery
• Provides multicast and unicast discovery methods
• Automatic master detection
• Two-way failure detection
24. Apache Solr & Apache ZooKeeper
• Requires additional software
• ZooKeeper ensemble with 1+ ZooKeeper instances
• Prevents split-brain situations
• Holds collection configurations
• Solr needs to know the address of one of the ZooKeeper instances
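ZooKeeper avoids split-brain by requiring a majority of the ensemble to agree, which is why ensemble size matters. The arithmetic can be sketched as follows; the function names are made up for this illustration and are not part of any ZooKeeper or Solr API.

```python
def quorum_size(ensemble_size):
    """Smallest number of instances that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many instances can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, "instances: quorum", quorum_size(n),
          "tolerated failures", tolerated_failures(n))
```

A single instance works but tolerates no failures, and a four-node ensemble tolerates no more failures than a three-node one, which is why odd ensemble sizes are the usual choice.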
25. API
• HTTP REST API in ElasticSearch or Query String for simple queries
• HTTP with Query String in Apache Solr
• Both provide a specialized Java API
  – SolrJ with CloudSolrServer for Apache Solr
  – ElasticSearch with TransportClient for remote connections
26. Apache Solr and Query String
• Queries are built of request parameters
• Some degree of structuring allowed (local params)
curl 'http://localhost:8983/solr/select?q=text:weird&sort=date+desc'
27. ElasticSearch REST End-Points
• Simple queries built of request parameters
• Structured queries built as JSON objects
curl -XGET 'localhost:9200/sematext/test/_search/?q=_all:weird&sort=date:desc'
curl -XGET 'localhost:9200/sematext/test/_search' -d '{
  "query" : {
    "term" : {
      "_all" : "weird"
    }
  },
  "sort" : {
    "date" : {
      "order" : "desc"
    }
  }
}'
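Because a structured query is just a JSON object, it can be assembled programmatically before it is sent. The sketch below builds a term query with a sort clause, where `sort` is a sibling of `query` at the top level of the body. The `build_search_body` helper is hypothetical, and nothing is sent over the network.

```python
import json


def build_search_body(field, value, sort_field, order="desc"):
    """Build a term query plus sort clause as a plain dict."""
    return {
        "query": {"term": {field: value}},
        # "sort" lives next to "query", not inside it.
        "sort": {sort_field: {"order": order}},
    }


body = build_search_body("_all", "weird", "date")
print(json.dumps(body, indent=2))
```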
28. Data Handling
• Solr
  – Multiple formats allowed as input
  – Can return results in multiple formats
• ElasticSearch
  – JSON in / JSON out
29. Single or Batch
• Solr
  – Single or multiple documents per request
• ElasticSearch
  – Single document with a standard indexing call
  – _bulk endpoint exposed for batch indexing
  – _bulk UDP endpoint can be exposed for low latency batch indexing
30. Partial Document Updates
• Not based on LUCENE-3837 proposed by Andrzej Białecki
• The document is reindexed on the search server side
• Both servers use versioning to prevent changes from being overwritten
• Can lead to decreased network traffic in some cases
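The combination of server-side reindexing and versioning can be illustrated with a small in-memory model. This is a sketch of optimistic concurrency control in general, not of either server's internals: the `VersionedStore` class, its method names, and the sample document are invented for the example.

```python
class VersionConflict(Exception):
    """Raised when an update is based on a stale document version."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, fields)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, fields):
        """Store a whole document and bump its version."""
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, dict(fields))
        return version

    def partial_update(self, doc_id, changes, expected_version):
        """Apply changes only if the caller saw the latest version."""
        version, fields = self._docs[doc_id]
        if version != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {version}")
        # The client sends only the changed fields; the server merges them
        # into the stored document and reindexes the result as a whole.
        self._docs[doc_id] = (version + 1, {**fields, **changes})
        return version + 1


store = VersionedStore()
v1 = store.index("12345", {"title": "Test", "enabled": False})
v2 = store.partial_update("12345", {"enabled": True}, expected_version=v1)
try:
    store.partial_update("12345", {"enabled": False}, expected_version=v1)
except VersionConflict as error:
    print("stale update rejected:", error)
```

Only the changed field travels over the wire, which is where the reduced network traffic comes from.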
31. ElasticSearch Partial Doc Update
• Special endpoint exposed - _update
• Supports parameters like routing, parent, replication, percolate, etc. (similar to Index API)
• Uses scripts to perform document updates
curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{
  "script" : "ctx._source.enabled = enabled",
  "params" : {
    "enabled" : true
  }
}'
32. Apache Solr Partial Doc Update
• Sent to the standard update handler
• Requires _version_ field to be present
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[
  {
    "id" : "12345",
    "enabled" : {
      "set" : true
    }
  }
]'
33. Solr Collections API
• Built on top of Core Admin
• Allows:
  – Collection creation
  – Collection reload
  – Collection deletion
34. ElasticSearch Indices REST API
• Allows:
  – Index creation
  – Index deletion
  – Index closing and opening
  – Index refreshing
  – Existence checking
35. Analysis Chain Definition
• Solr
  – Static in schema.xml
  – Can be reloaded during runtime with collection/core reload
• ElasticSearch
  – Static in elasticsearch.yml
  – Defined during index/type creation with REST call
  – Possible to change with update mapping call (not all changes allowed)
36. Multilingual Data Handling
• Both ElasticSearch and Apache Solr are built on top of Apache Lucene
• Solr - analyzers defined per field in schema.xml file
• ElasticSearch - analyzer defined in mappings, but can be set during query or specified on the basis of field values
37. Results Grouping
• Available in Apache Solr only
• Allows for results grouping based on:
  – Field value
  – Query
  – Function query (not available during distributed searching)
38. Prospective Search
• Allows for checking if a document matches a stored query
• Not available in Apache Solr
• Available in ElasticSearch under the name of Percolator
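Prospective search turns the usual flow around: queries are stored first, and each incoming document is checked against them. A toy version of the idea, assuming that a "query" is simply a set of terms that must all appear in the document, might look like this. The `ToyPercolator` class is invented for illustration and is unrelated to the real Percolator implementation.

```python
def tokenize(text):
    """Very naive analysis: lowercase and split on whitespace."""
    return set(text.lower().split())


class ToyPercolator:
    def __init__(self):
        self._queries = {}  # query name -> set of required terms

    def register(self, name, required_terms):
        """Store a query ahead of time."""
        self._queries[name] = {term.lower() for term in required_terms}

    def percolate(self, document_text):
        """Return the names of all stored queries the document matches."""
        tokens = tokenize(document_text)
        return sorted(
            name for name, terms in self._queries.items() if terms <= tokens
        )


percolator = ToyPercolator()
percolator.register("solr-news", ["solr"])
percolator.register("es-news", ["elasticsearch"])
percolator.register("both", ["solr", "elasticsearch"])

print(percolator.percolate("Apache Solr vs ElasticSearch"))
print(percolator.percolate("Lucene 4.0 released"))
```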
39. Spellchecker
• Allows checking and correcting spelling mistakes
• Not available in ElasticSearch currently
• Multiple implementations available in Apache Solr
  – IndexBasedSpellChecker
  – WordBreakSolrSpellChecker
  – DirectSolrSpellChecker
40. Full Text Search Capabilities
• Variety of queries
• Ability to control score calculation
• Different query parsers available
• Advanced Lucene queries (like SpanQueries) exposed
41. Score Calculation
• Leverage Lucene scoring capabilities
• Control over document importance
• Control over query importance
• Control over term and phrase importance
42. Apache Solr and Score Influence
• Index time
  – Document boosts
  – Field boosts
• Query time
  – Term boosts
  – Field boosts
  – Phrase boosts
  – Function queries
43. ElasticSearch and Score Influence
• Index time
  – Document and field boosts
• Query time
  – Different queries provide different boost controls
  – Can calculate distributed term frequencies
  – Negative and positive boosting queries
  – Custom score filters
• Scripts
  – Control scoring with scripts
44. Nested Objects
• Possible only in ElasticSearch
• Indexed as separate documents
• Stored in the same part of the index as the root document
• Hidden from standard queries and filters
• Need appropriate queries and filters (nested)
45. More Like This
• Lets us find similar documents
• Solr
  – More Like This Component
• ElasticSearch
  – More Like This Query
  – More Like This Field Query
  – _mlt REST endpoint
46. Solr Parent-Child Relationship
• Used at query time
• Multi core joins possible
http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow
47. ElasticSearch Parent-Child Handling
• Proper indexing required
• Indexed as separate documents
• Standard queries don't return child documents
• In order to retrieve parent docs one should use appropriate queries and filters (has_child, has_parent, top_children)
48. Filters
• Used to narrow down query results
• Good candidates for caching and reuse
• Supported by ElasticSearch and Apache Solr
• Should be used for repeatable query elements
49. Apache Solr Filter Queries
• Multiple filters per query
• Filters are additive
• Different query parsers can be used
• Local params can be used
• Narrow down faceting results
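In practice, "multiple filters per query" means repeating the `fq` request parameter, and each additional filter narrows the result set further. The sketch below only builds such a URL and does not contact a server. The `solr_select_url` helper and the `category` and `price` fields are assumptions for the example, while the base URL and `q` value come from the earlier query string slide.

```python
from urllib.parse import urlencode


def solr_select_url(base, q, filters=(), sort=None):
    """Build a Solr select URL with one fq parameter per filter."""
    params = [("q", q)]
    # Each filter becomes its own fq parameter; a document has to pass all of them.
    params.extend(("fq", fq) for fq in filters)
    if sort:
        params.append(("sort", sort))
    return base + "?" + urlencode(params)


url = solr_select_url(
    "http://localhost:8983/solr/select",
    q="text:weird",
    filters=["category:books", "price:[10 TO 100]"],
    sort="date desc",
)
print(url)
```

Keeping each condition in its own `fq` lets the server cache and reuse the filters independently of the main query.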
50. ElasticSearch Filtered Queries
• Can be defined using queries exposed by the Query DSL
• Can be used for custom score calculation (i.e., custom filters score query)
• Don't narrow down faceting results by default (facets have their own filters)
51. Filter Cache Control
• Both Solr and ElasticSearch let us control cache for filters
• Solr
  – Using local params and cache property
• ElasticSearch
  – _cache property
  – _cache_key property
52. Faceting
• Both provide common facets
  – Terms
  – Range & query
  – Terms statistics
  – Spatial distance
• Solr
  – Pivot faceting
• ElasticSearch
  – Histograms
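To make the facet types concrete, here is a toy computation of a terms facet and a histogram facet over a handful of in-memory documents. It only illustrates what the numbers mean; the helper names and the sample data are invented, and neither engine computes facets this way internally.

```python
from collections import Counter


def terms_facet(docs, field):
    """Count how many documents carry each value of the field."""
    return Counter(doc[field] for doc in docs if field in doc)


def histogram_facet(docs, field, interval):
    """Count documents per bucket, keyed by the bucket's lower bound."""
    return Counter(
        (doc[field] // interval) * interval for doc in docs if field in doc
    )


docs = [
    {"tag": "solr", "price": 12},
    {"tag": "elasticsearch", "price": 25},
    {"tag": "solr", "price": 18},
]

print(dict(terms_facet(docs, "tag")))
print(dict(histogram_facet(docs, "price", interval=10)))
```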
53. Real Time Or Not?
• Allow getting a document that is not yet visible to search
• Don't need searcher reopening
• ElasticSearch
  – Separate Get and Multi Get APIs
• Apache Solr
  – Separate Realtime Get Handler
  – Can be used as a search component
54. Caches and Warming
• ElasticSearch and Solr allow caching
• Both allow running warming queries
• ElasticSearch by default doesn't limit cache sizes
55. Solr Caches
• Types
  – Filter Cache
  – Query Result Cache
  – Document Cache
• Implementation choices
  – LRUCache
  – FastLRUCache
  – LFUCache
• Other configuration options:
  – Size
  – Maximum size
  – Autowarming count
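The eviction policy behind an LRU-style cache with a maximum size can be shown with a minimal sketch. This is a generic least-recently-used cache written for illustration, not Solr's LRUCache or FastLRUCache code, and the class name and cache keys are made up.

```python
from collections import OrderedDict


class ToyLRUCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)


cache = ToyLRUCache(max_size=2)
cache.put("fq=category:books", "bitset-1")
cache.put("fq=price:[10 TO 100]", "bitset-2")
cache.get("fq=category:books")         # touch the first filter
cache.put("fq=tag:solr", "bitset-3")   # evicts the price filter
print(cache.get("fq=price:[10 TO 100]"))
```

An LFU cache would instead evict the entry with the fewest hits, which suits workloads where a few filters are reused far more often than the rest.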
56. ElasticSearch Caches
• Types
  – Filter Cache
  – Field Data Cache
• Implementation choices
  – Resident
  – Soft
  – Weak
• Other configuration options:
  – Max size (entries per segment)
  – Expiration time
57. Cluster State Monitoring
• Apache Solr - multiple MBeans exposed by JMX
• ElasticSearch - multiple REST endpoints exposed to get different statistics
58. ElasticSearch Statistics API
• Health and State Check
• Nodes Information and Statistics
• Cache Statistics
• Index Segments Information
• Index Information and Statistics
• Mappings Information
61. Cluster Settings Update
• ElasticSearch lets us:
  – Control rebalancing
  – Control recovery
  – Control allocation
  – Change the above on the live cluster
62. Custom Shard Allocation
• Possible in ElasticSearch
• Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
  }
}'
• Index level:
curl -XPUT localhost:9200/sematext/_settings -d '{
  "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
}'
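The effect of the exclude and include settings can be pictured as a filter over the nodes that are allowed to hold a shard. The sketch below is a simplified model under stated assumptions: the `eligible_nodes` function, the node names, and the extra IP addresses are hypothetical, and the real allocation logic takes many more factors into account.

```python
def eligible_nodes(nodes, exclude_ips=(), include_tags=()):
    """Return the names of nodes that pass the allocation filters."""
    result = []
    for node in nodes:
        if node["ip"] in exclude_ips:
            continue  # excluded at the cluster level
        if include_tags and node.get("tag") not in include_tags:
            continue  # the index only accepts nodes with a listed tag
        result.append(node["name"])
    return result


nodes = [
    {"name": "node1", "ip": "192.168.2.1", "tag": "nodeOne"},
    {"name": "node2", "ip": "192.168.2.2", "tag": "nodeTwo"},
    {"name": "node3", "ip": "192.168.2.3", "tag": "nodeThree"},
]

print(eligible_nodes(nodes, exclude_ips={"192.168.2.1"}))
print(eligible_nodes(nodes, include_tags={"nodeOne", "nodeTwo"}))
print(eligible_nodes(nodes, exclude_ips={"192.168.2.1"},
                     include_tags={"nodeOne", "nodeTwo"}))
```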
63. Moving Shards and Replicas
• Possible in ElasticSearch, not available in Solr
• Allows moving shards and replicas to any node in the cluster on demand
• Available in ElasticSearch through the _cluster/reroute endpoint:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
    {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
  ]
}'
64. And The Winner Is?
65. How to Reach Us
• Rafał Kuć
  – Twitter: @kucrafal
  – E-mail: rafal.kuc@sematext.com
• Sematext
  – Twitter: @sematext
  – Website: http://sematext.com
• Solr vs ElasticSearch series:
  – http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
66. We Are Hiring!
• Dig Search?
• Dig Analytics?
• Dig Big Data?
• Dig Performance?
• Dig working with and in open-source?
• We're hiring world-wide!
http://sematext.com/about/jobs.html