SlideShare a Scribd company logo
1 of 66
Battle of the Giants
Apache Solr 4.0 vs ElasticSearch 0.20

       Rafał Kuć – Sematext International
      @kucrafal @sematext sematext.com
Who Am I
•   „Solr 3.1 Cookbook” author (4.0 inc)
•   Sematext consultant & engineer
•   Solr.pl co-founder
•   Father and husband 




                 Copyright 2012 Sematext Int’l. All rights reserved
What Will I Talk About ?




     Copyright 2012 Sematext Int’l. All rights reserved
Under the Hood
• ElasticSearch 0.20
  – Apache Lucene 3.6.1
• Apache Solr 4.0
  – Apache Lucene 4.0




               Copyright 2012 Sematext Int’l. All rights reserved
Architecture
• What we expect
  – Scalability
  – Fault toleranance
  – High availablity
  – Features
• What we are also looking for
  – Manageability
  – Installation ease
  – Tools

                Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Cluster Architecture
•   Distributed
•   Fault tolerant
•   Only ElasticSearch nodes
•   Single leader
•   Automatic leader election




                Copyright 2012 Sematext Int’l. All rights reserved
SolrCloud Cluster Architecture
•   Distributed
•   Fault tolerant
•   Apache Solr + ZooKeeper ensemble
•   Leader per shard
•   Automatic leader election




               Copyright 2012 Sematext Int’l. All rights reserved
Collection vs Index
• Collection – Solr main logical index
• Index – ElasticSearch main logic structure
• Collections and Indices can be spread among
  different nodes in the cluster




              Copyright 2012 Sematext Int’l. All rights reserved
Multiple Document Types in Index
• ElasticSearch - multiple document types in a
  single index

• Apache Solr - multiple document types in a
  single collection – shared schema.xml




               Copyright 2012 Sematext Int’l. All rights reserved
Shards and Replicas
•   Index / Collection can have many shards
•   Each shard can have 0 or more replicas
•   Replicas are automatically updated
•   Replicas can be promoted to leaders when a
    leader shard goes off-line




                Copyright 2012 Sematext Int’l. All rights reserved
Index and Query Routing
• Control where documents are going
• Control where queries are going
• Manual data distribution




              Copyright 2012 Sematext Int’l. All rights reserved
Querying Without Routing

      Shard 1               Shard 2                 Shard 3               Shard 4



      Shard 5               Shard 6                 Shard 7               Shard 8


Collection / Index




                                       Application


                     Copyright 2012 Sematext Int’l. All rights reserved
Query With Routing

      Shard 1               Shard 2                 Shard 3               Shard 4



      Shard 5               Shard 6                 Shard 7               Shard 8


Collection / Index




                                       Application


                     Copyright 2012 Sematext Int’l. All rights reserved
Routing Docs and Queries in Solr
• Requires some effort
• Defaults to hash based on document
  identifiers
• Can be turned off using
 solr.NoOpDistributingUpdateProcessorFactory

  <updateRequestProcessorChain>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
   <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
  </updateRequestProcessorChain>


                      Copyright 2012 Sematext Int’l. All rights reserved
Routing Docs and Queries - ElasticSearch
• routing parameter controls target shard
  which document/query will be forwarded to
• defaults to document identifiers
• can be changed to any value
    curl -XPUT localhost:9200/sematext/test/1?routing=1234 -d '{
      "title" : "Test routing document"
    }'

    curl –XGET localhost:9200/sematext/test/_search/?q=*&routing=1234




                      Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr Index Structure
•   Field types defined in schema.xml file
•   Fields defined in schema.xml file
•   Allows automatic value copying
•   Allows dynamic fields
•   Allows custom similarity definition




                 Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Index Structure
•   Schema - less
•   Analyzers and filters defined with HTTP API
•   Fields defined with an HTTP request
•   Multi – field support
•   Allows nested documents
•   Allows parent – child relationship
•   Allows structured data

                 Copyright 2012 Sematext Int’l. All rights reserved
Index Structure Manipulation
• Possible to some extent in Solr as well as
  ElasticSearch
• ElasticSearch allows dynamic mappings
  update (not always)




               Copyright 2012 Sematext Int’l. All rights reserved
Aliasing
• Solr
  – Allows core aliasing
• ElasticSearch
  – Allows index aliasing
  – We can add filter to alias
  – We can add index routing
  – We can add search routing


                  Copyright 2012 Sematext Int’l. All rights reserved
Server Configuration
• Solr                                     • ElasticSearch
  – Static in solrconfig.xml                      – Static in elasticsearch.yml
  – Can be reloaded                               – Properties can be
    during runtime with                             changed during runtime
    collection/core reload                          (although not all) without
                                                    reloading




                   Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Gateway Module
• Your data time machine
• Stores indices and meta data
• Currently available:
  – Local
  – Shared FS
  – Hadoop
  – S3


                Copyright 2012 Sematext Int’l. All rights reserved
Discovery
• Apache Solr uses ZooKeeper
• ElasticSearch uses Zen Discovery




              Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Zen Discovery
• Allows automatic node discovery
• Provides multicast and unicast discovery
  methods
• Automatic master detection
• Two - way failure detection




               Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr & Apache ZooKeeper
• Requires additional software
• ZooKeeper ensemble with 1+ ZooKeeper
  instances
• Prevents split – brain situations
• Holds collections configurations
• Solr needs to know address of one of the
  ZooKeeper instances


              Copyright 2012 Sematext Int’l. All rights reserved
API
• HTTP REST API in ElasticSearch or Query String
  for simple queries
• HTTP with Query String in Apache Solr
• Both provide specialized Java API
  – SolrJ for Apache Solr and CloudSolrServer
  – ElasticSearch with TransportClient for remote
    connections



                Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr and Query String
• Queries are built of request parameters
• Some degree of structuring allowed (local
  params)
    curl 'http://localhost:8983/solr/select?q=text:weird&sort=date+desc'




                      Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch REST End-Points
• Simple queries built of request parameters
• Stuctured queries built as JSON objects
    curl –XGET
    'localhost:9200/sematext/test/_search/?q=_all:weird&sort=date:desc'

    curl -XGET 'localhost:9200/sematext/test_search' -d '{
    "query" : {
      "term" : {
        "_all" : "weird"
      },
      "sort" : {
        "date" : {
          "order" : "desc"
        }
      }
    }'

                         Copyright 2012 Sematext Int’l. All rights reserved
Data Handling
• Solr
  – Multiple formats allowed as input
  – Can return results in multiple formats
• ElasticSearch
  – JSON in / JSON out




                  Copyright 2012 Sematext Int’l. All rights reserved
Single or Batch
• Solr                                     • ElasticSearch
  – Single or multiple                            – Single document with a
    documents per                                   standard indexing call
    request                                       – _bulk end – point exposed
                                                    for batch indexing
                                                  – _bulk UDP end – point can
                                                    be exposed for low
                                                    latency batch indexing


                   Copyright 2012 Sematext Int’l. All rights reserved
Partial Document Updates
• Not based on LUCENE-3837 proposed by
  Andrzej Białecki
• Document reindexing on the side of search
  server
• Both servers use versioning to prevent
  changes being overwritten
• Can lead to decreased network traffic in some
  cases

              Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Partial Doc Update
• Special end – point exposed - _update
• Supports parameters like
  routing, parent, replication, percolate, etc
  (similar to Index API)
• Uses scripts to perform document updates
   curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{
     "script" : "ctx._source.enabled = enabled",
     "params" : {
       "enabled" : true
     }
   }'

                       Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr Partial Doc Update
• Sent to the standard update handler
• Requires _version_ field to be present

    curl 'localhost:8983/solr/update?commit=true' -H 'Content-
    type:application/json' -d '[
      {
        "id" : "12345",
        "enabled" : {
          "set" : true
        }
      }
    ]'


                      Copyright 2012 Sematext Int’l. All rights reserved
Solr Collections API
• Built on top of Core Admin
• Allows:
  – Collection creation
  – Collection reload
  – Collection deletion




                Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Indices REST API
• Allows:
  – Index creation
  – Index deletion
  – Index closing and opening
  – Index refreshing
  – Existence checking




               Copyright 2012 Sematext Int’l. All rights reserved
Analysis Chain Definition
• Solr                                    • ElasticSearch
  – Static in schema.xml                         – Static in elasticsearch.yml
  – Can be reloaded                              – Defined during index/type
    during runtime with                            creation with REST call
    collection/core reload                       – Possible to change with
                                                   update mapping call (not
                                                   all changes allowed)




                  Copyright 2012 Sematext Int’l. All rights reserved
Multilingual Data Handling
• Both ElasticSearch and Apache Solr built on
  top of Apache Lucene
• Solr – analyzers defined per field in schema.xml
  file
• ElasticSearch – analyzer defined in
  mappings, but can be set during query or
  specified on the basis of field values


               Copyright 2012 Sematext Int’l. All rights reserved
Results Grouping
• Available in Apache Solr only
• Allows for results grouping based on:
  – Field value
  – Query
  – Function query (not available during distributed
    searching)




                Copyright 2012 Sematext Int’l. All rights reserved
Prospective Search
• Allows for checking if a document matches a
  stored query
• Not available in Apache Solr
• Available in ElasticSearch under the name of
  Percolator




               Copyright 2012 Sematext Int’l. All rights reserved
Spellchecker
• Allows to check and correct spelling mistakes
• Not available in ElasticSearch currently
• Multiple implementations available in Apache
  Solr
  – IndexBasedSpellChecker
  – WordBreakSolrSpellChecker
  – DirectSolrSpellChecker


              Copyright 2012 Sematext Int’l. All rights reserved
Full Text Search Capabilities
•   Variety of queries
•   Ability to control score calculation
•   Different query parsers available
•   Advanced Lucene queries (like SpanQueries)
    exposed




                Copyright 2012 Sematext Int’l. All rights reserved
Score Calculation
•   Leverage Lucene scoring capabilities
•   Control over document importance
•   Control over query importance
•   Control over term and phrase importance




                Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr and Score Influence
• Index time
  – Document boosts
  – Field boosts
• Query time
  – Term boosts
  – Field boosts
  – Phrases boost
  – Function queries

               Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch and Score Influence
• Index time
  – Document and field boosts
• Query time
  – Different queries provide different boost controls
  – Can calculate distributed term frequencies
  – Negative and Positive boosting queries
  – Custom score filters
• Scripts
  – Control scoring with scripts

                Copyright 2012 Sematext Int’l. All rights reserved
Nested Objects
• Possible only in ElasticSearch
• Indexed as separate documents
• Stored in the same part of the index as the
  root document
• Hidden from standard queries and filters
• Need appropriate queries and filters (nested)



               Copyright 2012 Sematext Int’l. All rights reserved
More Like This
• Lets us find similar documents
• Solr
  – More Like This Component
• ElasticSearch
  – More Like This Query
  – More Like This Field Query
  – _mlt REST end – point



                  Copyright 2012 Sematext Int’l. All rights reserved
Solr Parent – Child Relationship
• Used at query time
• Multi core joins possible
      http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow




                         Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Parent – Child Handling
• Proper indexing required
• Indexed as separate documents
• Standard queries don’t return child
  documents
• In order to retrieve parent docs one should
  use appropriate queries and filters
  (has_child, has_parent, top_children)


               Copyright 2012 Sematext Int’l. All rights reserved
Filters
•   Used to narrown down query results
•   Good candidates for caching and reuse
•   Supported by ElasticSearch and Apache Solr
•   Should be used for repeatable query elements




                Copyright 2012 Sematext Int’l. All rights reserved
Apache Solr Filter Queries
•   Multiple filters per query
•   Filters are addictive
•   Different query parsers can be used
•   Local params can be used
•   Narrow down faceting results




                Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Filtered Queries
• Can be defined using queries exposed by the
  Query DSL
• Can be used for custom score calculation
  (i.e., custom filters score query)
• Doesn’t narrow down faceting results by
  default (facets have their own filters)



              Copyright 2012 Sematext Int’l. All rights reserved
Filter Cache Control
• Both Solr and ElasticSearch let us control
  cache for filters
• Solr
  – Using local params and cache property
• ElasticSearch
  – _cache   property
  –   _cache_key property




                  Copyright 2012 Sematext Int’l. All rights reserved
Faceting
• Both provide common facets
  – Terms
  – Range & query
  – Terms statistics
  – Spatial distance
• Solr
  – Pivot faceting
• ElasticSearch
  – Histograms

                  Copyright 2012 Sematext Int’l. All rights reserved
Real Time Or Not ?
• Allow getting document not yet indexed
• Don’t need searcher reopening
• ElasticSearch
  – Separate Get and Multi Get API’s
• Apache Solr
  – Separate Realtime Get Handler
  – Can be used as a search component


                Copyright 2012 Sematext Int’l. All rights reserved
Caches and Warming
• ElasticSearch and Solr allow caching
• Both allow running warming queries
• ElasticSearch by default doesn’t limit cache
  sizes




               Copyright 2012 Sematext Int’l. All rights reserved
Solr Caches
• Types
   – Filter Cache
   – Query Result Cache
   – Document Cache
• Implementation choices
   – LRUCache
   – FastLRUCache
   – LFUCache
• Other configuration options:
   – Size
   – Maximum size
   – Autowarming count


                    Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Caches
• Types
  – Filter Cache
  – Field Data Cache
• Implementation choices
  – Resident
  – Soft
  – Weak
• Other configuration options:
  – Max size (entries per segment)
  – Expiration time

                Copyright 2012 Sematext Int’l. All rights reserved
Cluster State Monitoring
• Apache Solr – multiple mbeans exposed by
  JMX
• ElasticSearch – multiple REST end – points
  exposed to get different statistics




              Copyright 2012 Sematext Int’l. All rights reserved
ElasticSearch Statistics API
•   Health and State Check
•   Nodes Information and Statistics
•   Cache Statistics
•   Index Segments Information
•   Index Information and Statistics
•   Mappings Information



                Copyright 2012 Sematext Int’l. All rights reserved
Cluster Monitoring




  Copyright 2012 Sematext Int’l. All rights reserved
Cluster Monitoring with SPM




       Copyright 2012 Sematext Int’l. All rights reserved
Cluster Settings Update
• ElasticSearch lets us:
  – Control rebalancing
  – Control recovery
  – Control allocation
  – Change the above on the live cluster




                Copyright 2012 Sematext Int’l. All rights reserved
Custom Shard Allocation
• Possible in ElasticSearch
• Cluster level:
  curl -XPUT localhost:9200/_cluster/settings -d '{
     "persistent" : {
       "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
     }
  }'

• Index level:
  curl -XPUT localhost:9200/sematext/ -d '{
     "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
  }'




                               Copyright 2012 Sematext Int’l. All rights reserved
Moving Shards and Replicas
• Possible in ElasticSearch, not available in Solr
• Allows to move shards and replicas to any
  node in the cluster on demand
• Available in ElasticSearch:
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
   "commands" : [
    {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
    {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
   ]
  }'




                              Copyright 2012 Sematext Int’l. All rights reserved
And The Winner Is ?




   Copyright 2012 Sematext Int’l. All rights reserved
How to Reach Us
• Rafał Kuć
  – Twitter: @kucrafal
  – E-mail: rafal.kuc@sematext.com
• Sematext
  – Twitter: @sematext
  – Website: http://sematext.com
• Solr vs ElasticSearch series:
  • http://blog.sematext.com/2012/08/23/solr-vs-
    elasticsearch-part-1-overview/

                 Copyright 2012 Sematext Int’l. All rights reserved
We Are Hiring !
•   Dig Search ?
•   Dig Analytics ?
•   Dig Big Data ?
•   Dig Performance ?
•   Dig working with and in open – source ?
•   We’re hiring world – wide !
       http://sematext.com/about/jobs.html

                Copyright 2012 Sematext Int’l. All rights reserved

More Related Content

What's hot

From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchRafał Kuć
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrAndy Jackson
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solrguest432cd6
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in actionCodemotion
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big featuresDavid Smiley
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solrmacrochen
 

What's hot (20)

From zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and ElasticsearchFrom zero to hero - Easy log centralization with Logstash and Elasticsearch
From zero to hero - Easy log centralization with Logstash and Elasticsearch
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
ElasticSearch in action
ElasticSearch in actionElasticSearch in action
ElasticSearch in action
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 

Viewers also liked

Getting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETGetting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETTomas Jansson
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineTrey Grainger
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersBen van Mol
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Solr Anti - patterns
Solr Anti - patternsSolr Anti - patterns
Solr Anti - patternsRafał Kuć
 
Working with deeply nested documents in Apache Solr
Working with deeply nested documents in Apache SolrWorking with deeply nested documents in Apache Solr
Working with deeply nested documents in Apache SolrAnshum Gupta
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsSematext Group, Inc.
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersRafał Kuć
 
Catalogue de stage
Catalogue de stageCatalogue de stage
Catalogue de stagemzoughi Anis
 
ElasticSearch Introduction
ElasticSearch IntroductionElasticSearch Introduction
ElasticSearch IntroductionTsungWei Hu
 

Viewers also liked (20)

Getting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETGetting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NET
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Building a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engineBuilding a real time, solr-powered recommendation engine
Building a real time, solr-powered recommendation engine
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Solr Anti - patterns
Solr Anti - patternsSolr Anti - patterns
Solr Anti - patterns
 
Working with deeply nested documents in Apache Solr
Working with deeply nested documents in Apache SolrWorking with deeply nested documents in Apache Solr
Working with deeply nested documents in Apache Solr
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Elastic search introduction
Elastic search introductionElastic search introduction
Elastic search introduction
 
Catalogue de stage
Catalogue de stageCatalogue de stage
Catalogue de stage
 
ElasticSearch Introduction
ElasticSearch IntroductionElasticSearch Introduction
ElasticSearch Introduction
 

Similar to Battle of the giants: Apache Solr vs ElasticSearch

Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersSematext Group, Inc.
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1Stefan Schmidt
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
ApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataOpenSource Connections
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakTommaso Teofili
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfPatiento Del Mar
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 

Similar to Battle of the giants: Apache Solr vs ElasticSearch (20)

Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
Solr
SolrSolr
Solr
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
ApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big Data
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Apache solr
Apache solrApache solr
Apache solr
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit Oak
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdf
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 

Recently uploaded

Project & Portfolio, Market Analysis: WWE
Project & Portfolio, Market Analysis: WWEProject & Portfolio, Market Analysis: WWE
Project & Portfolio, Market Analysis: WWEDeShawn Ellis
 
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfJORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfArturo Pacheco Alvarez
 
Introduction to Basketball-PowerPoint Presentation
Introduction to Basketball-PowerPoint PresentationIntroduction to Basketball-PowerPoint Presentation
Introduction to Basketball-PowerPoint PresentationJuliusMacaballug
 
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024DONAL88 >LINK SLOT PG SOFT TERGACOR 2024
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024DONAL88 GACOR
 
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxItaly Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxWorld Wide Tickets And Hospitality
 
PPT on INDIA VS PAKISTAN - A Sports Rivalry
PPT on INDIA VS PAKISTAN - A Sports RivalryPPT on INDIA VS PAKISTAN - A Sports Rivalry
PPT on INDIA VS PAKISTAN - A Sports Rivalryanirbannath184
 
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docx
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docxJORNADA 2 LIGA MUROBASQUETBOL1 2024.docx
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docxArturo Pacheco Alvarez
 
Benifits of Individual And Team Sports-Group 7.pptx
Benifits of Individual And Team Sports-Group 7.pptxBenifits of Individual And Team Sports-Group 7.pptx
Benifits of Individual And Team Sports-Group 7.pptxsherrymieg19
 
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...World Wide Tickets And Hospitality
 
Clash of Titans_ PSG vs Barcelona (1).pdf
Clash of Titans_ PSG vs Barcelona (1).pdfClash of Titans_ PSG vs Barcelona (1).pdf
Clash of Titans_ PSG vs Barcelona (1).pdfMuhammad Hashim
 

Recently uploaded (11)

Project & Portfolio, Market Analysis: WWE
Project & Portfolio, Market Analysis: WWEProject & Portfolio, Market Analysis: WWE
Project & Portfolio, Market Analysis: WWE
 
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfJORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
 
Introduction to Basketball-PowerPoint Presentation
Introduction to Basketball-PowerPoint PresentationIntroduction to Basketball-PowerPoint Presentation
Introduction to Basketball-PowerPoint Presentation
 
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024DONAL88 >LINK SLOT PG SOFT TERGACOR 2024
DONAL88 >LINK SLOT PG SOFT TERGACOR 2024
 
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxItaly Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
 
NATIONAL SPORTS DAY WRITTEN QUIZ by QUI9
NATIONAL SPORTS DAY WRITTEN QUIZ by QUI9NATIONAL SPORTS DAY WRITTEN QUIZ by QUI9
NATIONAL SPORTS DAY WRITTEN QUIZ by QUI9
 
PPT on INDIA VS PAKISTAN - A Sports Rivalry
PPT on INDIA VS PAKISTAN - A Sports RivalryPPT on INDIA VS PAKISTAN - A Sports Rivalry
PPT on INDIA VS PAKISTAN - A Sports Rivalry
 
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docx
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docxJORNADA 2 LIGA MUROBASQUETBOL1 2024.docx
JORNADA 2 LIGA MUROBASQUETBOL1 2024.docx
 
Benifits of Individual And Team Sports-Group 7.pptx
Benifits of Individual And Team Sports-Group 7.pptxBenifits of Individual And Team Sports-Group 7.pptx
Benifits of Individual And Team Sports-Group 7.pptx
 
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...
Spain Vs Italy Showdown Between Italy and Spain Could Determine UEFA Euro 202...
 
Clash of Titans_ PSG vs Barcelona (1).pdf
Clash of Titans_ PSG vs Barcelona (1).pdfClash of Titans_ PSG vs Barcelona (1).pdf
Clash of Titans_ PSG vs Barcelona (1).pdf
 

Battle of the giants: Apache Solr vs ElasticSearch

  • 1. Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20 Rafał Kuć – Sematext International @kucrafal @sematext sematext.com
  • 2. Who Am I • „Solr 3.1 Cookbook” author (4.0 inc) • Sematext consultant & engineer • Solr.pl co-founder • Father and husband  Copyright 2012 Sematext Int’l. All rights reserved
  • 3. What Will I Talk About ? Copyright 2012 Sematext Int’l. All rights reserved
  • 4. Under the Hood • ElasticSearch 0.20 – Apache Lucene 3.6.1 • Apache Solr 4.0 – Apache Lucene 4.0 Copyright 2012 Sematext Int’l. All rights reserved
  • 5. Architecture • What we expect – Scalability – Fault toleranance – High availablity – Features • What we are also looking for – Manageability – Installation ease – Tools Copyright 2012 Sematext Int’l. All rights reserved
  • 6. ElasticSearch Cluster Architecture • Distributed • Fault tolerant • Only ElasticSearch nodes • Single leader • Automatic leader election Copyright 2012 Sematext Int’l. All rights reserved
  • 7. SolrCloud Cluster Architecture • Distributed • Fault tolerant • Apache Solr + ZooKeeper ensemble • Leader per shard • Automatic leader election Copyright 2012 Sematext Int’l. All rights reserved
  • 8. Collection vs Index • Collection – Solr main logical index • Index – ElasticSearch main logic structure • Collections and Indices can be spread among different nodes in the cluster Copyright 2012 Sematext Int’l. All rights reserved
  • 9. Multiple Document Types in Index • ElasticSearch - multiple document types in a single index • Apache Solr - multiple document types in a single collection – shared schema.xml Copyright 2012 Sematext Int’l. All rights reserved
  • 10. Shards and Replicas • Index / Collection can have many shards • Each shard can have 0 or more replicas • Replicas are automatically updated • Replicas can be promoted to leaders when a leader shard goes off-line Copyright 2012 Sematext Int’l. All rights reserved
  • 11. Index and Query Routing • Control where documents are going • Control where queries are going • Manual data distribution Copyright 2012 Sematext Int’l. All rights reserved
  • 12. Querying Without Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Collection / Index Application Copyright 2012 Sematext Int’l. All rights reserved
  • 13. Query With Routing Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 Shard 7 Shard 8 Collection / Index Application Copyright 2012 Sematext Int’l. All rights reserved
  • 14. Routing Docs and Queries in Solr • Requires some effort • Defaults to hash based on document identifiers • Can be turned off using solr.NoOpDistributingUpdateProcessorFactory <updateRequestProcessorChain> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> <processor class="solr.NoOpDistributingUpdateProcessorFactory" /> </updateRequestProcessorChain> Copyright 2012 Sematext Int’l. All rights reserved
  • 15. Routing Docs and Queries - ElasticSearch • routing parameter controls target shard which document/query will be forwarded to • defaults to document identifiers • can be changed to any value curl -XPUT localhost:9200/sematext/test/1?routing=1234 -d '{ "title" : "Test routing document" }' curl –XGET localhost:9200/sematext/test/_search/?q=*&routing=1234 Copyright 2012 Sematext Int’l. All rights reserved
  • 16. Apache Solr Index Structure • Field types defined in schema.xml file • Fields defined in schema.xml file • Allows automatic value copying • Allows dynamic fields • Allows custom similarity definition Copyright 2012 Sematext Int’l. All rights reserved
  • 17. ElasticSearch Index Structure • Schema - less • Analyzers and filters defined with HTTP API • Fields defined with an HTTP request • Multi – field support • Allows nested documents • Allows parent – child relationship • Allows structured data Copyright 2012 Sematext Int’l. All rights reserved
  • 18. Index Structure Manipulation • Possible to some extent in Solr as well as ElasticSearch • ElasticSearch allows dynamic mappings update (not always) Copyright 2012 Sematext Int’l. All rights reserved
  • 19. Aliasing • Solr – Allows core aliasing • ElasticSearch – Allows index aliasing – We can add filter to alias – We can add index routing – We can add search routing Copyright 2012 Sematext Int’l. All rights reserved
  • 20. Server Configuration • Solr • ElasticSearch – Static in solrconfig.xml – Static in elasticsearch.yml – Can be reloaded – Properties can be during runtime with changed during runtime collection/core reload (although not all) without reloading Copyright 2012 Sematext Int’l. All rights reserved
  • 21. ElasticSearch Gateway Module • Your data time machine • Stores indices and meta data • Currently available: – Local – Shared FS – Hadoop – S3 Copyright 2012 Sematext Int’l. All rights reserved
  • 22. Discovery • Apache Solr uses ZooKeeper • ElasticSearch uses Zen Discovery Copyright 2012 Sematext Int’l. All rights reserved
  • 23. ElasticSearch Zen Discovery • Allows automatic node discovery • Provides multicast and unicast discovery methods • Automatic master detection • Two - way failure detection Copyright 2012 Sematext Int’l. All rights reserved
  • 24. Apache Solr & Apache ZooKeeper • Requires additional software • ZooKeeper ensemble with 1+ ZooKeeper instances • Prevents split – brain situations • Holds collections configurations • Solr needs to know address of one of the ZooKeeper instances Copyright 2012 Sematext Int’l. All rights reserved
  • 25. API • HTTP REST API in ElasticSearch or Query String for simple queries • HTTP with Query String in Apache Solr • Both provide specialized Java API – SolrJ for Apache Solr and CloudSolrServer – ElasticSearch with TransportClient for remote connections Copyright 2012 Sematext Int’l. All rights reserved
  • 26. Apache Solr and Query String • Queries are built of request parameters • Some degree of structuring allowed (local params) curl 'http://localhost:8983/solr/select?q=text:weird&sort=date+desc' Copyright 2012 Sematext Int’l. All rights reserved
  • 27. ElasticSearch REST End-Points • Simple queries built of request parameters • Stuctured queries built as JSON objects curl –XGET 'localhost:9200/sematext/test/_search/?q=_all:weird&sort=date:desc' curl -XGET 'localhost:9200/sematext/test_search' -d '{ "query" : { "term" : { "_all" : "weird" }, "sort" : { "date" : { "order" : "desc" } } }' Copyright 2012 Sematext Int’l. All rights reserved
  • 28. Data Handling • Solr – Multiple formats allowed as input – Can return results in multiple formats • ElasticSearch – JSON in / JSON out Copyright 2012 Sematext Int’l. All rights reserved
  • 29. Single or Batch • Solr • ElasticSearch – Single or multiple – Single document with a documents per standard indexing call request – _bulk end – point exposed for batch indexing – _bulk UDP end – point can be exposed for low latency batch indexing Copyright 2012 Sematext Int’l. All rights reserved
  • 30. Partial Document Updates • Not based on LUCENE-3837 proposed by Andrzej Białecki • Document reindexing on the side of search server • Both servers use versioning to prevent changes being overwritten • Can lead to decreased network traffic in some cases Copyright 2012 Sematext Int’l. All rights reserved
  • 31. ElasticSearch Partial Doc Update • Special end – point exposed - _update • Supports parameters like routing, parent, replication, percolate, etc (similar to Index API) • Uses scripts to perform document updates curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }' Copyright 2012 Sematext Int’l. All rights reserved
  • 32. Apache Solr Partial Doc Update • Sent to the standard update handler • Requires _version_ field to be present curl 'localhost:8983/solr/update?commit=true' -H 'Content- type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]' Copyright 2012 Sematext Int’l. All rights reserved
  • 33. Solr Collections API • Built on top of Core Admin • Allows: – Collection creation – Collection reload – Collection deletion Copyright 2012 Sematext Int’l. All rights reserved
  • 34. ElasticSearch Indices REST API • Allows: – Index creation – Index deletion – Index closing and opening – Index refreshing – Existence checking Copyright 2012 Sematext Int’l. All rights reserved
  • 35. Analysis Chain Definition • Solr • ElasticSearch – Static in schema.xml – Static in elasticsearch.yml – Can be reloaded – Defined during index/type during runtime with creation with REST call collection/core reload – Possible to change with update mapping call (not all changes allowed) Copyright 2012 Sematext Int’l. All rights reserved
  • 36. Multilingual Data Handling • Both ElasticSearch and Apache Solr built on top of Apache Lucene • Solr – analyzers defined per field in schema.xml file • ElasticSearch – analyzer defined in mappings, but can be set during query or specified on the basis of field values Copyright 2012 Sematext Int’l. All rights reserved
  • 37. Results Grouping • Available in Apache Solr only • Allows for results grouping based on: – Field value – Query – Function query (not available during distributed searching) Copyright 2012 Sematext Int’l. All rights reserved
  • 38. Prospective Search • Allows for checking if a document matches a stored query • Not available in Apache Solr • Available in ElasticSearch under the name of Percolator Copyright 2012 Sematext Int’l. All rights reserved
  • 39. Spellchecker • Allows to check and correct spelling mistakes • Not available in ElasticSearch currently • Multiple implementations available in Apache Solr – IndexBasedSpellChecker – WordBreakSolrSpellChecker – DirectSolrSpellChecker Copyright 2012 Sematext Int’l. All rights reserved
  • 40. Full Text Search Capabilities • Variety of queries • Ability to control score calculation • Different query parsers available • Advanced Lucene queries (like SpanQueries) exposed Copyright 2012 Sematext Int’l. All rights reserved
  • 41. Score Calculation • Leverage Lucene scoring capabilities • Control over document importance • Control over query importance • Control over term and phrase importance Copyright 2012 Sematext Int’l. All rights reserved
  • 42. Apache Solr and Score Influence • Index time – Document boosts – Field boosts • Query time – Term boosts – Field boosts – Phrases boost – Function queries Copyright 2012 Sematext Int’l. All rights reserved
  • 43. ElasticSearch and Score Influence • Index time – Document and field boosts • Query time – Different queries provide different boost controls – Can calculate distributed term frequencies – Negative and Positive boosting queries – Custom score filters • Scripts – Control scoring with scripts Copyright 2012 Sematext Int’l. All rights reserved
  • 44. Nested Objects • Possible only in ElasticSearch • Indexed as separate documents • Stored in the same part of the index as the root document • Hidden from standard queries and filters • Need appropriate queries and filters (nested) Copyright 2012 Sematext Int’l. All rights reserved
  • 45. More Like This • Lets us find similar documents • Solr – More Like This Component • ElasticSearch – More Like This Query – More Like This Field Query – _mlt REST end – point Copyright 2012 Sematext Int’l. All rights reserved
  • 46. Solr Parent – Child Relationship • Used at query time • Multi core joins possible http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow Copyright 2012 Sematext Int’l. All rights reserved
  • 47. ElasticSearch Parent – Child Handling • Proper indexing required • Indexed as separate documents • Standard queries don’t return child documents • In order to retrieve parent docs one should use appropriate queries and filters (has_child, has_parent, top_children) Copyright 2012 Sematext Int’l. All rights reserved
  • 48. Filters • Used to narrown down query results • Good candidates for caching and reuse • Supported by ElasticSearch and Apache Solr • Should be used for repeatable query elements Copyright 2012 Sematext Int’l. All rights reserved
  • 49. Apache Solr Filter Queries • Multiple filters per query • Filters are addictive • Different query parsers can be used • Local params can be used • Narrow down faceting results Copyright 2012 Sematext Int’l. All rights reserved
  • 50. ElasticSearch Filtered Queries • Can be defined using queries exposed by the Query DSL • Can be used for custom score calculation (i.e., custom filters score query) • Doesn’t narrow down faceting results by default (facets have their own filters) Copyright 2012 Sematext Int’l. All rights reserved
  • 51. Filter Cache Control • Both Solr and ElasticSearch let us control cache for filters • Solr – Using local params and cache property • ElasticSearch – _cache property – _cache_key property Copyright 2012 Sematext Int’l. All rights reserved
  • 52. Faceting • Both provide common facets – Terms – Range & query – Terms statistics – Spatial distance • Solr – Pivot faceting • ElasticSearch – Histograms Copyright 2012 Sematext Int’l. All rights reserved
  • 53. Real Time Or Not ? • Allow getting document not yet indexed • Don’t need searcher reopening • ElasticSearch – Separate Get and Multi Get API’s • Apache Solr – Separate Realtime Get Handler – Can be used as a search component Copyright 2012 Sematext Int’l. All rights reserved
  • 54. Caches and Warming • ElasticSearch and Solr allow caching • Both allow running warming queries • ElasticSearch by default doesn’t limit cache sizes Copyright 2012 Sematext Int’l. All rights reserved
  • 55. Solr Caches • Types – Filter Cache – Query Result Cache – Document Cache • Implementation choices – LRUCache – FastLRUCache – LFUCache • Other configuration options: – Size – Maximum size – Autowarming count Copyright 2012 Sematext Int’l. All rights reserved
  • 56. ElasticSearch Caches • Types – Filter Cache – Field Data Cache • Implementation choices – Resident – Soft – Weak • Other configuration options: – Max size (entries per segment) – Expiration time Copyright 2012 Sematext Int’l. All rights reserved
  • 57. Cluster State Monitoring • Apache Solr – multiple mbeans exposed by JMX • ElasticSearch – multiple REST end – points exposed to get different statistics Copyright 2012 Sematext Int’l. All rights reserved
  • 58. ElasticSearch Statistics API • Health and State Check • Nodes Information and Statistics • Cache Statistics • Index Segments Information • Index Information and Statistics • Mappings Information Copyright 2012 Sematext Int’l. All rights reserved
  • 59. Cluster Monitoring Copyright 2012 Sematext Int’l. All rights reserved
  • 60. Cluster Monitoring with SPM Copyright 2012 Sematext Int’l. All rights reserved
  • 61. Cluster Settings Update • ElasticSearch lets us: – Control rebalancing – Control recovery – Control allocation – Change the above on the live cluster Copyright 2012 Sematext Int’l. All rights reserved
  • 62. Custom Shard Allocation • Possible in ElasticSearch • Cluster level: curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" } }' • Index level: curl -XPUT localhost:9200/sematext/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }' Copyright 2012 Sematext Int’l. All rights reserved
  • 63. Moving Shards and Replicas • Possible in ElasticSearch, not available in Solr • Allows to move shards and replicas to any node in the cluster on demand • Available in ElasticSearch: curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }' Copyright 2012 Sematext Int’l. All rights reserved
  • 64. And The Winner Is ? Copyright 2012 Sematext Int’l. All rights reserved
  • 65. How to Reach Us • Rafał Kuć – Twitter: @kucrafal – E-mail: rafal.kuc@sematext.com • Sematext – Twitter: @sematext – Website: http://sematext.com • Solr vs ElasticSearch series: • http://blog.sematext.com/2012/08/23/solr-vs- elasticsearch-part-1-overview/ Copyright 2012 Sematext Int’l. All rights reserved
  • 66. We Are Hiring ! • Dig Search ? • Dig Analytics ? • Dig Big Data ? • Dig Performance ? • Dig working with and in open – source ? • We’re hiring world – wide ! http://sematext.com/about/jobs.html Copyright 2012 Sematext Int’l. All rights reserved