Battle of the Giants round 2

Second round of the "Battle of the Giants" talk that was given at Berlin Buzzwords 2013.

Published in: Technology

Transcript of "Battle of the Giants round 2"

  1. Battle of the Giants. Rafał Kuć, Sematext Group, Inc. @kucrafal, @sematext, sematext.com
  2. Ich bin ein… ("I am a…"): Sematext consultant & engineer; Solr Cookbook series author; "ElasticSearch Server" author; "Mastering ElasticSearch" author; Solr.pl co-founder; father and husband. Copyright 2013 Sematext Group, Inc. All rights reserved.
  3. Copyright 2013 Sematext Group, Inc. All rights reserved.
  4. Under the Hood: Lucene 4.3 for both engines.
  5. Expectations: scalability; fault tolerance; high availability; features; manageability; ease of installation; tools; support.
  6. Expectations vs Reality. ElasticSearch: only ElasticSearch nodes; single leader. Solr: Solr + ZooKeeper; leader per shard. Distributed; fault tolerant; automatic leader election.
  7. All Time Top Committers
  8. Active Contributors
  9. The Code
  10. The Mailing Lists
  11. Trends
  12. Collection vs Index. Collections and indices can be spread among different nodes in the cluster. Collection (Solr): main logical index. Index (ElasticSearch): main logical structure.
  13. Apache Solr Index Structure: fields and types defined in the schema; automatic value copying; dynamic fields; custom similarity; custom postings format; multiple document types require a shared schema; can be read using the API.
  14. ElasticSearch Index Structure: schema-less; fields and types defined with the HTTP API; multi-field support; nested and parent-child documents; custom similarity; custom postings format; multiple documents with different structure; can be read and written using the API.
  15. Shards and Replicas: many shards; 0 or more replicas; a replica can become leader; replicas can be created on a live cluster.
  16. Configuration. Solr: static in solrconfig.xml; can be reloaded with core reload. ElasticSearch: static in elasticsearch.yml; changeable at runtime.
  17. Discovery: Zen Discovery (ElasticSearch); Apache ZooKeeper (Solr).
  18. Solr & ZooKeeper: requires additional software; prevents split-brain situations; holds collections configurations; ZooKeeper ensemble needed.
  19. ElasticSearch Zen Discovery: automatic node discovery; multicast and unicast discovery methods; automatic master detection; two-way failure detection.
  20. HTTP FTW: HTTP REST API in ElasticSearch, or query string for simple queries; HTTP with query string in Apache Solr; both provide a specialized Java API.
  21. Results Grouping. Group on: field value; query result; function query.
  22. Prospective Search: called Percolator; matches documents to stored queries.
  23. Full Text Search Capabilities: variety of queries; control score calculation; different query parsers; advanced Lucene queries.
  24. Score Calculation: leverage Lucene scoring; control importance of documents, queries, terms, and phrases; similarity configuration.
  25. Apache Solr and Score Influence: index-time boosting; query-time boosting (term boosts, field boosts, phrase boosts, function queries, sub-queries used for boosting).
  26. ElasticSearch and Score Influence: index-time; query-time; different queries provide different boost controls; can calculate distributed term frequencies; negative and positive boosting queries; custom score filters; scripts.
  27. ElasticSearch Query Rescore: reorders the top N hits using another query; executed on shards before results are returned to the node handling the request; not executed with scan and count.
  28. ElasticSearch Nested Objects: indexed as separate documents; stored in the same part of the index as the root doc; hidden from standard queries and filters; need appropriate queries and filters (nested); top-level documents can be sorted on the basis of nested ones.
  29. Solr Parent-Child Relationship: used at query time; multi-core joins possible. Example: select?q={!join from=parent to=id}color:Yellow
  30. ElasticSearch Parent-Child: proper indexing required; indexed as separate documents; standard queries don't return child documents; retrieve parent docs using queries and filters (has_child, has_parent, top_children).
  31. Filters: used to narrow down query results; good candidates for caching and reuse. Solr: additive; can use different query parsers; can use local params; narrows down faceting results. ElasticSearch: defined using Query DSL; can be used for score calculation; doesn't narrow down faceting results.
  32. Faceting: terms; range & query; terms statistics; spatial distance; pivot; histograms.
  33. Real Time Or Not? Get not-yet-indexed docs from the transaction log; don't need searcher reopening. ElasticSearch: separate Get and Multi Get API. Solr: separate Realtime Get Handler.
  34. Data Handling: single and batch indexing supported. ElasticSearch: JSON in / JSON out (and YAML). Solr: different formats allowed (XML, JSON, CSV, binary).
  35. Partial Document Updates: not based on LUCENE-3837; server-side doc reindexing; both servers use versioning; decreases network traffic.
  36. Apache Solr Partial Doc Update: sent to the standard update handler; requires the _version_ field. Example: curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id" : "12345", "enabled" : {"set" : true}}]'
  37. ElasticSearch Partial Doc Update: special endpoint exposed, _update; supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API); uses scripts to perform document updates. Example: curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{"script" : "ctx._source.enabled = enabled", "params" : {"enabled" : true}}'
  38. Solr Collections API. Collection: creation; reload; deletion; shard splitting.
  39. ElasticSearch Indices REST API. Index: creation; deletion; closing and opening; refreshing; existence checking.
  40. Apache Solr Shard Splitting: admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
  41. Cluster State Monitoring: multiple MBeans exposed by JMX; multiple REST endpoints exposed to get different statistics.
  42. ElasticSearch Statistics API: health and state check; nodes information; cache statistics; segments information; index information; mappings information. SPM: "One to rule them all".
  43. ElasticSearch Cluster Settings Update. Control: rebalancing; recovery; allocation. Change cluster configuration properties.
  44. ElasticSearch Custom Shard Allocation. Cluster level: curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent" : {"cluster.routing.allocation.exclude._ip" : "192.168.2.1"}}' Index level: curl -XPUT 'localhost:9200/sematext/_settings/' -d '{"index.routing.allocation.include.tag" : "nodeOne,nodeTwo"}'
  45. Moving Shards and Replicas: move shards between nodes on demand. Example: curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands" : [{"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}]}'
  46. The Verdict
  47. And The Winner Is?
  48. We Are Hiring! Dig search? Dig analytics? Dig big data? Dig performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
  49. Rafał Kuć: @kucrafal, rafal.kuc@sematext.com. Sematext: @sematext, http://sematext.com, http://blog.sematext.com. ElasticSearch Server 25% off: MREESS25. Thank You!
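
The request examples on slides 29, 36, 37, 40 and 45 can be sketched as small Python helpers that only build the URL and JSON body, without sending anything. This is a minimal sketch, not an official client: the function names are hypothetical, the `http://` scheme and the base URLs are assumptions layered on the `localhost:8983/solr` and `localhost:9200` hosts shown on the slides, and the ElasticSearch script syntax is copied from the 2013 slide and may not match later releases.

```python
import json
from urllib.parse import urlencode

# Assumed base URLs; the slides show only host and port.
SOLR = "http://localhost:8983/solr"
ES = "http://localhost:9200"


def solr_join_url(from_field, to_field, query):
    """Query-time join from slide 29, with the q parameter URL-encoded."""
    q = "{!join from=%s to=%s}%s" % (from_field, to_field, query)
    return SOLR + "/select?" + urlencode({"q": q})


def solr_partial_update(doc_id, field, value):
    """Partial update from slide 36: a 'set' on one field of one document."""
    url = SOLR + "/update?" + urlencode({"commit": "true"})
    body = json.dumps([{"id": doc_id, field: {"set": value}}])
    return url, body


def es_partial_update(index, doc_type, doc_id, field, value):
    """Scripted update from slide 37, sent to the _update endpoint."""
    url = "%s/%s/%s/%s/_update" % (ES, index, doc_type, doc_id)
    body = json.dumps({
        # The script assigns the parameter to the source field of the same name.
        "script": "ctx._source.%s = %s" % (field, field),
        "params": {field: value},
    })
    return url, body


def solr_split_shard_url(collection, shard):
    """Collections API call from slide 40."""
    params = [("action", "SPLITSHARD"), ("collection", collection), ("shard", shard)]
    return SOLR + "/admin/collections?" + urlencode(params)


def es_move_shard(index, shard, from_node, to_node):
    """Reroute 'move' command from slide 45."""
    url = ES + "/_cluster/reroute"
    command = {"move": {"index": index, "shard": shard,
                        "from_node": from_node, "to_node": to_node}}
    return url, json.dumps({"commands": [command]})
```

Each helper returns plain strings, so the result can be passed to any HTTP client (or to curl) as a POST or PUT, and the encoded query and payload can be inspected before anything touches a cluster.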