Battle of the Giants round 2

Second round of the "Battle of the Giants" talk that was given at Berlin Buzzwords 2013.

Transcript

  • 1. Battle of the Giants. Rafał Kuć, Sematext Group, Inc.; @kucrafal; @sematext; sematext.com
  • 2. Ich bin ein… ("I am a…"): Sematext consultant & engineer; Solr Cookbook series author; "ElasticSearch Server" author; "Mastering ElasticSearch" author; Solr.pl co-founder; father and husband. Copyright 2013 Sematext Group. Inc. All rights reserved
  • 4. Under the Hood: both engines run on Lucene 4.3
  • 5. Expectations: scalability; fault tolerance; high availability; features; manageability; ease of installation; tools; support
  • 6. Expectations vs Reality. ElasticSearch: only ElasticSearch nodes; single leader. Solr: Solr + ZooKeeper; leader per shard. Distributed; fault tolerant; automatic leader election
  • 7. All Time Top Committers
  • 8. Active Contributors
  • 9. The Code
  • 10. The Mailing Lists
  • 11. Trends
  • 12. Collection vs Index: collections and indices can be spread among different nodes in the cluster. Collection (Solr): main logical index. Index (ElasticSearch): main logical structure
  • 13. Apache Solr Index Structure: fields and types defined in schema; automatic value copying; dynamic fields; custom similarity; custom postings format; multiple document types require a shared schema; can be read using API
  • 14. ElasticSearch Index Structure: schema-less; fields and types defined with HTTP API; multi-field support; nested and parent-child documents; custom similarity; custom postings format; multiple documents with different structure; can be read and written using API
  • 15. Shards and Replicas: many shards; 0 or more replicas; a replica can become leader; replicas can be created on a live cluster
  • 16. Configuration. Solr: static in solrconfig.xml; can be reloaded with core reload. ElasticSearch: static in elasticsearch.yml; changeable at runtime
  • 17. Discovery: Zen Discovery; Apache ZooKeeper
  • 18. Solr & ZooKeeper: requires additional software; prevents split-brain situations; holds collections configurations; ZooKeeper ensemble needed
  • 19. ElasticSearch Zen Discovery: automatic node discovery; multicast and unicast discovery methods; automatic master detection; two-way failure detection
  • 20. HTTP FTW: HTTP REST API in ElasticSearch, or query string for simple queries; HTTP with query string in Apache Solr; both provide a specialized Java API
  • 21. Results Grouping: group on field value, query result, or function query
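As a rough illustration of the three grouping modes on slide 21, the sketch below builds Solr request parameters for each of them. `group`, `group.field`, `group.query` and `group.func` are Solr result-grouping parameters; the field names and the example query are made up, and nothing is sent to a server.

```python
from urllib.parse import urlencode

# Maps the three grouping modes from the slide to Solr's grouping parameters.
PARAM_FOR_MODE = {
    "field": "group.field",  # group on a field value
    "query": "group.query",  # group on documents matching a query
    "func": "group.func",    # group on the value of a function query
}

def grouping_params(query, mode, value):
    """Return a URL-encoded Solr query string that enables grouping."""
    if mode not in PARAM_FOR_MODE:
        raise ValueError("unknown grouping mode: %s" % mode)
    return urlencode([("q", query), ("group", "true"), (PARAM_FOR_MODE[mode], value)])

# Hypothetical field name, for illustration only.
print(grouping_params("solr", "field", "category"))
```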
  • 22. Prospective Search: called Percolator; matches documents to stored queries
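Prospective search turns the usual flow around: queries are stored first, and each incoming document is checked against them. The toy sketch below shows only that idea in plain Python. It is not the Percolator API, and the "all terms must appear" matching rule is an assumption chosen to keep the example small.

```python
def percolate(stored_queries, document):
    """Return the names of stored queries whose terms all occur in the document."""
    words = set(document.lower().split())
    return sorted(
        name for name, terms in stored_queries.items()
        if all(term in words for term in terms)
    )

# Hypothetical stored queries: name -> required terms.
stored = {
    "solr-news": ["solr"],
    "es-cluster": ["elasticsearch", "cluster"],
}

print(percolate(stored, "ElasticSearch cluster settings"))
```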
  • 23. Full Text Search Capabilities: variety of queries; control score calculation; different query parsers; advanced Lucene queries
  • 24. Score Calculation: leverage Lucene scoring; control importance of documents, queries, terms, and phrases; similarity configuration
  • 25. Apache Solr and Score Influence: index-time boosting; query-time: term boosts, field boosts, phrase boosts, function queries; sub-queries used for boosting
  • 26. ElasticSearch and Score Influence: index-time; query-time; different queries provide different boost controls; can calculate distributed term frequencies; negative and positive boosting queries; custom score filters; scripts
  • 27. ElasticSearch Query Rescore: reorders the top N hits by using another query; executed on shards before results are returned to the node handling the request; not executed with scan and count
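A rescore request could be assembled roughly as follows. The body layout (`rescore`, `window_size`, `rescore_query`, `query_weight`, `rescore_query_weight`) follows the search API of the 0.90 era as I understand it, so check it against the version you run. The field name and the weights are invented for the example.

```python
import json

def rescore_request(query, rescore_query, window_size=50,
                    query_weight=1.0, rescore_query_weight=1.0):
    """Build a search body that re-ranks the top `window_size` hits per shard."""
    return {
        "query": query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": rescore_query,
                "query_weight": query_weight,
                "rescore_query_weight": rescore_query_weight,
            },
        },
    }

# A cheap query selects candidates; a costlier phrase query reorders the top hits.
body = rescore_request(
    {"match": {"title": "battle giants"}},
    {"match_phrase": {"title": "battle of the giants"}},
    window_size=50,
    query_weight=0.7,
    rescore_query_weight=1.2,
)
print(json.dumps(body, indent=2))
```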
  • 28. ElasticSearch Nested Objects: indexed as separate documents; stored in the same part of the index as the root doc; hidden from standard queries and filters; need appropriate queries and filters (nested); top-level documents can be sorted on the basis of nested ones
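Because nested objects are hidden from standard queries, they have to be reached through a `nested` query that names their path. A minimal sketch of such a query body, assuming a hypothetical `variations` nested field with a `color` property:

```python
import json

def nested_query(path, inner_query):
    """Wrap a query so that it runs against nested objects stored under `path`."""
    return {"nested": {"path": path, "query": inner_query}}

# Hypothetical mapping: each product document holds nested "variations" objects.
search_body = {
    "query": nested_query("variations", {"term": {"variations.color": "yellow"}})
}
print(json.dumps(search_body))
```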
  • 29. Solr Parent-Child Relationship: used at query time; multi-core joins possible; select?q={!join from=parent to=id}color:Yellow
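The join query on slide 29 contains braces and spaces, so it has to be URL-encoded before it goes on the wire. The sketch below builds the same query string and a full request URL; the base URL is an assumption (a local Solr on the default port), and no request is made.

```python
from urllib.parse import urlencode, parse_qs

def join_query(from_field, to_field, inner_query):
    """Build a Solr join query using local params, as shown on the slide."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)

def select_url(base, q):
    """Return the /select URL with the query safely URL-encoded."""
    return base + "/select?" + urlencode({"q": q})

q = join_query("parent", "id", "color:Yellow")
url = select_url("http://localhost:8983/solr", q)  # assumed base URL
print(url)

# Decoding the query string gives back the original join query.
print(parse_qs(url.split("?", 1)[1])["q"][0])
```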
  • 30. ElasticSearch Parent-Child: proper indexing required; indexed as separate documents; standard queries don't return child documents; retrieve parent docs using queries and filters (has_child, has_parent, top_children)
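For comparison with the Solr join, the two most common parent-child queries named on slide 30 could be built like this. The `type` and `parent_type` keys follow the Query DSL of that era as I understand it; the type and field names are hypothetical.

```python
import json

def has_child(child_type, query):
    """Match parent documents that have at least one child matching `query`."""
    return {"has_child": {"type": child_type, "query": query}}

def has_parent(parent_type, query):
    """Match child documents whose parent matches `query`."""
    return {"has_parent": {"parent_type": parent_type, "query": query}}

# Hypothetical types: "variation" documents are children of "product" documents.
parents = {"query": has_child("variation", {"term": {"color": "yellow"}})}
children = {"query": has_parent("product", {"match": {"name": "shirt"}})}
print(json.dumps(parents))
print(json.dumps(children))
```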
  • 31. Filters: used to narrow down query results; good candidates for caching and reuse. Solr: additive; can use different query parsers; can use local params; narrow down faceting results. ElasticSearch: defined using Query DSL; can be used for score calculation; don't narrow down faceting results
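The contrast on slide 31 can be sketched by building both kinds of request. On the Solr side, every `fq` parameter is applied on top of the others and may carry local params. On the ElasticSearch side, the sketch assumes the top-level `filter` element of the 0.90-era search API, which is applied after facets are computed. Field names and values are invented.

```python
from urllib.parse import urlencode, parse_qs

def solr_params(q, filters):
    """Build a Solr query string; each fq narrows the results further."""
    return urlencode([("q", q)] + [("fq", f) for f in filters])

def es_search_body(query, top_level_filter, facets):
    """Build an ElasticSearch search body with a top-level filter and facets."""
    return {"query": query, "filter": top_level_filter, "facets": facets}

# Two additive Solr filters; the first one uses local params to pick a parser.
params = solr_params("title:search", ["{!term f=category}books", "inStock:true"])
print(parse_qs(params)["fq"])

body = es_search_body(
    {"match": {"title": "search"}},
    {"term": {"category": "books"}},
    {"categories": {"terms": {"field": "category"}}},
)
print(sorted(body))
```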
  • 32. Faceting: terms; range & query; terms statistics; spatial distance; pivot; histograms
  • 33. Real Time Or Not? Get not yet indexed docs from the transaction log; no searcher reopening needed. ElasticSearch: separate Get and Multi Get API. Solr: separate Realtime Get Handler
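The real-time get endpoints mentioned on slide 33 might be addressed as sketched below. The Solr path assumes the handler is registered at `/get`, as in the stock example configuration; the ElasticSearch paths assume the index/type/id URL layout of that era. Hosts, index, type and ids are placeholders, and no request is sent.

```python
from urllib.parse import urlencode, quote

def solr_realtime_get_url(base, doc_id):
    """URL for Solr's realtime get handler (assumed to live at /get)."""
    return "%s/get?%s" % (base, urlencode({"id": doc_id}))

def es_get_url(base, index, doc_type, doc_id):
    """URL for the ElasticSearch Get API."""
    return "/".join([base, index, doc_type, quote(doc_id, safe="")])

def es_mget_body(index, doc_type, ids):
    """Request body for the ElasticSearch Multi Get API (sent to /_mget)."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": i} for i in ids]}

print(solr_realtime_get_url("http://localhost:8983/solr", "12345"))
print(es_get_url("http://localhost:9200", "sematext", "test", "12345"))
print(es_mget_body("sematext", "test", ["1", "2"]))
```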
  • 34. Data Handling: single and batch indexing supported. ElasticSearch: JSON in / JSON out (and YAML). Solr: different formats allowed (XML, JSON, CSV, binary)
  • 35. Partial Document Updates: not based on LUCENE-3837; server-side doc reindexing; both servers use versioning; decreases network traffic
  • 36. Apache Solr Partial Doc Update: sent to the standard update handler; requires _version_ field
    curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id": "12345", "enabled": {"set": true}}]'
  • 37. ElasticSearch Partial Doc Update: special end-point exposed, _update; supports parameters like routing, parent, replication, percolate, etc. (similar to Index API); uses scripts to perform document updates
    curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{"script": "ctx._source.enabled = enabled", "params": {"enabled": true}}'
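Slides 35 to 37 describe partial updates as server-side reindexing guarded by versioning: the client sends only the changed fields, and the server merges them into the stored document and indexes the result as a new version. The toy in-memory store below illustrates that flow only. It is not how either server is implemented, and the class and method names are invented.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""

class ToyStore:
    def __init__(self):
        self._docs = {}  # doc id -> (version, source dict)

    def index(self, doc_id, source):
        """Store the whole document and return its new version number."""
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, dict(source))
        return version

    def partial_update(self, doc_id, changes, expected_version=None):
        """Merge `changes` into the stored document and reindex it."""
        version, source = self._docs[doc_id]
        if expected_version is not None and expected_version != version:
            raise VersionConflict("expected %s, found %s" % (expected_version, version))
        merged = dict(source)
        merged.update(changes)  # only the changed fields crossed the network
        return self.index(doc_id, merged)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

store = ToyStore()
store.index("12345", {"title": "Battle of the Giants", "enabled": False})
new_version = store.partial_update("12345", {"enabled": True}, expected_version=1)
print(new_version, store.get("12345")[1]["enabled"])
```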
  • 38. Solr Collections API: collection creation, reload, deletion; shard splitting
  • 39. ElasticSearch Indices REST API: index creation, deletion, closing and opening, refreshing, existence checking
  • 40. Apache Solr Shard Splitting: admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
  • 41. Cluster State Monitoring: multiple MBeans exposed by JMX; multiple REST end-points exposed to get different statistics
  • 42. ElasticSearch Statistics API: health and state check; nodes information; cache statistics; segments information; index information; mappings information. SPM: "One to rule them all"
  • 43. ElasticSearch Cluster Settings Update: control rebalancing, recovery, and allocation; change cluster configuration properties
  • 44. ElasticSearch Custom Shard Allocation
    Cluster level: curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent": {"cluster.routing.allocation.exclude._ip": "192.168.2.1"}}'
    Index level: curl -XPUT 'localhost:9200/sematext/_settings/' -d '{"index.routing.allocation.include.tag": "nodeOne,nodeTwo"}'
  • 45. Moving Shards and Replicas: move shards between nodes on demand
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [{"move": {"index": "sematext", "shard": 0, "from_node": "node1", "to_node": "node2"}}, {"allocate": {"index": "sematext", "shard": 1, "node": "node3"}}]}'
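When several shards have to be moved, it can be easier to generate the reroute body than to write it by hand. The helpers below are a small sketch that reproduces the command structure from slide 45; the helper names are invented, and the body is only printed, not sent.

```python
import json

def move_command(index, shard, from_node, to_node):
    """Command that moves a shard from one node to another."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}

def allocate_command(index, shard, node):
    """Command that allocates an unassigned shard on a given node."""
    return {"allocate": {"index": index, "shard": shard, "node": node}}

def reroute_body(commands):
    """Body for a POST to /_cluster/reroute."""
    return {"commands": list(commands)}

reroute = reroute_body([
    move_command("sematext", 0, "node1", "node2"),
    allocate_command("sematext", 1, "node3"),
])
print(json.dumps(reroute))
```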
  • 46. The Verdict
  • 47. And The Winner Is?
  • 48. We Are Hiring! Dig search? Dig analytics? Dig big data? Dig performance? Dig working with and in open source? We're hiring worldwide! http://sematext.com/about/jobs.html
  • 49. Thank You! Rafał Kuć: @kucrafal, rafal.kuc@sematext.com. Sematext: @sematext, http://sematext.com, http://blog.sematext.com. ElasticSearch Server 25% off: MREESS25