Scaling massive elastic search clusters - Rafał Kuć - Sematext

70,284 views

Rafał Kuć's presentation "Scaling Massive ElasticSearch Clusters", given at Berlin Buzzwords 2012

Published in: Technology, Business

  1. Scaling Massive ElasticSearch Clusters
     Rafał Kuć – Sematext International
     @kucrafal @sematext sematext.com
  2. Who Am I
     • "Solr 3.1 Cookbook" author
     • Sematext software engineer
     • Solr.pl co-founder
     • Father and husband :-)
     Copyright 2012 Sematext Int’l. All rights reserved
  3. What Will I Talk About?
     • ElasticSearch scaling
     • Indexing thousands of documents per second
     • Performing queries in tens of milliseconds
     • Controlling shard and replica placement
     • Handling multilingual content
     • Performance testing
     • Cluster monitoring
  4. The Challenge
     • More than 50 million documents a day
     • Real-time search
     • Less than 200 ms average query latency
     • Throughput of at least 1000 QPS
     • Multilingual indexing
     • Multilingual querying
  5. Why ElasticSearch?
     • Written with NRT and cloud support in mind
     • Uses Lucene and all its goodness
     • Distributed indexing with document distribution control out of the box
     • Easy creation of indices, shards and replicas on a live cluster
  6. Index Design
     • Several indices (at least one index for each day of data)
     • Indices divided into multiple shards
     • Multiple replicas of a single shard
     • Real-time, synchronous replication
     • Near-real-time index refresh (1 to 30 seconds)
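One way to picture the "one index for each day of data" layout is a helper that derives index names from dates. This is only a sketch: the `sematext` prefix is borrowed from the later curl examples, while the `YYYY.MM.DD` suffix and both function names are assumptions that the slides do not specify.

```python
from datetime import date, timedelta


def daily_index_name(day: date, prefix: str = "sematext") -> str:
    """Return the (hypothetical) name of the index holding one day of data."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(start: date, end: date, prefix: str = "sematext") -> list[str]:
    """List the daily indices covering start..end, both inclusive."""
    days = (end - start).days
    return [daily_index_name(start + timedelta(days=i), prefix) for i in range(days + 1)]


if __name__ == "__main__":
    # Indices a query over three days would have to target.
    print(indices_for_range(date(2012, 2, 15), date(2012, 2, 17)))
```

With a layout like this, a query limited to a time window only has to touch the indices for those days, and old data can be dropped one index at a time.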
  7. Shard Deployment Problems
     • Multiple shards per node
     • Replicas on the same nodes as shards
     • Shards and replicas not evenly distributed
     • Some nodes being hot, while others are cold
  8. Default Shard Deployment
     [Diagram: an ElasticSearch cluster of Node 1, Node 2 and Node 3 holding Shards 1–3 and Replicas 1–3]
  9. What Can We Do With Shards Then?
     • Control shard placement with node tags:
       – index.routing.allocation.include.tag
       – index.routing.allocation.exclude.tag
     • Control shard placement with node IP addresses:
       – cluster.routing.allocation.include._ip
       – cluster.routing.allocation.exclude._ip
     • Specified on index or cluster level
     • Can be changed on a live cluster!
  10. Shard Allocation Examples
     • Cluster level:
       curl -XPUT localhost:9200/_cluster/settings -d '{
         "persistent" : {
           "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
         }
       }'
     • Index level:
       curl -XPUT localhost:9200/sematext/_settings -d '{
         "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
       }'
  11. Number of Shards Per Node
     • Limits the number of shards of an index that a single node may hold
     • Specified on index level
     • Can be changed on live indices
     • Example:
       curl -XPUT localhost:9200/sematext/_settings -d '{
         "index.routing.allocation.total_shards_per_node" : 2
       }'
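A per-node shard limit also fixes the smallest cluster that can hold an index. The arithmetic can be sketched as below; the function names are illustrative, and the example figures (3 primary shards, 1 replica each) are read off the deployment diagrams rather than stated in the slides.

```python
import math


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard exists once plus once per configured replica."""
    return primaries * (1 + replicas)


def min_nodes_needed(primaries: int, replicas: int, total_shards_per_node: int) -> int:
    """Smallest node count on which every shard copy of the index fits."""
    by_limit = math.ceil(total_shard_copies(primaries, replicas) / total_shards_per_node)
    # A replica is never allocated on the same node as its primary,
    # so at least replicas + 1 nodes are needed regardless of the limit.
    return max(by_limit, replicas + 1)


if __name__ == "__main__":
    # 3 primaries with 1 replica each and at most 2 shards per node.
    print(min_nodes_needed(primaries=3, replicas=1, total_shards_per_node=2))
```

If the cluster has fewer nodes than this, some shard copies stay unassigned, so the limit is best chosen together with the node count.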
  12. Controlled Shard Deployment
     [Diagram: the same three-node ElasticSearch cluster with Shards 1–3 and Replicas 1–3 spread evenly, two per node]
  13. Does Routing Matter?
     • Controls the target shard for each document
     • Defaults to a hash of the document identifier
     • Can be specified explicitly (routing parameter) or as a field value (a bit less performant)
     • Can take any value
     • Example:
       curl -XPUT 'localhost:9200/sematext/test/1?routing=1234' -d '{
         "title" : "Test routing document"
       }'
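The routing rule on this slide boils down to "hash a key, take it modulo the number of shards". The sketch below illustrates that idea only: `zlib.crc32` is a stand-in, since ElasticSearch uses its own hash function, and both function names are hypothetical.

```python
import zlib
from typing import Optional


def pick_shard(routing_key: str, number_of_shards: int) -> int:
    """Map a routing key to a shard number in [0, number_of_shards)."""
    # crc32 is an assumption made for illustration; the real hash differs.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


def shard_for_document(doc_id: str, number_of_shards: int,
                       routing: Optional[str] = None) -> int:
    """Use the explicit routing value if given, else fall back to the document id."""
    key = routing if routing is not None else doc_id
    return pick_shard(key, number_of_shards)


if __name__ == "__main__":
    # Two different documents sharing a routing value land on the same shard.
    print(shard_for_document("1", 8, routing="1234"))
    print(shard_for_document("2", 8, routing="1234"))
```

Because the shard depends only on the key, all documents indexed with the same routing value can later be found by querying a single shard.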
  14. Indexing the Data
     [Diagram: an indexing application sending data to a three-node ElasticSearch cluster holding Shards 1–3 and Replicas 1–3]
  15. How We Indexed Data
     [Diagram: an indexing application and an ElasticSearch cluster of Node 1, Node 2 and Node 3, with only Shard 1 and Shard 2 shown]
  16. Nodes Without Data
     • Nodes used only to route data and queries to other nodes in the cluster
     • Such nodes don’t suffer from I/O waits (of course, data nodes don’t suffer from I/O waits all the time)
     • Not the default ElasticSearch behavior
     • Set up by setting node.data to false
  17. Multilingual Indexing
     • Detect the document’s language before sending it for indexing
       – e.g. with Sematext LangID or Apache Tika
     • Set known language analyzers in configuration or mappings
     • Set the analyzer during indexing (_analyzer field)
  18. Multilingual Indexing Example
     Mapping:
       {
         "test" : {
           "_analyzer" : { "path" : "langId" },
           "properties" : {
             "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
             "title" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
             "langId" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" }
           }
         }
       }
     Indexing a document:
       curl -XPUT localhost:9200/sematext/test/10 -d '{
         "title" : "Test document",
         "langId" : "english"
       }'
  19. Multilingual Queries
     • Identify the language of the query before executing it (can be problematic)
     • The query analyzer can be specified per query (analyzer parameter):
       curl -XGET 'localhost:9200/sematext/_search?q=let+AND+me&analyzer=english'
  20. Query Performance Factors – Lucene level
     • Refresh interval
       – Defaults to 1 second
       – Can be specified on cluster or index level
       – curl -XPUT localhost:9200/_settings -d '{ "index" : { "refresh_interval" : "600s" } }'
     • Merge factor
       – Defaults to 10
       – Can be specified on cluster or index level
       – curl -XPUT localhost:9200/_settings -d '{ "index" : { "merge.policy.merge_factor" : 30 } }'
  21. Let’s Talk About Routing Once Again
     • Routes a query to a particular shard
     • Speeds up queries; the gain depends on the number of shards in the index
     • Has to be specified manually with the routing parameter at query time
     • The routing parameter can take any value:
       curl -XGET 'localhost:9200/sematext/_search?q=test&routing=2012-02-16'
  22. Querying ElasticSearch – No Routing
     [Diagram: an application querying an ElasticSearch index made of Shards 1–8, without routing]
  23. Querying ElasticSearch – With Routing
     [Diagram: an application querying the same ElasticSearch index of Shards 1–8, with routing]
  24. Performance Numbers
     Queries without routing (200 shards, 1 replica)
       #threads   Avg response time   Throughput   90% line   Median    CPU utilization
       1          3169 ms             19.0/min     5214 ms    2692 ms   95–99%
     Queries with routing (200 shards, 1 replica)
       #threads   Avg response time   Throughput   90% line   Median    CPU utilization
       10         196 ms              50.6/sec     642 ms     29 ms     25–40%
       20         218 ms              91.2/sec     718 ms     11 ms     10–15%
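The two tables report throughput in different units (per minute without routing, per second with routing), which hides how large the gap is. The sketch below only converts and divides the figures copied from the slide; the function names are illustrative. Note that the runs also differ in thread count (1 versus 10 and 20), so this is not a like-for-like comparison.

```python
# Throughput figures copied from the "Performance Numbers" slide.
NO_ROUTING_PER_MIN = 19.0          # 1 thread, no routing
ROUTING_10_THREADS_PER_SEC = 50.6  # 10 threads, with routing
ROUTING_20_THREADS_PER_SEC = 91.2  # 20 threads, with routing


def per_minute_to_per_second(rate_per_min: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return rate_per_min / 60.0


def throughput_ratio(routed_per_sec: float, unrouted_per_min: float) -> float:
    """How many times higher the routed throughput is, in matching units."""
    return routed_per_sec / per_minute_to_per_second(unrouted_per_min)


if __name__ == "__main__":
    print(round(per_minute_to_per_second(NO_ROUTING_PER_MIN), 3))
    print(round(throughput_ratio(ROUTING_10_THREADS_PER_SEC, NO_ROUTING_PER_MIN), 1))
    print(round(throughput_ratio(ROUTING_20_THREADS_PER_SEC, NO_ROUTING_PER_MIN), 1))
```

In matching units the unrouted run handles about 0.3 queries per second, so the routed runs are roughly 160 and 288 times higher.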
  25. Scaling Query Throughput – What Else?
     • Increasing the number of shards for data distribution
     • Increasing the number of replicas
     • Using routing
     • Avoiding always hitting the same node and turning it into a hotspot
  26. FieldCache and OutOfMemory
     • The default ElasticSearch setup doesn’t limit the field data cache size
  27. FieldCache – What Can We Do With It?
     • Keep its default type and set:
       – Maximum size (index.cache.field.max_size)
       – Expiration time (index.cache.field.expire)
     • Change its type:
       – soft (index.cache.field.type)
     • Change your data:
       – Make your fields less precise (e.g. dates)
       – If you sort or facet on fields, consider whether you can reduce their granularity
     • Buy more servers :-)
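The "make your fields less precise" advice works because the field cache grows with the number of distinct values it has to hold. A small sketch with synthetic timestamps shows the effect of truncating dates to the hour; the data and the helper names are made up for illustration and say nothing about actual memory use.

```python
from datetime import datetime, timedelta


def truncate_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds and microseconds from a timestamp."""
    return ts.replace(minute=0, second=0, microsecond=0)


def distinct_values(timestamps, transform=lambda ts: ts) -> int:
    """Count the distinct field values left after applying transform."""
    return len({transform(ts) for ts in timestamps})


if __name__ == "__main__":
    start = datetime(2012, 6, 4)
    # Synthetic data: one document per second for a whole day.
    stamps = [start + timedelta(seconds=i) for i in range(24 * 60 * 60)]
    print(distinct_values(stamps))                    # second precision
    print(distinct_values(stamps, truncate_to_hour))  # hour precision
```

For this synthetic day, second precision yields 86400 distinct values while hour precision yields 24, which is the kind of reduction that keeps sorting and faceting fields small.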
  28. FieldCache After Changes
     [Graph: field cache after the changes]
  29. Additional Problems We Encountered
     • Rebalancing after full cluster restarts
       – cluster.routing.allocation.disable_allocation
       – cluster.routing.allocation.disable_replica_allocation
     • Long startup and initialization
     • Faceting on strings vs. faceting on numbers for high-cardinality fields
  30. JVM Optimization
     • Remember to leave enough memory to the OS for cache
     • Make GC frequent and short vs. rare and long
       – -XX:+UseParNewGC
       – -XX:+UseConcMarkSweepGC
       – -XX:+CMSParallelRemarkEnabled
     • -XX:+AlwaysPreTouch (for short performance tests)
  31. Performance Testing
     • Data
       – How much data do I need?
       – Choosing the right queries
     • Make changes
       – One change at a time
       – Understand the impact of the change
     • Monitor your cluster (jstat, dstat/vmstat, SPM)
     • Analyze your results
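Analyzing results usually means reducing raw latencies to the figures reported earlier in the deck: average, median and the 90% line. The following is a minimal sketch using nearest-rank percentiles; the sample latencies are invented, and load-testing tools may compute percentiles slightly differently.

```python
import math
import statistics


def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Reduce raw query latencies to average, median and 90% line."""
    return {
        "avg_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": percentile(latencies_ms, 90),
    }


if __name__ == "__main__":
    # Invented sample: nine fast queries and one slow outlier.
    print(summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]))
```

A single outlier pulls the average far above the median, which is why the slides report both alongside the 90% line.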
  32. ElasticSearch Cluster Monitoring
     • Cluster health
     • Indexing statistics
     • Query rate
     • JVM memory and garbage collector work
     • Cache usage
     • Node memory and CPU usage
  33. Cluster Health
     [Graph: cluster health, annotated with a node restart]
  34. Indexing Statistics
     [Graph: indexing statistics]
  35. Query Rate
     [Graph: query rate]
  36. JVM Memory and GC
     [Graph: JVM memory and GC]
  37. Cache Usage
     [Graph: cache usage]
  38. CPU and Memory
     [Graph: CPU and memory]
  39. Summary
     • Controlling shard and replica placement
     • Indexing and querying multilingual data
     • How to use sharding and routing without tearing your hair out
     • How to test your cluster performance to find bottlenecks
     • How to monitor your cluster and find problems right away
  40. We Are Hiring!
     • Dig Search?
     • Dig Analytics?
     • Dig Big Data?
     • Dig Performance?
     • Dig working with and in open source?
     • We’re hiring worldwide! http://sematext.com/about/jobs.html
  41. How to Reach Us
     • Rafał Kuć
       – Twitter: @kucrafal
       – E-mail: rafal.kuc@sematext.com
     • Sematext
       – Twitter: @sematext
       – Website: http://sematext.com
     • Graphs used in the presentation are from:
       – SPM for ElasticSearch (http://sematext.com/spm)
  42. Thank You For Your Attention
