Elasticsearch 5 and Bust (RubyConf 2019)

Breaking stuff is part of being a developer, but that never makes it any easier when it happens to you. The Elasticsearch outage of 2017 was the biggest outage our company has ever experienced. We drifted between full-blown downtime and degraded service for almost a week. However, it taught us a lot about how to better prepare for and handle upgrades in the future. It also bonded our team together and highlighted the important role that teamwork and leadership play in high-stress situations. The lessons learned are ones we will not soon forget. In this talk, I will share those lessons and our story in the hope that others can learn from our experiences and be better prepared when they execute their next big upgrade.

  1. Elasticsearch 5 and Bust (@molly_struve)
  3. TL;DR
  10. Elasticsearch Lingo
  11. Elasticsearch (ES)
  12. Servers: Node, Node, Node
  13. Cluster: Node, Node, Node
  14. 2.x to 5.x
  15. The Story
  16. March 2017
  17. Cluster
  20–23. Upgrade Steps (built up one step per slide):
      1. Shut down the cluster
      2. Upgrade Elasticsearch on all nodes
      3. Deploy Elasticsearch 5 code changes
      4. Start Elasticsearch on all nodes 👍
  24. Elasticsearch 5.x Cluster: CPU load on all six nodes
  25. Elasticsearch 5.x Cluster: CPU load on all six nodes 😬 😬 😬
  28. Elasticsearch 5.x Cluster ☠ ☠ ☠ ☠ ☠
  29. Elasticsearch 5.x Cluster
  30. Elasticsearch 5.x Cluster: CPU load on all six nodes
  31. Elasticsearch 5.x Cluster ☠ ☠ ☠ ☠ ☠
  32. Elasticsearch 5.x Cluster ☠
  33. Debugging Mode
  35. Elasticsearch 5 upgrade followed by cluster crash
  36. Elasticsearch 5 cluster instability
  37. Why does Elasticsearch 5 suck so much?
  39. The Story
  43. 😩 15+ hours 😩
  46. Rollback
  48. Rollback: 5.x back to 2.x
  49. Rollback: 5 days (5.x back to 2.x)
  53. 🙌
  54–55. Workaround Deployed
  57. Lessons Learned
  58. Lessons Learned #1: Have a Rollback Plan
  59–61. Rollback Plan (built up one question per slide):
      • Can you roll back the software inline?
      • How long and hard will a rollback be?
      • Worst-case-scenario the shit out of the upgrade
  62. Lessons Learned #2: Do Performance Testing
  63–64. The last upgrade was great, this one will be too!
  65–66. Performance Test
  67. Lessons Learned #3: Don't Ignore Small Warning Signs
  68. Node Local Elasticsearch ☠
  69. Don’t ignore small warning signs
  72. Lessons Learned #4: Use the Community
  73. Community
  75. Don’t Wait
  76. Ask!
  77. Lessons Learned #5: Leader and Management Support Are Crucial
  78. Engineers
  79–80. Vice President of Engineering
  81. Vice President of Engineering 🛑
  82. Trust
  83. Fail Forward
  84–86. VPs, Managers, C-Suite Execs (built up one per slide)
  88. 🛑
  89. Trust
  90. Lessons Learned #6: Your Team Matters
  91. Developer/Engineer 🖥
  92. Developer/Engineer 🖥 😊
  93. 😊😊😊
  94. 😱😳😩
  95. 😱😳😩 😐😟😬 😭😑🤔
  96–97. Character is everything
  103. Elasticsearch Outage 2017
  104–105. Embrace Your Mistakes
  107–109. Embrace it, Learn from it, Share it with others (built up one per slide)
  110. Questions?
