
Elasticsearch 5 and Bust (RubyConf 2019)

Breaking stuff is part of being a developer, but that never makes it any easier when it happens to you. The Elasticsearch outage of 2017 was the biggest outage our company has ever experienced. We drifted between full-blown downtime and degraded service for almost a week. However, it taught us a lot about how to better prepare for and handle upgrades in the future. It also bonded our team together and highlighted the important role that teamwork and leadership play in high-stress situations. The lessons learned are ones that we will not soon forget. In this talk, I will share those lessons and our story in the hope that others can learn from our experiences and be better prepared when they execute their next big upgrade.

  1. Elasticsearch 5 and Bust (@molly_struve)
  3. TL;DR
  10. Elasticsearch Lingo
  11. Elasticsearch = ES
  12. Node, Node, Node: servers
  13. Node, Node, Node: a cluster
  14. 2.x → 5.x
  15. The Story
  16. March 2017
  17. Cluster
  20-23. Upgrade Steps: (1) Shut down the cluster; (2) Upgrade Elasticsearch on all nodes; (3) Deploy Elasticsearch 5 code changes; (4) Start Elasticsearch on all nodes 👍
  24. Elasticsearch 5.x Cluster: CPU load ×6
  25. Elasticsearch 5.x Cluster: CPU load ×6 😬 😬 😬
  27. x
  28. Elasticsearch 5.x Cluster: ☠ ☠ ☠ ☠ ☠
  29. Elasticsearch 5.x Cluster
  30. Elasticsearch 5.x Cluster: CPU load ×6
  31. Elasticsearch 5.x Cluster: ☠ ☠ ☠ ☠ ☠
  32. Elasticsearch 5.x Cluster: ☠
  33. Debugging Mode
  35. Elasticsearch 5 upgrade followed by cluster crash
  36. Elasticsearch 5 cluster instability
  37. Why does Elasticsearch 5 suck so much?
  39. The Story
  41. x x
  43. 😩 15+ hours 😩
  46. Rollback
  48. Rollback: 5.x → 2.x
  49. Rollback: 5.x → 2.x, 5 days
  50. x x x x x x
  53. 🙌
  54-55. Workaround Deployed
  57. Lessons Learned
  58. Lessons Learned: 1. Have a Rollback Plan
  59-61. Rollback Plan: Can you roll back the software inline? How long and how hard will a rollback be? Worst-case-scenario the shit out of the upgrade.
  62. Lessons Learned: 2. Do Performance Testing
  63-64. "The last upgrade was great; this one will be too!"
  65-66. Performance Test
  67. Lessons Learned: 3. Don't Ignore Small Warning Signs
  68. Node, Local Elasticsearch ☠
  69. Don't ignore small warning signs
  72. Lessons Learned: 4. Use the Community
  73. Community
  75. Don't Wait
  76. Ask!
  77. Lessons Learned: 5. Leadership and Management Support Are Crucial
  78. Engineers
  79-81. Vice President of Engineering 🛡
  82. Trust
  83. Fail Forward
  84-86. VPs, Managers, C-Suite Execs
  88. 🛡
  89. Trust
  90. Lessons Learned: 1. Have a Rollback Plan; 2. Do Performance Testing; 3. Don't Ignore Small Warning Signs; 4. Use the Community; 5. Leadership and Management Support Are Crucial; 6. Your Team Matters
  91-92. Developer/Engineer 🖥 😊
  93. 😊😊😊
  94-95. 😱😳😩 😐😟😬 😭😡🤔
  96-97. Character is everything
  103. Elasticsearch Outage 2017
  104-105. Embrace Your Mistakes
  106. Lessons Learned (recap of all six)
  107-109. Embrace it. Learn from it. Share it with others.
  110. Questions?
