Stop Exhausting Yourself in Operating Multiple Elasticsearch Clusters

Hokuto Kagaya
LINE / PION C Team

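In this session, I will talk about how we operate the multiple Elasticsearch clusters we own: the problems that have occurred or could occur, and the improvements and solutions we applied to them.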

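The development team of PION, a product of the LINE GAME platform, manages multiple Elasticsearch (hereafter "Es") clusters. There is one cluster for each combination in the Cartesian product of two factors: "purpose" (logging, services, and so on) and "environment" (development, sandbox, production, and so on). Es is a very feature-rich piece of middleware, so even a single cluster has many tuning points and is not easy to manage. With multiple clusters it becomes harder still. In particular, keeping track of which nodes belong to each cluster and which plugins are installed on them is tedious and error-prone. For a single cluster, Kibana's Basic plan or OSS tools such as cerebro can handle management to some extent. What is the best approach when you have multiple clusters?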
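
Keeping track of plugins per node is the most error-prone part of this. The sketch below is not the PION team's tooling. It is a minimal illustration that makes two assumptions: you have already fetched the output of `GET _cat/plugins?h=name,component` (one node name and plugin name per line), and your provisioning playbook's plugin list is available as a set. The class `PluginDrift` and all node and plugin names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: detects plugin drift between nodes and the provisioned plugin list. */
final class PluginDrift {

    /**
     * Parses output of GET _cat/plugins?h=name,component, i.e. one
     * "node-name plugin-name" pair per line, into node -> installed plugins.
     * Note: nodes with no plugins at all do not appear in this output.
     */
    static Map<String, Set<String>> parseCatPlugins(String catOutput) {
        Map<String, Set<String>> byNode = new TreeMap<>();
        for (String line : catOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // skip blank or malformed lines
            }
            byNode.computeIfAbsent(cols[0], node -> new TreeSet<>()).add(cols[1]);
        }
        return byNode;
    }

    /** Lists every plugin that is installed but not expected, or expected but missing. */
    static List<String> report(Map<String, Set<String>> actual, Set<String> expected) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : actual.entrySet()) {
            String node = entry.getKey();
            Set<String> installed = entry.getValue();
            for (String plugin : installed) {
                if (!expected.contains(plugin)) {
                    problems.add(node + ": unexpected plugin " + plugin);
                }
            }
            for (String plugin : expected) {
                if (!installed.contains(plugin)) {
                    problems.add(node + ": missing plugin " + plugin);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Sample _cat/plugins output; node and plugin names are made up for illustration
        String sample = "node-1 analysis-kuromoji\n"
                + "node-2 analysis-kuromoji\n"
                + "node-2 repository-s3\n";
        Set<String> expected = new TreeSet<>(List.of("analysis-kuromoji"));
        report(parseCatPlugins(sample), expected).forEach(System.out::println);
    }
}
```

Running `main` prints `node-2: unexpected plugin repository-s3`. Nodes without any plugins are absent from `_cat/plugins`, so a fuller check would also take the node list from `_cat/nodes`.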

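Also, whether there is one cluster or several, there is always a need to monitor the state of the Es clusters themselves properly and to get alerts when necessary. Kibana's Monitoring feature shows search and indexing latency separately. More detailed monitoring, however, requires collecting metrics on the client side.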
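
Client-side metrics make tail latencies such as the 95th percentile easy to compute, because the caller observes every request's duration. In production a metrics library would normally do this; Micrometer, for example, offers `Timer.builder(name).publishPercentiles(0.95)`. The dependency-free sketch below only illustrates a nearest-rank percentile over a fixed window of recent samples. The class name `LatencyWindow` and the window size are assumptions for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: keeps the most recent client-side latencies (ms)
 * in a ring buffer and answers nearest-rank percentile queries.
 */
final class LatencyWindow {
    private final long[] samples;
    private long count = 0; // total number of samples ever recorded

    LatencyWindow(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.samples = new long[capacity];
    }

    /** Records one request latency, overwriting the oldest sample once the buffer is full. */
    void record(long latencyMillis) {
        samples[(int) (count % samples.length)] = latencyMillis;
        count++;
    }

    /** Nearest-rank percentile, e.g. percentile(95) for the 95th percentile. */
    long percentile(int p) {
        if (p < 1 || p > 100) throw new IllegalArgumentException("p must be in [1, 100]");
        int n = (int) Math.min(count, samples.length);
        if (n == 0) throw new IllegalStateException("no samples recorded");
        long[] sorted = Arrays.copyOf(samples, n);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * n / 100.0); // smallest rank covering p% of the samples
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        LatencyWindow window = new LatencyWindow(1000);
        for (int i = 1; i <= 100; i++) {
            window.record(i); // pretend latencies of 1..100 ms
        }
        System.out.println("p50=" + window.percentile(50) + " p95=" + window.percentile(95));
    }
}
```

Running `main` prints `p50=50 p95=95`.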

Published in: Technology

  1. How Not to Exhaust Yourself Operating Multiple Elasticsearch Clusters. Hokuto Kagaya, Development Center 2, Game Platform Service Development Office, PION C Team
  2. WHAT'S PION? An in-game community / marketing platform
  3. WHAT'S Elasticsearch?
      • As a time series DB
      • As a search engine
      • As a log store
  4. WE HAVE MULTIPLE CLUSTERS FOR...
      • Purpose: Logging, Event Processing, Service
      • Environment: Development, Sandbox, Real (production)
  5. MULTIPLE CLUSTERS WILL CAUSE... For example: "Which clusters did I install which plugins on?"
      Basically, our clusters are provisioned by Ansible, BUT...
      Someone: "Hey, let's try the XXX plugin on node YYY of cluster ZZZ in the DEV environment!"
      They forget to record XXX, YYY, ZZZ... and things easily descend into chaos!
  6. WE NEED A MANAGEMENT TOOL!
  7. EXISTING TOOLS
      • Kibana (by Elastic), cerebro (OSS): for a single cluster
      • ElasticHQ (OSS): one of its strengths is that it can support multiple clusters. "OK, let's use this!" However, its main purpose is also deep management of a single cluster, not browsing a list of clusters.
  8. ANOTHER PROBLEM WITH Elasticsearch
      It is not so easy to:
      • monitor an Elasticsearch cluster properly
      • alert us to an abnormal status based on the monitoring results
      Kibana and many OSS tools are very nice, but:
      • some detailed metrics (like 95th-percentile latency) cannot be retrieved directly
      • we cannot see them when Es is under very heavy load
  9. COMPARISON (multiple clusters? / monitoring? / alerting?)
      • Kibana: partial support (cross-cluster search, a dedicated separate cluster for monitoring) / partial support (server-side metrics) / ✔ (with Watcher)
      • ElasticHQ: ✔ (not for browsing) / partial support (server-side metrics) / ✘
      • What we need: ✔ (with high browsability) / ✔ / ✔
      OK, let's build it ourselves!
  10. RUBBER BAND: TOOLKIT FOR ES MANAGEMENT (screenshots)
  11. Rubber Band UI, Health Watcher, Client - architecture
  12. Rubber Band UI, Health Watcher, Client - architecture
  13. Rubber Band UI, Health Watcher, Client - architecture
  14. TWO OPTIONS FOR MONITORING
      • Monitor the clusters' states directly: /_cat/***, /_cluster/health, /_nodes/***
      • Monitor client-side metrics: detailed metrics can be computed, and they are accessible even when a cluster is under heavy load (via our tool)
  15. Rubber Band UI, Health Watcher, Client - architecture
  16. HOW TO ALERT ON A CLUSTER STATUS?
      The X-Pack Gold license supports Watcher, which can also check cluster health out of the box:

          {
            "trigger" : { "schedule" : { "interval" : "10s" } },
            "input" : {
              "http" : {
                "request" : { "host" : "localhost", "port" : 9200, "path" : "/_cluster/health" }
              }
            }
          }

      It uses the cluster health API, so we can also make use of that API ourselves :)
  17. EXAMPLES OF ALERTS FROM HEALTH WATCHER
  18. Rubber Band UI, Health Watcher, Client - architecture
  19. MILESTONES
      • Phase 1: Rubber Band UI, Rubber Band Health Watcher, Rubber Band Client (a simple REST client wrapper)
      • Phase 2: Rubber Band Curator (a centralized wrapper of Curator); open it to other internal teams
      • Phase 3: publish it as OSS
  20. KEY TAKEAWAYS
      • How can we manage multiple clusters without chaos? Our toolkit, Rubber Band: a simple UI with information aggregation and appropriate delegation
      • How can we do proper monitoring and alerting? Use both direct server states and client-side metrics, and implement a simple health-check server ourselves
      And...
  21. WE ARE HIRING!
  22. THANK YOU
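  23. Wrap the official high-level REST client (RestHighLevelClient) to record client-side metrics:

          import io.micrometer.core.instrument.MeterRegistry;
          import io.micrometer.core.instrument.Timer;
          import org.elasticsearch.action.ActionListener;
          import org.elasticsearch.action.search.SearchRequest;
          import org.elasticsearch.action.search.SearchResponse;
          import org.elasticsearch.client.RequestOptions;
          import org.elasticsearch.client.RestHighLevelClient;
          import org.springframework.stereotype.Component;

          @Component
          public class ElasticsearchClientWrapper {
              private final RestHighLevelClient elasticsearchClient;
              private final MeterRegistry meterRegistry;

              public ElasticsearchClientWrapper(RestHighLevelClient elasticsearchClient,
                                                MeterRegistry meterRegistry) {
                  this.elasticsearchClient = elasticsearchClient;
                  this.meterRegistry = meterRegistry;
              }

              public void searchAndGetAggregationAsync(SearchRequest searchRequest) {
                  // Start timing on the client side, before the request is sent
                  Timer.Sample sample = Timer.start(meterRegistry);
                  // RequestOptions overload (client 6.4+); older 6.x clients used searchAsync(request, listener)
                  elasticsearchClient.searchAsync(searchRequest, RequestOptions.DEFAULT,
                          new ActionListener<SearchResponse>() {
                              @Override
                              public void onResponse(SearchResponse searchResponse) {
                                  // Micrometer tags are key/value pairs; the "result" key name is illustrative
                                  sample.stop(meterRegistry.timer("metrics.timer", "result", "success"));
                                  // do stuff..
                              }

                              @Override
                              public void onFailure(Exception e) {
                                  sample.stop(meterRegistry.timer("metrics.timer", "result", "failure"));
                                  // do fallback..
                              }
                          });
              }
          }

      See also: "Key points when using Elasticsearch as a search engine", https://engineering.linecorp.com/ja/blog/detail/99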
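
The Watcher example in slide 16 works because a cluster health check is just a periodic call to `/_cluster/health`. That is also what makes a home-grown Health Watcher feasible. The class below is a minimal sketch, not Rubber Band's actual implementation. It checks each cluster in a hypothetical name-to-URL map once and flags any status other than `green`, including clusters that cannot be reached. It assumes Java 11+ for `java.net.http`. It uses a regular expression instead of a JSON parser to stay dependency-free, and it prints to stderr where a real notifier would go. Whether to alert on `yellow` or only on `red` is a policy choice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-pass health checker over several clusters. */
final class HealthCheck {
    // Matches the "status" field of a _cluster/health response body
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"(green|yellow|red)\"");

    /** Returns green/yellow/red, or "unknown" if no status field is found. */
    static String parseStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1) : "unknown";
    }

    /** Assumed policy: alert on anything that is not green. */
    static boolean shouldAlert(String status) {
        return !"green".equals(status);
    }

    /** Calls GET baseUrl + "/_cluster/health"; returns "unreachable" if the request fails. */
    static String fetchStatus(HttpClient http, String baseUrl) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            return parseStatus(response.body());
        } catch (IOException e) {
            return "unreachable";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "unreachable";
        }
    }

    public static void main(String[] args) {
        // Hypothetical cluster list; a real tool would load this from configuration
        Map<String, String> clusters = Map.of("logging-dev", "http://localhost:9200");
        HttpClient http = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        clusters.forEach((name, baseUrl) -> {
            String status = fetchStatus(http, baseUrl);
            if (shouldAlert(status)) {
                System.err.println("ALERT: " + name + " is " + status); // stand-in for a notifier
            } else {
                System.out.println(name + " is green");
            }
        });
    }
}
```

Scheduling is left to a `ScheduledExecutorService` or cron. The check could run every 10 seconds, as in the Watcher trigger.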
