Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)

Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.

"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal

  1. AWS Meetup Chicago
  2. Who am I: Asaf Yigal, Co-Founder and VP Product @logz.io. Email: asaf@logz.io. Twitter: @asafyigal
  3. Agenda
     • Why do we need Log analytics?
     • Intro to ELK
     • What is Logz.io
     • Installing ELK on your own
     • Our Architecture
     • EC2 machine comparison
  4. Why do we need Log analytics?
  5. Werner Vogels, AWS CTO: “Log Analytics is Fundamental for Building Cloud Applications”
  6. Multiple Use-Cases: Product Management, Business Analysis, Customer Success, BI, Monitoring, DevOps, IoT, Troubleshooting, Support, QA, IT Ops, ITOA, Compliance, SecOps, SIEM
  7. Log driven development
     • Errors, warnings and exceptions
     • Metrics
     • Alerts
     • Dashboard
  8. Why Open Source
  9. The Market is Dominated by Open Source Solutions*
     Over the past 3 years, the market shifted attention from proprietary to open source.
     • ELK Stack: 400,000+ companies
     • Splunk, Sumo Logic, Loggly: 20,000 companies
     • Graphite: > 1M companies using it
     *based on Logz.io research
  10. ELK Popularity
  11. Intro to ELK
      • Logstash: streaming data digestion, time normalization, field extraction
      • Elasticsearch: schema-less search DB, highly scalable
      • Kibana: visualization
  12. Open source ELK +/-
      • (+) Simple and beautiful: It’s simple to get started and play with ELK, and the UI is just beautiful.
      • (+) Open Source: The largest user base, with a vibrant open source community that supports and improves the product.
      • (+) Fast. Very fast: Built on the Elasticsearch search engine, ELK provides blazing quick responses even when searching through millions of documents.
      • (-) Hard to Scale: Data piles up and organizations experience usage bursts. It’s super-complex to build elastic ELK deployments that can scale up and down.
      • (-) Poor Security: Logs include sensitive data, and open source ELK offers no real security solution, from authentication to role-based access.
      • (-) Not Production Ready: Building a production-ready ELK deployment is a great challenge organizations face. With hundreds of different configurations and a support matrix, making sure it’s always up is difficult.
  13. Logz.io Enterprise ELK Cloud Service
      • Up and running in minutes: Sign up and get insights into your data in minutes.
      • Production ready: Predefined and community-designed dashboards, visualizations and alerts are all bundled and ready to provide insights.
      • Infinitely scalable: Ship as much data as you want, whenever you want.
      • Alerts: A unique, proprietary alerts system built on top of open source ELK transforms ELK into a proactive system.
      • Highly Available: The data and the entire data ingestion pipeline can sustain the downtime of a full datacenter without losing data or service.
      • Advanced Security: 360-degree security with role-based access and multi-layer security.
  14. Installing ELK on your own
  15. Prototype
      • Installing ELK stack on a single server: 1 hr
      • Shipping one type of log: 1 hr
      • Log parsing: 2 hr
      • Building Kibana Dashboard: 2 hr
      • 6 hours to get a simple prototype
  16. Turning ELK Production ready
  17. Elasticsearch
      • OS Level Optimization: Elasticsearch requires a lot of OS-level optimization in order to run properly.
      • Shard Allocation: Optimizing insert and query times can be tricky and requires a lot of attention.
      • Index Management: Because deletion is an expensive operation, index management is required for log analytics solutions.
      • Zone awareness: This is specific to AWS and is required to achieve high availability.
      • Cluster Topology: Elasticsearch clusters require 3 Master nodes, Data nodes and Client nodes.
      • Bulk inserts Optimization: Optimizing insert time and latency.
  18. Elasticsearch (2)
      • Capacity provisioning: Need to account for log bursts and be able to provision enough capacity.
      • Archive (DR): Snapshot the data to a different repository for disaster recovery.
      • Mapping management: Mapping conflicts and sync issues need to be detected and addressed.
      • Monitoring: Marvel does a good job but requires constant DevOps attention.
      • Curator: Remove or optimize old indices.
      • Alias management: For better cluster control you need to define and use aliases.
  19. Logstash
      • Data parsing: Extracting values from text messages and enhancing them with geo, user agent, etc.
      • High Availability: Running Logstash in a cluster is not trivial.
      • Scalability: Dealing with increased load on the Logstash servers.
      • Burst Protection: Logs tend to be bursty. A special buffer such as Redis or Kafka is required in front of Logstash.
      • Rejection from Elasticsearch: Elasticsearch rejects about 1% of messages due to mapping issues. This needs to be addressed.
      • Configuration management: A special infrastructure needs to be in place to allow config changes with no data loss.
  20. Kibana
      • Security: Kibana by default has no protection. User authentication has to be implemented.
      • High Availability: Running Kibana in a cluster for upgrades and high availability.
      • Role based access: If you want to restrict access to certain information, this capability needs to be developed.
      • Alerts: Alerts are not part of the open source.
      • Anomaly Detection: Basic anomaly detection is missing from Kibana.
      • Pre Canned Dashboards: Building dashboards and visualizations in Kibana is tricky and requires special knowledge.
  21. Turning ELK Production ready: ~4-6 weeks of work
  22. Maintenance
      • Upgrades: Challenging to upgrade; need to be aware of backward compatibility.
      • Overall cluster health: Monitor the health of the environment.
      • AWS Issues: Dealing with AWS stability issues.
      • Mapping conflicts: Deal with arising mapping conflicts.
      • Personnel redundancy: Need to have multiple people with deep knowledge of the stack.
      • Capacity increase: Provision additional capacity and grow the cluster.
  23. Our Architecture
  24. [Architecture diagram. Labeled components: HAProxy, Listener (x4), Kafka, Log Engine, Logstash, S3, Elasticsearch, Play server, Curator, Hot/Cold migration, DLQ, Alert Engine, Kibana, Shard optimizer, API Gateway, Cluster Protection, Monitoring (ELK, Graphite, Nagios, etc.)]
  25. Demo
  26. AWS Server Comparison
      Machine                 Number   TB/Day
      M1.xlarge               4        0.6
      i2.xlarge               4        1
      C3.8xlarge              6        1.5
      C4.2xlarge + 1TB EBS    3        1.3
  27. We’re Hiring
      • Technical evangelist
      • Business Development
      • Marketing
      jobs@logz.io
  28. Questions?
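
The AWS Server Comparison slide gives only a machine count and a TB/Day figure per instance type. One way to compare the rows is to normalize to throughput per machine. The sketch below does that under two assumptions that the slide does not state: that "Number" is the count of machines in the cluster and that "TB/Day" is the throughput of that whole cluster. The function names are illustrative only.

```python
# Rows copied from the "AWS Server Comparison" slide: (machine, number, TB/day).
# ASSUMPTION: "number" is the machine count and TB/day applies to the whole cluster.
CLUSTERS = [
    ("M1.xlarge", 4, 0.6),
    ("i2.xlarge", 4, 1.0),
    ("C3.8xlarge", 6, 1.5),
    ("C4.2xlarge + 1TB EBS", 3, 1.3),
]


def tb_per_machine(number, tb_per_day):
    """Daily throughput divided evenly across the machines in one row."""
    return tb_per_day / number


def rank_by_machine_throughput(clusters):
    """Return (machine, TB/day per machine) pairs, highest throughput first."""
    rows = [(name, round(tb_per_machine(n, tb), 3)) for name, n, tb in clusters]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, per_machine in rank_by_machine_throughput(CLUSTERS):
        print(f"{name:<22} {per_machine:.3f} TB/day per machine")
```

Under those assumptions the C4.2xlarge + 1TB EBS row comes out highest per machine and the M1.xlarge row lowest. The comparison ignores instance price, which the slide does not list.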

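
The Index Management and Curator points in the Elasticsearch slides say that deleting documents is expensive and that old indices have to be removed. A common way to handle this is to write logs into one index per day and drop whole indices once they age out. The sketch below only selects which index names have expired; it does not talk to a cluster. It assumes Logstash's default daily naming, `logstash-YYYY.MM.DD`, and a retention period chosen by the operator. Neither is taken from the talk, and `expired_indices` is a hypothetical helper, not part of Curator.

```python
from datetime import date, datetime, timedelta

# ASSUMPTION: daily indices named with Logstash's default pattern, logstash-YYYY.MM.DD.
PREFIX = "logstash-"


def expired_indices(index_names, today, retention_days):
    """Return the daily indices older than the retention window, sorted by name."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # not a daily log index (for example .kibana)
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2016.05.05", "logstash-2016.04.01", ".kibana"]
    print(expired_indices(names, date(2016, 5, 5), retention_days=7))
```

Dropping a whole index avoids per-document deletes, which is the cost the slide warns about. The returned names would then be passed to whatever tool actually removes indices.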