Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 2017 AWS Online Tech Talks


Published on

Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications including digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. In this tech talk, we will walk you step-by-step through the process of building an end-to-end analytics solution that ingests, transforms, and loads streaming data using Amazon Kinesis Firehose, Amazon Kinesis Analytics and AWS Lambda. The processed data will be saved to an Amazon Elasticsearch Service cluster, and we will use Kibana to visualize the data in near real-time.

Learning Objectives:
1. Reference architecture for building a complete log analytics solution
2. Overview of the services used and how they fit together
3. Best practices for log analytics implementation

Published in: Technology


  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Log Analytics with Amazon Kinesis and Amazon Elasticsearch Service
  2. 2. What to do with a terabyte of logs?
  3. 3. Log analytics architecture: data source → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana
  4. 4. Amazon Elasticsearch Service: a cost-effective managed service that makes it easy to deploy, manage, and scale open source Elasticsearch for log analytics, full-text search, and more.
  5. 5. Amazon Elasticsearch Service benefits: easy to use, open-source compatible, secure, highly available, AWS integrated, scalable
  6. 6. Adobe Developer Platform (Adobe I/O)
     PROBLEM
     • Cost effective monitor for XL amount of log data
     • Over 200,000 API calls per second at peak - destinations, response times, bandwidth
     • Integrate seamlessly with other components of AWS eco-system
     SOLUTION
     • Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using AES Kibana
     • Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
     BENEFITS
     • Management and operational simplicity
     • Flexibility to try out different cluster config during dev and test
     (Diagram labels: Data Sources, Amazon Kinesis Streams, Spark Streaming, Amazon Elasticsearch Service)
  7. 7. McGraw Hill Education
     PROBLEM
     • Supporting a wide catalog across multiple services in multiple jurisdictions
     • Over 100 million learning events each month
     • Tests, quizzes, learning modules begun / completed / abandoned
     SOLUTION
     • Search and analyze test results, student/teacher interaction, teacher effectiveness, student progress
     • Analytics of applications and infrastructure are now integrated to understand operations in real time
     BENEFITS
     • Confidence to scale throughout the school year. From 0 to 32TB in 9 months
     • Focus on their business, not their infrastructure
  8. 8. Get set up right
  9. 9. Amazon ES overview (diagram labels: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, Elasticsearch API, CloudTrail)
  10. 10. Data pattern
      • One index per day in the Amazon ES cluster: logs_01.21.2017 through logs_01.27.2017
      • Each index has multiple shards (Shard 1, Shard 2, Shard 3)
      • Each shard contains a set of documents
      • Each document contains a set of fields and values (host, ident, auth, timestamp, etc.)
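The one-index-per-day pattern on this slide can be sketched in Python. This is a minimal illustration that assumes the `logs_MM.DD.YYYY` naming shown on the slide; the helper names `daily_index_name` and `indices_to_expire` and the seven-day retention window are hypothetical and do not come from the talk.

```python
from datetime import date, timedelta
from typing import List

INDEX_PREFIX = "logs_"  # prefix taken from the index names on the slide


def daily_index_name(day: date) -> str:
    """Return the index name for one day, e.g. logs_01.21.2017."""
    return INDEX_PREFIX + day.strftime("%m.%d.%Y")


def indices_to_expire(existing: List[str], today: date, keep_days: int = 7) -> List[str]:
    """Return the daily indices in `existing` that fall outside the retention window.

    keep_days is a hypothetical retention setting, not a value from the talk.
    """
    keep = {daily_index_name(today - timedelta(days=n)) for n in range(keep_days)}
    return [
        name
        for name in existing
        if name.startswith(INDEX_PREFIX) and name not in keep
    ]
```

Writers would send each document to `daily_index_name(date.today())`, and a periodic job could delete whatever `indices_to_expire` returns. Dropping a whole index is generally cheaper than deleting individual documents, which is one reason the daily pattern suits log data.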
  11. 11. Deployment of indices to a cluster
      • Index 1: Shard 1, Shard 2, Shard 3
      • Index 2: Shard 1, Shard 2, Shard 3
      (Diagram: an Amazon ES cluster of three instances, with Instance 1 acting as master; primary and replica copies of shards 1-3 are distributed across Instances 1, 2, and 3.)
  12. 12. How many instances?
      • The index size will be about the same as the corpus of source documents
        – Double this if you are deploying an index replica
      • Size based on storage requirements
        – Either local storage or up to 1.5TB of EBS per instance
      • Example: 2TB corpus will need 4 instances
        – Assuming a replica and using EBS
        – Or with i2.2xlarge nodes (1.6TB ephemeral storage)
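The sizing rule on this slide is simple arithmetic, sketched below under stated assumptions. The function name `data_nodes_needed` and the `headroom` multiplier are hypothetical. Plain division gives 3 nodes for the slide's 2TB example, while the slide quotes 4; the slide does not say how it rounds up, so `headroom` is only one possible way to account for the difference.

```python
import math


def data_nodes_needed(corpus_tb, replicas=1, storage_per_node_tb=1.5, headroom=1.0):
    """Estimate the data node count from storage alone.

    corpus_tb:           size of the source documents (index size is about the same)
    replicas:            number of replica copies (1 replica doubles the storage)
    storage_per_node_tb: usable storage per instance (1.5 TB EBS in the slide)
    headroom:            hypothetical safety multiplier, not taken from the talk
    """
    if storage_per_node_tb <= 0:
        raise ValueError("storage_per_node_tb must be positive")
    total_tb = corpus_tb * (1 + replicas) * headroom
    return math.ceil(total_tb / storage_per_node_tb)


bare_minimum = data_nodes_needed(2.0)                  # 4 TB / 1.5 TB per node
with_headroom = data_nodes_needed(2.0, headroom=1.25)  # 5 TB / 1.5 TB per node
```

Storage is only one input; the instance type recommendations in the talk suggest that query and indexing load also affect the choice.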
  13. 13. Instance type recommendations
      Instance   Workload
      T2         Entry point. Dev and test.
      M3, M4     Equal read and write volumes.
      R3, R4     Read-heavy or workloads with high memory demands (e.g., aggregations).
      C4         High concurrency/indexing workloads.
      I2         Up to 1.6 TB of SSD instance storage.
  14. 14. Cluster with no dedicated masters (diagram: Amazon ES cluster of three instances holding shards 1-3; Instance 1 also serves as master)
  15. 15. Cluster with dedicated masters (diagram: Amazon ES cluster with dedicated master nodes, plus data nodes Instance 1, Instance 2, and Instance 3 handling queries and updates)
  16. 16. Master node selection • < 10 nodes - m3.medium, c4.large • 11-20 nodes - m4.large, r4.large, m3.large, r3.large • 21-40 nodes - c4.xlarge, m4.xlarge, r4.xlarge, m3.xlarge
  17. 17. Cluster with zone awareness (diagram: Amazon ES cluster with Instances 1 through 4 and copies of shards 1-3 spread across Availability Zone 1 and Availability Zone 2)
  18. 18. Small use cases
      • Logstash co-located on the Application instance
      • SigV4 signing via provided output plugin
      • Up to 200GB of data
      • m3.medium + 100GB EBS data nodes
      • 3x m3.medium master nodes
  19. 19. Large use cases
      • Data flows from instances and applications via Lambda; CWL is implicit
      • SigV4 signing via Lambda/roles
      • Up to 5TB of data
      • r3.2xlarge + 512GB EBS data nodes
      • 3x m3.medium master nodes
      (Diagram labels: Amazon DynamoDB, AWS Lambda, Amazon S3 bucket, Amazon CloudWatch)
  20. 20. XL use cases
      • Ingest supported through high-volume technologies like Spark or Kinesis
      • Up to 60 TB of data
      • r3.8xlarge + 640GB data nodes
      • 3x m3.xlarge master nodes
      (Diagram labels: Amazon Kinesis, Amazon EMR)
  21. 21. Best practices
      • Data nodes = Storage needed / Storage per node
      • Use GP2 EBS volumes
      • Use 3 dedicated master nodes for production deployments
      • Enable Zone Awareness
      • Set indices.fielddata.cache.size = 40
  22. 22. Amazon Kinesis
  23. 23. Amazon Kinesis: Streaming Data Made Easy. Services make it easy to capture, deliver, and process streams on AWS: Amazon Kinesis Streams, Amazon Kinesis Analytics, Amazon Kinesis Firehose
  24. 24. Amazon Kinesis Streams • Easy administration • Build real time applications with framework of choice • Low cost
  25. 25. Amazon Kinesis Firehose • Zero administration • Direct-to-data store integration • Seamless elasticity
  26. 26. Amazon Kinesis Analytics • Interact with streaming data in real-time using SQL • Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
  27. 27. Amazon Kinesis - Firehose vs. Streams Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
  28. 28. Kinesis Firehose overview
      • Delivery Stream: Underlying AWS resource
      • Destination: Amazon ES, Amazon Redshift, or Amazon S3
      • Record: Put records in streams to deliver to destinations
  29. 29. Kinesis Firehose Data Transformation
      • Firehose buffers up to 3MB of ingested data
      • When the buffer is full, Firehose automatically invokes the Lambda function, passing an array of records to be processed
      • The Lambda function processes the records and returns an array of transformed records, with the status of each record
      • Transformed records are saved to the configured destination
      Records passed to the Lambda function:
      [ { "recordId": "1234", "data": "encoded-data" },
        { "recordId": "1235", "data": "encoded-data" } ]
      Records returned by the Lambda function:
      [ { "recordId": "1234", "result": "Ok", "data": "encoded-data" },
        { "recordId": "1235", "result": "Dropped", "data": "encoded-data" } ]
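The transformation steps on this slide map onto a small Lambda handler, sketched here in Python. The slide shows bare arrays; in the Firehose transformation contract the arrays sit under a `records` key, `data` is base64-encoded, and `ProcessingFailed` is a third allowed result alongside `Ok` and `Dropped`. The JSON log format, the `verb` field, the rule that drops `OPTIONS` requests, and the added `transformed` flag are all hypothetical choices made for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and report a status for each one."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            doc = json.loads(base64.b64decode(record["data"]).decode("utf-8"))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable input: hand the original payload back marked as failed.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if doc.get("verb") == "OPTIONS":  # hypothetical filter rule
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        doc["transformed"] = True  # hypothetical enrichment step
        encoded = base64.b64encode(json.dumps(doc).encode("utf-8")).decode("ascii")
        output.append({"recordId": record_id, "result": "Ok", "data": encoded})

    # Every incoming recordId must appear exactly once in the response.
    return {"records": output}
```

Returning the original payload for dropped and failed records keeps the response well-formed without re-encoding data that was never changed.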
  30. 30. Kinesis Firehose delivery architecture with transformations (diagram: the data source sends source records to the Firehose delivery stream; the data transformation function returns transformed records, which are delivered to Amazon Elasticsearch Service; source records, transformation failures, and delivery failures go to an S3 bucket)
  31. 31. Kinesis Firehose features for ingest: serverless scale, error handling, S3 backup
  32. 32. Best practices
      • Use smaller buffer sizes to increase throughput, but be careful of concurrency
      • Use index rotation based on sizing
      • Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
  33. 33. Log analysis with aggregations
  34. 34. Amazon ES aggregations
      • Buckets – a collection of documents meeting some criterion
      • Metrics – calculations on the content of buckets
      • Example: Bucket: time, Metric: count
  35. 35. host: with <histogram of verb>
      • Look up: 1, 4, 8, 12, 30, 42, 58, 100 ...
      • Field data: GET, GET, POST, GET, PUT, GET, GET, POST
      • Buckets: GET, POST, PUT
      • Counts: 5, 2, 1
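The bucket-and-count step on this slide can be reproduced in a few lines of Python. This only illustrates the concept with the slide's own sample values; it is not how Elasticsearch computes aggregations internally.

```python
from collections import Counter

# Field data for the matching documents, copied from the slide.
verbs = ["GET", "GET", "POST", "GET", "PUT", "GET", "GET", "POST"]

# Each distinct value becomes a bucket; the metric is the document count per bucket.
buckets = Counter(verbs)

for verb, count in buckets.most_common():
    print(verb, count)
```

The result matches the slide: 5 documents in the GET bucket, 2 in POST, and 1 in PUT.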
  36. 36. A more complicated aggregation
      • Bucket: ARN
      • Bucket: Region
      • Bucket: eventName
      • Metric: Count
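A nested bucket structure like this one is usually written as sub-aggregations in the Elasticsearch query DSL. The sketch below builds such a request body in Python. It assumes the buckets nest in the order listed on the slide, and the field names `userIdentity.arn`, `awsRegion`, and `eventName` are assumptions modelled on CloudTrail-style logs; the slide does not name the fields. The count metric needs no extra clause, because every `terms` bucket already reports a `doc_count`.

```python
import json


def nested_terms(fields, size=10):
    """Build nested `terms` aggregations, outermost field first."""
    if not fields:
        raise ValueError("at least one field is required")
    agg = None
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if agg is not None:
            level["aggs"] = agg  # the previously built level becomes the sub-aggregation
        agg = {"by_" + field.replace(".", "_"): level}
    return agg


query = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": nested_terms(["userIdentity.arn", "awsRegion", "eventName"]),
}

print(json.dumps(query, indent=2))
```

The printed body could be sent to an index's `_search` endpoint; the `by_...` aggregation names are arbitrary labels chosen here.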
  37. 37. Best practices
      • Make sure that your fields are not_analyzed
      • Visualizations are based on buckets/metrics
      • Use a histogram on the x-axis first, then sub-aggregate
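The not_analyzed advice on this slide is applied through the index mapping. The sketch below builds a mapping body in the Elasticsearch 2.x style, where a `string` field takes `"index": "not_analyzed"`; in Elasticsearch 5.x and later the equivalent is the `keyword` field type. The document type `log` and the field names are hypothetical, and the helper `not_analyzed_mapping` is not part of any library.

```python
import json


def not_analyzed_mapping(doc_type, fields):
    """Return an index-creation body that stores each field as a single exact term."""
    properties = {
        field: {"type": "string", "index": "not_analyzed"}  # ES 2.x syntax
        for field in fields
    }
    return {"mappings": {doc_type: {"properties": properties}}}


body = not_analyzed_mapping("log", ["host", "verb"])
print(json.dumps(body, indent=2))
```

Without this setting, an analyzed field is split into tokens at index time, so a bucket aggregation would group on the individual tokens rather than on whole values such as a full host name.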
  38. 38. Amazon Elasticsearch Service
      • Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service
      • Use Kinesis Firehose to ingest data simply
      • Kibana for monitoring, Elasticsearch queries for deeper analysis
  39. 39. What to do next
      • Qwiklab: %20to%20amazon%20elasticsearch%20service
      • Centralized logging solution: logging/
      • Our overview page on AWS
  40. 40. Q&A Thank you for joining!