Log Analytics with Amazon Elasticsearch Service - September Webinar Series

Elasticsearch is a popular open-source search and analytics engine used for log analytics. With Amazon Elasticsearch Service, you can easily run Elasticsearch on AWS. In this webinar, we will provide an overview of Amazon Elasticsearch Service and demo how to set up and configure an Amazon Elasticsearch domain for the log analytics use case.

Learning Objectives:
- Understand Amazon Elasticsearch Service use cases and key features
- Learn how to secure your Amazon Elasticsearch cluster for access from Kibana and other plug-ins
- Learn best practices for scaling, monitoring, and troubleshooting Amazon Elasticsearch domains

Published in: Technology

  1. 1. Log Analytics with Amazon Elasticsearch Service Jon Handler (handler@amazon.com)
  2. 2. What we'll cover • Understanding Elasticsearch capabilities • Elasticsearch, the technology • Aggregations; ad-hoc analysis • Amazon Elasticsearch Service is a drop-in replacement for self-managed Elasticsearch • Q&A
  3. 3. Understanding Elasticsearch capabilities
  4. 4. CloudTrail delivers API calls to you • AWS API call monitoring • You need to understand the changing landscape of your AWS resources • You need to do security analysis and compliance auditing • You want the ability to dig into your logs in an intuitive, fine-grained way
  5. 5. How Elasticsearch can help • Combined with Kibana, Elasticsearch provides a tool for search, real-time analytics, and data visualization
  6. 6. Demo Architecture [Diagram: AWS Resources, CloudTrail Logs, Amazon CloudWatch Logs, Amazon Elasticsearch Service]
  7. 7. Log lines
  8. 8. Demo
  9. 9. Scenario: Log data analytics • Application monitoring and event diagnosis • You need to monitor the performance of your application, web servers, and hardware • You need easy to use, yet powerful data visualization tools to detect issues in near real-time • You want the ability to dig into your logs in an intuitive, fine-grained way • Kibana provides fast, easy visualization
  10. 10. Scenario: Batch data analytics • Reporting and Analysis • You are a mobile app developer • You have to monitor/manage users across multiple app versions • You want to analyze and report on usage and migration between app versions • Use Kibana for dashboarding. Use the query API for deeper analysis
  11. 11. Scenario: Full-text search • Traditional search • Your application or website provides search capabilities over diverse documents • You are tasked with making this knowledge base searchable and accessible • You need key search features including text matching, faceting, filtering, fuzzy search, auto complete, and highlighting • Use the query API to support application search
  12. 12. Elasticsearch the technology
  13. 13. Elasticsearch is like a database • Search term → Database term • Value → Value • Field → Column • Document → Row • Index → Table • Cluster → Database • Queries → SQL
  14. 14. Documents are the core entity { "eventVersion": "1.03", "eventTime": "2016-06-01T00:16:19Z", "eventSource": "dynamodb.amazonaws.com", "eventName": "DescribeStream", "awsRegion": "eu-west-1", "sourceIPAddress": "52.51.24.XX", "userAgent": "leb-kcl-580935a6-5f94-4ce0-ac69-cdeb609ba16a,amazon-kinesis-client-library-java-lambda_1.2.1, aws-internal/3", "requestParameters": { "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837" }, "responseElements": null, "requestID": "KC608PH8POAF2I184E2SL1PS2FVV4KQNSO5AEMVJF66Q9ASUAAJG", "eventID": "49b56379-903b-4f04-8ce5-d21bbfcf8ab3", "eventType": "AwsApiCall", "apiVersion": "2012-08-10", "recipientAccountId": "17816119XXXX", "userIdentity": { "type": "AssumedRole", "principalId": "AROAJBQVRM7LN25CAHX7Y:awslambda_338_20160531233813522", "arn": "arn:aws:sts::178161197791:assumed-role/geospatial-rec-engine-ApplicationExecutionRole-9LPKB77QMR97/awslambda_338_20160531233813522", ...
  15. 15. Lucene provides text analysis and indexing • Components: Index Writer, Index Searcher, Segment • Postings table (Term ID, Term, Postings): 0, quick, 1,3,5 • 1, brown, 2,3,4,6 • 2, fox, 1,7,9 • 3, lazy, 2,8 • 4, dog, 24
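
The postings table on this slide is an inverted index: each term maps to the IDs of the documents that contain it. The following is a minimal Python sketch of that idea, not Lucene's implementation. The function names, the whitespace tokenizer, and the three sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-split term to the sorted doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return the doc IDs that contain every query term (boolean AND over postings)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {1: "quick fox", 2: "brown lazy dog", 3: "quick brown"}
index = build_inverted_index(docs)
print(index["quick"], search_all(index, "quick brown"))
```

A real analyzer would also handle punctuation, stemming, and stop words, and Lucene stores postings in immutable segments rather than in a single dictionary.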
  16. 16. Elasticsearch query processing • Query (quick brown fox lazy lorem ipsum dolor sit) → Index lookup → Matches (id: 216, id: 305, id: 486, id: 713) → Query logic and post-filtering → Scoring, aggs → Sorted matches (results): id: 713, id: 305, id: 486, id: 216
  17. 17. Aggregations; ad-hoc analysis
  18. 18. Faceting: basic aggregation • Query: shirt • Facets: Carhartt (1092), Russell Athletic (1087), Dickies (954), RALPH LAUREN (823), Wrangler (701), Doublju (259), Levi's (12)
  19. 19. Elasticsearch Aggregations • Buckets – a collection of documents meeting some criterion • Metrics – calculations on the content of buckets • Example: Bucket: time, Metric: count
  20. 20. A more complicated aggregation Bucket: ARN Bucket: Region Bucket: eventName Metric: Count
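
As a rough illustration of the nested buckets on this slide, the sketch below builds a search request body that nests one terms aggregation per field. The field names `userIdentity.arn`, `awsRegion`, and `eventName` come from the CloudTrail document shown earlier; the helper name, the `by_...` aggregation names, and the bucket size of 10 are assumptions. Each terms bucket in the response carries a `doc_count`, which serves as the Count metric.

```python
import json


def nested_terms_agg(fields, size=10):
    """Nest one terms aggregation per field, outermost field first."""
    body = None
    for field in reversed(fields):
        agg = {"terms": {"field": field, "size": size}}
        if body is not None:
            agg["aggs"] = body  # attach the inner level under this bucket
        body = {"by_" + field.replace(".", "_"): agg}
    # "size": 0 asks for aggregation results only, with no individual hits.
    return {"size": 0, "aggs": body}


request_body = nested_terms_agg(["userIdentity.arn", "awsRegion", "eventName"])
print(json.dumps(request_body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request against an index that holds CloudTrail events.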
  21. 21. More kinds of aggregations Buckets • Date histogram • Histogram • Range • Terms • Filters • Significant terms Metrics • Count • Average • Sum • Min • Max • Std. Dev • Unique Count • Percentiles
  22. 22. Setting up your cluster
  23. 23. Shards: independent collections of documents [Diagram: the documents of an Index/Type, each with an Id, split across Shard 1, Shard 2, Shard 3, and Shard 4]
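
One way to picture how documents land in shards is a stable hash of the document ID taken modulo the number of primary shards. The sketch below assumes that scheme and uses `zlib.crc32` purely as a stand-in; it is not the hash function Elasticsearch uses, and the function name and sample IDs are hypothetical.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Pick a shard from a stable hash of the document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


# Hypothetical document IDs spread over 4 primary shards.
for doc_id in ["216", "305", "486", "713"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 4))
```

Because the shard is derived from the primary shard count, changing that count would send existing IDs to different shards, which is one reason the count is chosen when the index is created.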
  24. 24. Deployment of indices to a cluster • Index 1 – Shard 1 – Shard 2 – Shard 3 • Index 2 – Shard 1 – Shard 2 – Shard 3 [Diagram: an Amazon ES cluster with the primary and replica copies of each shard spread across Instance 1 (master), Instance 2, and Instance 3]
  25. 25. Determining storage • Data:Index ratio is typically close to 1:1 • Add a replica, double the storage • Figure out data node count based on storage – Current limits; 10T EBS, 32T instance store
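
The sizing rules on this slide reduce to simple arithmetic. The sketch below assumes the 1:1 data-to-index ratio and the "add a replica, double the storage" rule stated above; the function names and the 512 GB per-node figure are hypothetical, and the calculation leaves no headroom for growth or operating overhead.

```python
import math


def required_storage_gb(source_data_gb, replicas=1, index_ratio=1.0):
    """Index size times the number of copies (one primary plus the replicas)."""
    return source_data_gb * index_ratio * (1 + replicas)


def data_node_count(total_storage_gb, storage_per_node_gb):
    """Smallest node count whose combined storage holds the total."""
    return max(1, math.ceil(total_storage_gb / storage_per_node_gb))


total = required_storage_gb(500, replicas=1)  # 500 GB of source data, one replica
print(total, data_node_count(total, storage_per_node_gb=512))
```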
  26. 26. Determining instance type • Instance type is workload-dependent • T2: dev, test, QA • M3: solid performance • R3: heavier queries, aggs • I2: largest storage option
  27. 27. Best practices • Use the fewest shards that keep each shard at or under 50 GB of data • Number of replicas = 1 • For all prod workloads: use 3 dedicated masters • Use the _bulk API. Some ingest mechanisms do this automatically • Increase index.refresh_interval for higher throughput
  28. 28. Indexing strategy
  29. 29. Integration with the AWS ecosystem [Diagram: ingest paths into Amazon Elasticsearch Service, showing REST, Logstash, a Logstash cluster, the CWL Agent, EC2 Instances, Amazon ECS, Amazon Kinesis, Amazon Kinesis Firehose, Amazon RDS, Amazon DynamoDB, an Amazon SQS queue, Amazon CloudWatch, AWS Lambda, AWS CloudTrail, Access Logs, Amazon VPC Flow Logs, an Amazon S3 bucket, and AWS IoT]
  30. 30. Indexing strategy for streaming data • Use an index per time period, typically index-per-day; high volume can go to index-per-hour • Shard the index according to data size; use 50GB as a soft limit per shard • Master nodes increase cluster stability
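
A small sketch of the two rules above: deriving a per-day or per-hour index name from an event timestamp, and picking a shard count from the expected index size with 50 GB as the soft limit. The `cwl` prefix echoes the `cwl-*` wildcard used in the template slide; the dotted date format and the function names are assumptions, not something the slides specify.

```python
import math
from datetime import datetime, timezone


def index_name(prefix, when, hourly=False):
    """Build an index-per-day (or index-per-hour) name from a timestamp."""
    fmt = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return f"{prefix}-{when.strftime(fmt)}"


def shard_count(index_size_gb, max_shard_gb=50):
    """Fewest shards that keep every shard at or under the soft limit."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


# Timestamp taken from the sample CloudTrail event's eventTime.
when = datetime(2016, 6, 1, 0, 16, 19, tzinfo=timezone.utc)
print(index_name("cwl", when), index_name("cwl", when, hourly=True), shard_count(120))
```

Time-based names let old data be dropped by deleting whole indices, and they pair naturally with a wildcard index template.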
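  31. 31. Index settings control sharding and more curl -XPUT <endpoint>/<index> -d '{ "settings" : { "number_of_shards" : 5, "number_of_replicas" : 1, "refresh_interval": "5s" } }' (number_of_shards is fixed when the index is created; number_of_replicas and refresh_interval can be changed later through <endpoint>/<index>/_settings)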
  32. 32. Mappings control how data is indexed curl -XPUT <endpoint>/<index> -d '{ "mappings" : { <type> : { "properties" : { "eventName" : { "type" : "string", "index" : "not_analyzed" } } } } }'
  33. 33. Index templates simplify mapping creation curl -XPUT <endpoint>/_template/<name> -d '{ "template" : "<wildcard e.g. cwl-*>", "settings" : { "number_of_shards" : 2 }, "mappings" : { <type, e.g. _default_> : { "dynamic_templates" : [ { <template name> : { "mapping" : { "index" : "not_analyzed" }, "match" : "*" } } ], "properties" : { "@timestamp" : { "type" : "date" } } } } }'
  34. 34. Don't forget the query API!
  35. 35. Direct access to the Elasticsearch API
      $ curl -XPUT https://<endpoint>/blog -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }'
      $ curl -XPOST https://<endpoint>/blog/post/1 -d '{ "author":"jon handler", "title":"Amazon ES Launch" }'
      $ curl -XPOST https://<endpoint>/blog/post/_bulk -d '
      { "index" : { "_index" : "blog", "_type" : "post", "_id" : "2" } }
      { "title":"Amazon ES for search", "author": "carl meadows" }
      { "index" : { "_index" : "blog", "_type" : "post", "_id" : "3" } }
      { "title":"Analytics too", "author": "vivek sriram" }
      '
      $ curl -XGET https://<endpoint>/_search?q=ES
      {"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":{"total":2,"max_score":0.13424811,"hits":[{"_index":"blog","_type":"post","_id":"1","_score":0.13424811,"_source":{"author":"jon handler","title":"Amazon ES Launch"}},{"_index":"blog","_type":"post","_id":"2","_score":0.11506981,"_source":{"title":"Amazon ES for search","author":"carl meadows"}}]}}
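
The `_bulk` request body is newline-delimited JSON: an action line, then the document source, repeated, with a trailing newline at the end. The sketch below builds such a body in Python for the same two blog posts. The helper name `bulk_body` is hypothetical, and actually sending the payload (for example with an HTTP client of your choice) is left out.

```python
import json


def bulk_body(index, doc_type, docs):
    """Serialize (doc_id, source) pairs as _bulk index actions, one JSON object per line."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format expects the final line to end with a newline as well.
    return "\n".join(lines) + "\n"


payload = bulk_body(
    "blog",
    "post",
    [
        ("2", {"title": "Amazon ES for search", "author": "carl meadows"}),
        ("3", {"title": "Analytics too", "author": "vivek sriram"}),
    ],
)
print(payload, end="")
```

Generating each line with `json.dumps` avoids stray commas between lines, which the bulk format does not allow.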
  36. 36. Elasticsearch is a full-featured search engine • Built on Lucene, the popular, open-source library • Search structured and unstructured data with complex, boolean queries • Supports common search features: geo search, aggregations, highlighting, search suggestions, and more
  37. 37. Challenges with self-managed Elasticsearch • Easy to get started, challenging to scale • Scaling ingest pipelines is difficult • Undifferentiated heavy lifting
  38. 38. Amazon Elasticsearch Service
  39. 39. Amazon ES overview [Diagram: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, Elasticsearch API, CloudTrail]
  40. 40. Easy cluster configuration and reconfiguration AWS • Elasticsearch Version • Data nodes, count and type • Master nodes, count and type • Storage option – EBS/instance • HA option • Advanced options
  41. 41. High availability with Zone Awareness [Diagram: an Amazon ES cluster whose shard copies are spread across Instances 1 through 4 in Availability Zone 1 and Availability Zone 2]
  42. 42. Monitor with CloudWatch metrics • FreeStorageSpace – monitor and alarm before the cluster runs out of space • CPUUtilization – alarm at 80% CPU to signal the need to scale up • ClusterStatus.yellow – check whether replication requires additional nodes • JVMMemoryPressure – check instance type and count for sufficient resources • MasterCPUUtilization – monitoring for master nodes is separated from data nodes
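
As a sketch of the 80% CPU alarm mentioned above, the function below assembles parameters in the shape that CloudWatch's PutMetricAlarm call (for example boto3's `put_metric_alarm`) accepts. It only builds a dictionary and does not call AWS. The alarm name, the 5-minute period, the three evaluation periods, and the sample domain and account values are assumptions; the `AWS/ES` namespace and the `DomainName` and `ClientId` dimensions should be checked against the service documentation for your domain.

```python
def cpu_alarm_params(domain_name, account_id, threshold_percent=80.0):
    """Build keyword arguments for a CloudWatch alarm on data-node CPU."""
    return {
        "AlarmName": f"{domain_name}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Average",
        "Period": 300,           # seconds; assumed evaluation window
        "EvaluationPeriods": 3,  # assumed: three consecutive periods over threshold
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


params = cpu_alarm_params("logs-domain", "123456789012")
print(params["AlarmName"], params["Threshold"])
```

With credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`; the same pattern applies to the other metrics listed on the slide by changing `MetricName` and the threshold.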
  43. 43. Security with IAM { "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::123456789012:user/susan" }, "Action": [ "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost", "es:CreateElasticsearchDomain", "es:ListDomainNames" ], "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*" } ] }
  44. 44. Pay for compute and storage you use • With Amazon Elasticsearch Service, you pay only for the compute and storage resources you use. AWS Free Tier for qualifying customers.
  45. 45. Wrap up • Combined with Kibana, Elasticsearch provides search and visualization for streaming data and full-text use cases. • Elasticsearch is based on Lucene, which reads and writes search indices • Aggregations allow you to analyze your data, splitting into Buckets and computing Metrics • Amazon Elasticsearch Service makes it easy to set up and manage your Elasticsearch cluster on AWS • Amazon ES is a great way to get started with Elasticsearch!
  46. 46. Q&A • Jon Handler: handler@amazon.com • Vivek Sriram: Business Development Manager: vsriram@amazon.com • https://run.qwiklab.com/searches/elasticsearch