Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution. First, we cover how to configure an Amazon ES cluster and ingest data into it using Amazon Kinesis Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data. Then we demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
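To make the query DSL portion concrete, here is a minimal sketch of an ad-hoc report run against an Amazon ES endpoint over HTTP. The endpoint, index pattern, and field names are illustrative assumptions, and the request signing or access policy needed to reach a real domain is omitted.

```python
import json
import requests

# Hypothetical Amazon ES endpoint and index pattern; adjust to your own domain.
ENDPOINT = "https://search-app-logs-abc123.us-east-1.es.amazonaws.com"

# Ad-hoc report: count of requests per HTTP status code over the last hour.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {"status_codes": {"terms": {"field": "status"}}},
}

resp = requests.post(
    f"{ENDPOINT}/apache-logs-*/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for bucket in resp.json()["aggregations"]["status_codes"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```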
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elasticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data, and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
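As a rough illustration of the domain setup step mentioned above, the following boto3 sketch creates a small Amazon Elasticsearch Service domain. The domain name, version, instance type, instance count, and volume size are placeholder assumptions to be tuned to your incoming data throughput; access policies and VPC settings are omitted.

```python
import boto3

es = boto3.client("es")

# Minimal sketch: provision a small Amazon ES domain for log analytics.
response = es.create_elasticsearch_domain(
    DomainName="app-logs",                       # hypothetical domain name
    ElasticsearchVersion="5.1",                  # pick a version supported in your region
    ElasticsearchClusterConfig={
        "InstanceType": "m4.large.elasticsearch",
        "InstanceCount": 2,
        "DedicatedMasterEnabled": False,
        "ZoneAwarenessEnabled": False,
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp2",
        "VolumeSize": 100,                       # size for your expected log volume
    },
)
print(response["DomainStatus"]["ARN"])
```

A Kinesis Firehose delivery stream can then point at this domain as its destination.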
Amazon Aurora is a cloud-optimized relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The recently announced PostgreSQL-compatibility, together with the original MySQL compatibility, are perfect for new application development and for migrations from overpriced, restrictive commercial databases. In this session, we’ll do a deep dive into the new architectural model and distributed systems techniques behind Amazon Aurora, discuss best practices and configurations, look at migration options and share customer experience from the field.
Relational databases are the core engines of many workloads. In this session we will start off by exploring the options and best practices for running relational databases on AWS and then take a deeper dive into Amazon Aurora and show how it can be used to run OLTP workloads at scale.
Speaker: Johnathon Meichtry, Principal Solutions Architect, Amazon Web Services
AWS re:Invent 2016: Infrastructure Continuous Delivery Using AWS CloudFormati...Amazon Web Services
In this session, we will review ways to manage the lifecycle of your dev, test, and production infrastructure using CloudFormation. Learn how to architect your infrastructure through loosely coupled stacks using cross-stack references, tightly coupled nested stacks and other best practices. Learn how to use CloudFormation to provision and manage a continuous deployment pipeline for your infrastructure-as-code. Automate deployment of new development environments as your infrastructure evolves, promote your new architecture for testing, and deploy changes to production.
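To illustrate the loosely coupled, cross-stack-reference pattern described here, the sketch below defines two minimal templates as Python dictionaries and creates them with boto3: a network stack that exports a VPC ID, and an application stack that imports it with Fn::ImportValue. Stack names, the export name, and resource properties are assumptions for the example.

```python
import json
import boto3

cfn = boto3.client("cloudformation")

# "Network" stack exports a value that other stacks can import (cross-stack reference).
network_template = {
    "Resources": {
        "VPC": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}}
    },
    "Outputs": {
        "VpcId": {"Value": {"Ref": "VPC"}, "Export": {"Name": "shared-vpc-id"}}
    },
}

# "App" stack consumes the exported value instead of hard-coding it.
app_template = {
    "Resources": {
        "AppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "App tier",
                "VpcId": {"Fn::ImportValue": "shared-vpc-id"},
            },
        }
    }
}

cfn.create_stack(StackName="network", TemplateBody=json.dumps(network_template))
# The export must exist before the importing stack is created.
cfn.get_waiter("stack_create_complete").wait(StackName="network")
cfn.create_stack(StackName="app", TemplateBody=json.dumps(app_template))
```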
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
Amazon S3 is the central data hub for Netflix's big data ecosystem. We currently have over 1.5 billion objects and 60+ PB of data stored in S3. As we ingest, transform, transport, and visualize data, we find this data naturally weaving in and out of S3. Amazon S3 provides us the flexibility to use an interoperable set of big data processing tools like Spark, Presto, Hive, and Pig. It serves as the hub for transporting data to additional data stores and engines like Teradata, Redshift, and Druid, as well as exporting data to reporting tools like MicroStrategy and Tableau. Over time, we have built an ecosystem of services and tools to manage our data on S3. We have a federated metadata catalog service that keeps track of all our data. We have a set of data lifecycle management tools that expire data based on business rules and compliance. We also have a portal that allows users to see the cost and size of their data footprint. In this talk, we’ll dive into these major uses of S3, as well as many smaller cases, where S3 smoothly addresses an important data infrastructure need. We will also provide solutions and methodologies on how you can build your own S3 big data hub.
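The data lifecycle tooling described in this talk is Netflix-internal, but the underlying S3 mechanism it builds on is a bucket lifecycle configuration. The sketch below is a generic example, with a made-up bucket name and prefix, of transitioning and expiring derived data on a schedule.

```python
import boto3

s3 = boto3.client("s3")

# Illustrative lifecycle rule: move cold derived data to infrequent access after 90 days,
# then expire it after a year. Bucket and prefix are hypothetical.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-big-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-derived-datasets",
                "Filter": {"Prefix": "derived/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```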
APAC Principal Solutions Architect Johnathon Meichtry will run through the highlights of 2015, showcasing the biggest announcements and how customers are using these new features. This session will cover the entire breadth of the AWS platform and is a chance to get a high-level overview of all of the announcements, feature updates, and new services that AWS launched in 2015.
Modern Data Architectures for Real Time Analytics & EngagementAmazon Web Services
The AWS Workshop Series Online is a series of live webinars designed for IT professionals who are looking to leverage the AWS Cloud to build and transform their business, are new to the AWS Cloud or looking to further expand their skills and expertise. In this series, we will cover: 'Modern Data Architectures for Real-time Analytics and Engagement'.
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Amazon Web Services
Log analytics is a common big data use case that allows you to analyze log data from websites, mobile devices, servers, sensors, and more for a wide variety of applications including digital marketing, application monitoring, fraud detection, ad tech, gaming, and IoT. In this tech talk, we will walk you step-by-step through the process of building an end-to-end analytics solution that ingests, transforms, and loads streaming data using Amazon Kinesis Firehose, Amazon Kinesis Analytics and AWS Lambda. The processed data will be saved to an Amazon Elasticsearch Service cluster, and we will use Kibana to visualize the data in near real-time.
Learning Objectives:
1. Reference architecture for building a complete log analytics solution
2. Overview of the services used and how they fit together
3. Best practices for log analytics implementation
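For the ingestion step, records are simply written to a Kinesis Firehose delivery stream, which buffers them to the downstream destination. A minimal sketch, assuming a delivery stream named web-log-stream already exists:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Hypothetical delivery stream feeding the Kinesis Analytics application / Amazon ES domain.
STREAM_NAME = "web-log-stream"

log_event = {"host": "10.0.0.1", "status": 200, "path": "/index.html"}

# Firehose buffers records and delivers them to the configured destination.
firehose.put_record(
    DeliveryStreamName=STREAM_NAME,
    Record={"Data": (json.dumps(log_event) + "\n").encode("utf-8")},
)
```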
Amazon Lightsail is the latest addition to the AWS family of compute services and the fastest way to get your next cloud server up and running. For a low price that starts at $5/month, Lightsail offers a bundle of resources and services that let you jumpstart your cloud project in a few clicks. The new, intuitive Lightsail console makes it simple to manage your virtual resources, letting you focus on code, not system administration. Learn how Lightsail can get you started on AWS quickly and efficiently.
AWS re:Invent 2016: How Thermo Fisher Is Reducing Mass Spectrometry Experimen...Amazon Web Services
Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn't fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments—a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
Customers using AWS benefit from over 1,800 security and compliance controls built into the AWS platform and operations. In this session, you will learn how to take advantage of the advanced security features of the AWS platform to gain the visibility, agility, and control needed to be more secure in the cloud than in legacy environments. We'll take a look at several reference architectures for common workloads and highlight the innovative ways customers are using AWS to manage security more efficiently. After attending this session, you will be familiar with the shared security responsibility model and how you can inherit controls from the rich compliance and accreditation programs maintained by AWS.
Migrating your Databases to Aurora - AWS April 2016 Webinar Series Amazon Web Services
Amazon Aurora is a MySQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Migrating to Amazon Aurora from other database engines can help boost the performance, reliability, and availability of your databases while substantially reducing the total cost of ownership (TCO).
This webinar will introduce you to Amazon Aurora and focus on the different options available for migrating from your existing commercial and open source databases to Amazon Aurora. We will walk you through the entire process of migrating your existing databases to Amazon Aurora using AWS Database Migration Service.
Learning Objectives:
• Learn about Amazon Aurora
• How Amazon Aurora performance, reliability, and availability compare to open source and commercial databases
• Key aspects of database migration
• Options available to migrate MySQL Databases from on-premises, EC2 and RDS to Amazon Aurora
• How to migrate Open Source and Commercial databases to Amazon Aurora using AWS Database Migration Service
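The AWS Database Migration Service step can be sketched with boto3 as below. The endpoint and replication instance ARNs are placeholders for resources you would create first, and the table mapping simply includes every table in an assumed app schema.

```python
import json
import boto3

dms = boto3.client("dms")

# Selection rule: replicate every table in the (assumed) "app" schema.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "app", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-aurora",
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",    # placeholder ARN
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",    # placeholder ARN
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder ARN
    MigrationType="full-load-and-cdc",                      # initial load plus ongoing replication
    TableMappings=json.dumps(table_mappings),
)
```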
ENT202 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity O...Amazon Web Services
In this session, we will walk through the fundamentals of Amazon Virtual Private Cloud (VPC). First, we will cover build-out and design fundamentals for VPC, including picking your IP space, subnetting, routing, security, NAT, and much more. We will then transition into different approaches and use cases for optionally connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision-makers interested in understanding the building blocks AWS makes available with VPC and how you can connect this with your offices and current data center footprint.
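A minimal sketch of the VPC build-out steps discussed here, using boto3: pick an IP range, add a subnet, and make it publicly routable. The CIDR blocks and single-subnet layout are illustrative assumptions.

```python
import boto3

ec2 = boto3.client("ec2")

# Carve a VPC out of an example IP range and add one subnet.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")["Subnet"]

# An internet gateway plus a default route makes the subnet publicly routable.
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc["VpcId"])

route_table = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
ec2.create_route(
    RouteTableId=route_table["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw["InternetGatewayId"],
)
ec2.associate_route_table(RouteTableId=route_table["RouteTableId"], SubnetId=subnet["SubnetId"])
```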
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisAmazon Web Services
Thousands of services work in concert to deliver millions of hours of video streams to Netflix customers every day. These applications vary in size, function, and technology, but they all make use of the Netflix network to communicate. Understanding the interactions between these services is a daunting challenge both because of the sheer volume of traffic and the dynamic nature of deployments. In this talk, we’ll first discuss why Netflix chose Amazon Kinesis Streams over other streaming data solutions like Kafka to address these challenges at scale. We’ll then dive deep into how Netflix uses Amazon Kinesis Streams to enrich network traffic logs and identify usage patterns in real time. Lastly, we will cover how Netflix uses this system to build comprehensive dependency maps, increase network efficiency, and improve failure resiliency. From this talk, you’ll take away techniques and processes that you can apply to your large-scale networks and derive real-time, actionable insights.
AWS Lambda allows any Node.js app to be run at scale in a massively parallel environment with no up-front costs or planning. This session shows how to use Lambda to build dynamic analytic data flows that can be tuned as they execute, based on initial results, to provide real-time output streamed to web clients. This process enables a cost-effective and responsive user experience for ad hoc big data jobs and lets developers focus on how data is consumed and presented, instead of how it is obtained.
"Wild Rydes (www.wildrydes.com) needs your help! With fresh funding from its seed investors, Wild Rydes is seeking to build the world’s greatest mobile/VR/AR unicorn transportation system. The scrappy startup needs a first-class webpage to begin marketing to new users and to begin its plans for global domination. Join us to help Wild Rydes build a website using a serverless architecture. You’ll build a scalable website using services like AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and Amazon S3. Join this workshop to hop on the rocket ship!
To complete this workshop, you'll need:
Your laptop
AWS Account
AWS Command Line Interface
Google Chrome
git
Text Editor"
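As a taste of the serverless pattern used in the workshop, here is a minimal Lambda handler behind API Gateway that stores a ride request in DynamoDB. The table name, attributes, and request shape are assumptions for illustration, not the official workshop code.

```python
import json
import uuid
import boto3

# Hypothetical DynamoDB table backing the ride-request API.
table = boto3.resource("dynamodb").Table("Rides")

def handler(event, context):
    """API Gateway proxy handler: store a ride request and return its id."""
    body = json.loads(event.get("body") or "{}")
    ride_id = str(uuid.uuid4())
    table.put_item(Item={"RideId": ride_id, "User": body.get("user", "anonymous")})
    return {
        "statusCode": 201,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"RideId": ride_id}),
    }
```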
AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture. With X-Ray, you can understand how your application and its underlying services are performing to identify and troubleshoot the root cause of performance issues and errors. X-Ray provides an end-to-end view of requests as they travel through your application, and shows a map of your application’s underlying components. Learn how to use X-Ray to analyze both applications in development and in production, from simple three-tier applications to complex microservices applications consisting of thousands of services.
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesAmazon Web Services
It can be challenging to optimize AWS resources across cost, performance, security, and fault tolerance, much less do it automatically. AWS Trusted Advisor is an online resource that helps you do just that by providing real-time guidance to help you provision your resources following AWS best practices. In this session, we will go over how to safely automate these best practices using Amazon CloudWatch Events and AWS Lambda, along with samples for you to use.
AWS Personal Health Dashboard (PHD) provides alerts and remediation guidance when AWS is experiencing events that may impact your AWS environment. The AWS Health API, the underlying service powering PHD, integrates with Amazon CloudWatch Events, enabling you to trigger AWS Lambda functions to define automated remediation actions. We will also introduce you to AWS Health tools, a community-based collection of tools to automate remediation actions and customize Health alerts.
Come join us to see how you can implement automation of AWS best practice recommendations from Trusted Advisor and remediation from the AWS Health API on your AWS resources.
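A sketch of the automation wiring described above: a CloudWatch Events rule that matches AWS Health events and targets a remediation Lambda function. The rule name, function name, and account details are placeholders.

```python
import boto3

events = boto3.client("events")

# Hypothetical rule: route AWS Health events to a remediation Lambda function.
rule_arn = events.put_rule(
    Name="health-event-remediation",
    EventPattern='{"source": ["aws.health"]}',
    State="ENABLED",
)["RuleArn"]

events.put_targets(
    Rule="health-event-remediation",
    Targets=[{
        "Id": "remediate",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:remediate",  # placeholder
    }],
)

# The Lambda function must also allow invocation by this rule.
boto3.client("lambda").add_permission(
    FunctionName="remediate",
    StatementId="allow-health-rule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule_arn,
)
```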
Reducing Latency and Increasing Performance while Cutting Infrastructure CostsAmazon Web Services
A discussion of Datadog’s experiences, both successes and challenges, as they built their monitoring solutions on top of AWS Lambda and Amazon API Gateway with the goal of reducing latency and increasing performance while cutting infrastructure costs.
AWS Databases
• Database models (SQL vs. NoSQL)
• Amazon Relational Database Service (RDS) concepts, including database instances, security groups, and parameter and option groups
• Amazon DynamoDB concepts, including data model and supported operations (see the example below)
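A short example of the DynamoDB data model and a few supported operations, assuming a hypothetical Orders table keyed on OrderId:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Assumed table: partition key "OrderId" (string); attribute names are illustrative.
table = boto3.resource("dynamodb").Table("Orders")

# Write and read a single item.
table.put_item(Item={"OrderId": "o-1001", "CustomerId": "c-42", "Total": 1999})
print(table.get_item(Key={"OrderId": "o-1001"}).get("Item"))

# Query by key condition (the condition must reference the table's key schema).
response = table.query(KeyConditionExpression=Key("OrderId").eq("o-1001"))
print(response["Items"])
```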
AWS re:Invent 2016: Get Technically Inspired by Container-Powered Migrations ...Amazon Web Services
This session is a technical journey through application migration and refactoring using containerized technologies. Flux 7 recently worked with Rent-a-Center to perform a Hybris migration from their data center to AWS, and you can hear how they used Amazon ECS, the new Application Load Balancer, and Auto Scaling to meet the customer's business objectives.
Automated Compliance and Governance with AWS Config and AWS CloudTrail - June...Amazon Web Services
Learning Objectives:
- Reduce the complexity of governance
- Embed compliance in the development process
- Learn about AWS Management Tools
As your cloud operations evolve, the complexity of governance, compliance, and risk auditing of your AWS account increases. With AWS Config and AWS CloudTrail you can automate your controls and compliance efforts so that they scale with your cloud footprint. You can discover resources that exist in your account, capture changes in configurations, and create alerts for out-of-compliance events. In this session, we will help you use AWS Config, AWS CloudTrail, and other AWS Management Tools to automate configuration governance so that compliance is embedded in the development process.
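One concrete way to embed such a control is to deploy an AWS managed Config rule with boto3. The rule name below is arbitrary; the SourceIdentifier refers to the managed rule that flags publicly readable S3 buckets.

```python
import boto3

config = boto3.client("config")

# Example managed rule: flag S3 buckets that allow public read access.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "s3-no-public-read",          # arbitrary name
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
        },
        "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
    }
)
```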
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...Amazon Web Services
Want to get ramped up on how to use Amazon's big data web services and launch your first big data application on AWS? Join us in this workshop as we build a big data application in real time using Amazon EMR, Amazon Redshift, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. We review architecture design patterns for big data solutions on AWS, and give you access to a take-home lab so that you can rebuild and customize the application yourself.
AWS re:Invent 2016: Using AWS Lambda to Build Control Systems for Your AWS In...Amazon Web Services
Defining infrastructure resource policies in an organized manner can help your company better manage its infrastructure resources.
This session familiarizes you with using AWS Lambda to process data and provide control logic for your infrastructure. You can use Amazon CloudWatch Events to monitor infrastructure resources in real time, and you can use AWS Lambda to react to events based on a set of rules. We demonstrate how you can build a rules engine for creating, monitoring, and managing policies. This is all done using Alexa and the Alexa Skills Kit.
WKS401 Deploy a Deep Learning Framework on Amazon ECS and EC2 Spot InstancesAmazon Web Services
Deep learning is an implementation of machine learning that uses neural networks to solve difficult and complex problems, such as computer vision, natural language processing, and recommendations. Due to the availability of deep learning libraries and frameworks, developers have the ability to enhance the capabilities of their applications and projects. In this workshop, you learn how to build and deploy a powerful deep learning framework called MXNet on containers. The portability and resource management benefit of containers means developers can focus less on infrastructure and more on building. The labs start by demonstrating the automation capabilities of AWS CloudFormation to stand up core infrastructure; as an added bonus, you use Spot Fleet to leverage the cost benefits of using Spot Instances, especially for developer environments. Then, you walk through creating an MXNet container in Docker and deploying it with Amazon ECS. Finally, you walk through an image classification demo of MXNet to validate that everything is working as expected.
Pre-reqs: Laptop and AWS account
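Alongside the CloudFormation and Docker steps, the ECS portion can be sketched with boto3 as below. The image URI, resource sizes, and cluster name are placeholder assumptions; the Spot Fleet setup for the underlying instances is omitted.

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical task definition wrapping an MXNet container image.
ecs.register_task_definition(
    family="mxnet-training",
    containerDefinitions=[
        {
            "name": "mxnet",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mxnet:latest",  # placeholder
            "cpu": 1024,
            "memory": 4096,
            "essential": True,
            "command": ["python", "train.py"],
        }
    ],
)

# Run the task on a cluster backed by (Spot) container instances.
ecs.run_task(cluster="deep-learning", taskDefinition="mxnet-training", count=1)
```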
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
The world is producing an ever increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceAmazon Web Services
Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as example and show you how to build an end-to-end analytics solution.
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, application programming interfaces (APIs), clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s largest social platform for online giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...Amazon Web Services
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real-time with standard SQL—there is no need to learn new programming languages or processing frameworks.
In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
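A rough sketch of what an Amazon Kinesis Analytics application looks like when created through boto3. The streaming SQL computes per-minute record counts over a tumbling window; the application name is made up, and the input and output configuration that binds the SQL to real Kinesis streams is omitted for brevity.

```python
import boto3

ka = boto3.client("kinesisanalytics")

# Streaming SQL: count records per minute from the in-application source stream.
application_code = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (events_per_minute INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM COUNT(*)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
"""

ka.create_application(
    ApplicationName="clickstream-counts",            # hypothetical application name
    ApplicationDescription="Per-minute event counts over a Kinesis stream",
    ApplicationCode=application_code,
)
```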
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...Amazon Web Services
The growing popularity and breadth of use cases for IoT are challenging the traditional thinking of how data is acquired, processed, and analyzed to quickly gain insights and act promptly. Today, the potential of this data remains largely untapped. In this session, we explore architecture patterns for building comprehensive IoT analytics solutions using AWS big data services. We walk through two production-ready implementations. First, we present an end-to-end solution using AWS IoT, Amazon Kinesis, and AWS Lambda. Next, Hello discusses their consumer IoT solution built on top of Amazon Kinesis, Amazon DynamoDB, and Amazon Redshift.
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...Amazon Web Services
Amazon Kinesis is a platform of services for building real-time, streaming data applications in the cloud. Customers can use Amazon Kinesis to collect, stream, and process real-time data such as website clickstreams, financial transactions, social media feeds, application logs, location-tracking events, and more. In this session, we first cover best practices for building an end-to-end streaming data applications using Amazon Kinesis. Next, Beeswax, which provides real-time Bidder as a Service for programmatic digital advertising, will talk about how they built a feature-rich, real-time streaming data solution on AWS using Amazon Kinesis, Amazon Redshift, Amazon S3, Amazon EMR, and Apache Spark. Beeswax will discuss key components of their solution including scalable data capture, messaging hub for archival, data warehousing, near real-time analytics, and real-time alerting.
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
Amazon EMR is one of the largest Hadoop operators in the world. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features. This session will feature Asurion, a provider of device protection and support services for over 280 million smartphones and other consumer electronics devices. Asurion will share how they architected their petabyte-scale data platform using Apache Hive, Apache Spark, and Presto on Amazon EMR.
AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)Amazon Web Services
Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time.
In this session, we will cover the fundamentals of using AWS Lambda to process data in real time from push sources such as AWS IoT and pull sources such as Amazon DynamoDB Streams or Amazon Kinesis. We will walk through sample use cases and demonstrate how to set up some of these real-time data processing solutions. We'll also discuss best practices and do a deep dive into AWS Lambda real-time stream processing.
You also hear from speakers from Thomson Reuters, who discuss how the company leverages AWS for its Product Insight service, which collects usage analytics for Thomson Reuters products. The speakers walk through its architecture and demonstrate how they leverage Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Route 53, and AWS KMS for near real-time access to data being collected around the globe. They also outline how applying AWS methodologies benefited the business in areas such as time to market, cross-region ingestion, auto-scaling capabilities, low latency, security features, and extensibility.
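For the pull-based case, Lambda receives batches of Kinesis records whose payloads arrive base64 encoded. A minimal handler sketch, with the actual processing step left as a placeholder:

```python
import base64
import json

def handler(event, context):
    """Process a batch of Kinesis records delivered to Lambda (pull model)."""
    for record in event["Records"]:
        # Kinesis record payloads are base64-encoded bytes.
        payload = base64.b64decode(record["kinesis"]["data"])
        data = json.loads(payload)
        # Illustrative processing step: aggregate, enrich, or route the event.
        print(data)
    return {"batchItemCount": len(event["Records"])}
```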
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
Elasticsearch has quickly become the leading open source technology for scaling search and building document services. Many software providers have come to rely on it to serve the needs of high-performance, production applications.
In this talk, we’ll go deep on lessons learned from three years in production, scaling from a few shards to more than 100 spread across hundreds of nodes on AWS to serve real-time queries against hundreds of millions of documents.
Attendees will learn:
* How to capacity plan for ES on AWS
* How to scale and reshard on AWS with zero downtime
* What AWS and ES metrics to collect and alert on
* Tips on day to day ES operations
Session sponsored by SignalFx.
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...Amazon Web Services
As serverless architectures become more popular, AWS customers need a framework of patterns to help them deploy their workloads without managing servers or operating systems. This session introduces and describes four re-usable serverless patterns for web apps, stream processing, batch processing, and automation. For each, we provide a TCO analysis and comparison with its server-based counterpart. We also discuss the considerations and nuances associated with each pattern and have customers share similar experiences. The target audience is architects, system operators, and anyone looking for a better understanding of how serverless architectures can help them save money and improve their agility.
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...Amazon Web Services
Data science is a key discipline in a data-driven organization. Through analytics, data scientists can uncover previously unknown relationships in data to help an organization make better decisions. However, data science is often performed from local machines with limited resources and multiple datasets on a variety of databases. Moving to the cloud can help organizations provide scalable compute and storage resources to data scientists, while freeing them from the burden of setting up and managing infrastructure.
In this session, FINRA, the Financial Industry Regulatory Authority, shares best practices and lessons learned when building a self-service, curated data science platform on AWS, a project that allowed us to remove the technology middleman and empower users to choose the best compute environment for their workloads. Understand the architecture and underlying data infrastructure services used to provide a secure, self-service portal to data scientists, learn how we built consensus for tooling from our data science community, hear about the benefits of increased collaboration among the scientists due to the standardized tools, and learn how you can retain the freedom to experiment with the latest technologies while keeping information security boundaries within a virtual private cloud (VPC).
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...Amazon Web Services
In this session, we cover three common scenarios that include Amazon CloudWatch Logs and AWS Lambda. First, you learn how to build an Elasticsearch cluster from historical data using Amazon S3, Lambda, and CloudWatch Logs. Next, you learn how to add details to CloudWatch alarm notifications using Amazon SNS and Lambda. Finally, we show you how to bring Elastic Load Balancing logs to CloudWatch Logs using S3 bucket triggers from Lambda.
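Common to all three scenarios is decoding the CloudWatch Logs subscription payload inside Lambda: it arrives base64 encoded and gzip compressed. A minimal sketch, with the downstream action (indexing into Elasticsearch, enriching an SNS notification, and so on) left as a placeholder:

```python
import base64
import gzip
import json

def handler(event, context):
    """Decode a CloudWatch Logs subscription payload delivered to Lambda."""
    # The log batch is base64-encoded, gzip-compressed JSON.
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    for log_event in payload["logEvents"]:
        # Illustrative step: forward each message to Elasticsearch, SNS, etc.
        print(payload["logGroup"], log_event["timestamp"], log_event["message"])
```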
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
For discovery-phase research, life sciences companies have to support infrastructure that processes millions to billions of transactions. The advent of a data lake to accomplish such a task is showing itself to be a stable and productive data platform pattern to meet the goal. We discuss how to build a data lake on AWS, using services and techniques such as AWS CloudFormation, Amazon EC2, Amazon S3, IAM, and AWS Lambda. We also review a reference architecture from Amgen that uses a data lake to aid in their Life Science Research.
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
Organizations are collecting an ever-increasing amount of data from numerous sources such as log systems, click streams, and connected devices. Launched in 2009, Elasticsearch, an open-source analytics and search engine, has emerged as a popular tool for real-time analytics and visualization of data. Some of the most common use cases include risk assessment, error detection, and sentiment analysis. However, as data volumes and applications grow, managing Elasticsearch clusters can consume significant IT resources while adding little or no differentiated value to the organization. Amazon Elasticsearch Service (Amazon ES) is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud. Amazon ES offers the benefits of a managed service, including cluster provisioning, easy configuration, replication for high availability, scaling options, data durability, security, and node monitoring. This session presents a technical deep dive on Amazon ES. Attendees learn: common challenges with real-time data analytics and visualization and how to address them; the benefits, reference architecture, and best practices for using Amazon ES; and data ingestion options with Amazon DynamoDB, AWS Lambda, and Amazon Kinesis.
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...Amazon Web Services
Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight: uploading files, connecting to S3 and Redshift, and creating analyses from visualizations that are optimized based on the underlying data. Once we’ve built our analysis and dashboard, we’ll show you how easy it is to share it with colleagues and stakeholders in just a few seconds. And with SPICE, QuickSight’s in-memory calculation engine, you can go from data to insights faster than ever.
AWS re:Invent 2016: Effective Application Data Analytics for Modern Applicati...Amazon Web Services
IT is evolving from a cost center to a source of continuous innovation for business. At the heart of this transition are modern, revenue-generating applications, based on dynamic architectures that constantly evolve to keep pace with end-customer demands. This dynamic application environment requires a new, comprehensive approach to traditional monitoring, one based on real-time, end-to-end visibility and analytics across the entire application lifecycle and stack, instead of piecemeal monitoring. This presentation highlights practical advice on how developers and operators can leverage data and analytics to glean critical information about their modern applications. In this session, we will cover the types of data important for today’s modern applications. We’ll discuss visibility and analytics into data sources such as AWS services (e.g., Amazon CloudWatch, AWS Lambda, VPC Flow Logs, Amazon EC2, Amazon S3, etc.), the development tool chain, and custom metrics, and describe how to use analytics to understand business performance and behaviors. We discuss a comprehensive approach to monitoring, troubleshooting, and customer usage insights, provide examples of effective data analytics to improve software quality, and describe an end-to-end customer use case that highlights how analytics applies to the modern app lifecycle and stack. Session sponsored by Sumo Logic.
AWS Competency Partner
Hosting scalable applications on Amazon S3 and making them globally available via Amazon CloudFront has never been easier. In this presentation, we'll dig into getting more insights from your statically hosted website by logging CloudFront requests to S3 and then using the power and scale of Lambda to push those logs into Amazon Elasticsearch Service for deep analysis.
by Darin Briskman, Technical Evangelist, AWS
SQL is a powerful tool to query data, but it doesn't cover everything you might need. Sometimes the precision of SQL is a limitation that can be overcome by using the flexibility and inherent ranking of search. Learn how to use AWS services to create fully managed solutions using Amazon Aurora and Amazon Elasticsearch Service to combine the power of query and search. Level: 200
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
Learning Objectives:
- Understand how to build a serverless big data solution quickly and easily
- Learn how to discover and prepare all your data for analytics
- Learn how to query and visualize analytics on all your data to create actionable insights
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
In today’s session we will share an overview of the typical challenges when adopting Big Data, and how the AWS Big Data platform allows you to tackle these challenges and leverage the right analytical and Big Data solutions to make your strategy successful (whiteboard presentation).
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
"Amazon EMR provides a managed framework which makes it easy, cost effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks on the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage and strategies to take advantage of the scale and the parallelism that the cloud offers, while lowering costs. Additionally, you hear from AOL’s Senior Software Engineer on how they used these strategies to migrate their Hadoop workloads to the AWS cloud and lessons learned along the way.
In this session, you learn the benefits of decoupling storage and compute and allowing them to scale independently; how to run Hadoop, Spark, Presto and other supported Hadoop Applications on Amazon EMR; how to use Amazon S3 as a persistent data-store and process data directly from Amazon S3; dDeployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot instances to scale your transient infrastructure effectively."
Data processing and analysis is where big data is most often consumed - driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing and Interactive analytics. AWS services to be covered include: Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Speaker: Xiaoyong Han, Solution Architect, AWS
Data collection and storage is a primary challenge for any big data architecture. In this webinar, gain a thorough understanding of AWS solutions for data collection and storage, and learn architectural best practices for applying those solutions to your projects. This session will also include a discussion of popular use cases and reference architectures. In this webinar, you will learn:
• Overview of the different types of data that customers are handling to drive high-scale workloads on AWS, and how to choose the best approach for your workload
• Optimization techniques that improve performance and reduce the cost of data ingestion
• Leveraging Amazon S3, Amazon DynamoDB, and Amazon Kinesis for storage and data collection
In this talk, we review best practices for AWS Big Data analytics architectures and introduce the features and latest capabilities of Amazon Athena, an interactive query service that makes it easy to analyze data stored in Amazon S3 using standard SQL, along with customer case studies.
Speaker: Greg Khairallah, Business Development Manager for Amazon Big Data and Athena, Amazon Web Services
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseAmazon Web Services
by Joyjeet Banerjee, Enterprise Solutions Architect, AWS
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to transform and load streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real-time and respond promptly. In this session, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively. Level: 200
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
Analyzing large data sets requires significant compute and storage capacity that can vary in size based on the amount of input data and the analysis required. This characteristic of big data workloads is ideally suited to the pay-as-you-go cloud model, where applications can easily scale up and down based on demand. Learn how Amazon S3 can help scale your big data platform. Hear from Redfin and Twitter about how they build their big data platforms on AWS and how they use S3 as an integral piece of their big data platforms.
Organizations often need to quickly analyze large amounts of data, such as logs generated from a wide variety of sources and formats. However, traditional approaches require a lot of time and effort designing complex data transformation and loading processes; and configuring data warehouses. Using AWS, you can start querying your datasets within minutes. In this session you will learn how you can deploy a managed Presto environment in minutes to interactively query log data using standard ANSI SQL. Presto is a popular open source SQL engine for running interactive analytic queries against data sources of all sizes. We will talk about common use cases and best practices for running Presto on Amazon EMR.
6. An index is a collection of documents, divided into shards
(Diagram: documents, each with an ID, are indexed and compressed into Shard 1, Shard 2, Shard 3, and Shard 4 of an index.)
7. Deployment of indices to a cluster
• Index 1
– Shard 1
– Shard 2
– Shard 3
• Index 2
– Shard 1
– Shard 2
– Shard 3
(Diagram: an Amazon ES cluster of three instances; primary and replica copies of each index's shards 1–3 are distributed across Instance 1 (the master), Instance 2, and Instance 3.)
8. How many instances?
The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
Size based on storage requirements
• Either local storage or 512 GB of Amazon Elastic Block Store (EBS) per instance
• Example: a 2 TB corpus will need 8 instances
  – Assuming a replica and using EBS
  – With i2.2xlarge nodes using 1.6 TB ephemeral storage, 4 nodes would be enough
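For illustration, the sizing arithmetic on this slide can be written out directly; the per-node storage figures below are the ones the slide assumes.

```python
# Rough instance-count estimate:
# total storage = corpus size x (1 + replicas), divided by storage per node.
import math

def instances_needed(corpus_gb, replicas=1, storage_per_node_gb=512):
    required_gb = corpus_gb * (1 + replicas)   # primary data plus replica copies
    return math.ceil(required_gb / storage_per_node_gb)

# 2 TB corpus, one replica, EBS-backed nodes (512 GB each) -> 8 instances
print(instances_needed(2048))
```

The same function with storage_per_node_gb=1600 (i2.2xlarge ephemeral storage) returns 3; the slide rounds up to 4, which leaves some headroom.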
9. Cluster with no dedicated masters
(Diagram: a three-instance Amazon ES cluster in which every instance is a data node holding primary and replica shards, and Instance 1 also acts as the master.)
10. Cluster with dedicated masters
(Diagram: the same cluster with dedicated master nodes added; the masters manage the cluster, while the data nodes on Instances 1–3 hold the shards and serve queries and updates.)
11. Cluster with zone awareness
(Diagram: with zone awareness enabled, four instances are split across Availability Zone 1 and Availability Zone 2, and primary and replica copies of shards 1–3 are distributed so that both zones hold copies of the data.)
12. Best practices
Data nodes = Storage needed/Storage per node
Use GP2 EBS volumes
Use 3 dedicated master nodes for production deployments
Enable zone awareness
Set indices.fielddata.cache.size = 40
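A minimal sketch of a domain configuration along these lines, assuming boto3; the domain name, instance types and counts, and Elasticsearch version are illustrative placeholders, not values from the talk.

```python
# Hypothetical Amazon ES domain reflecting the best practices above.
import boto3

es = boto3.client("es")

es.create_elasticsearch_domain(
    DomainName="web-logs",                               # assumed domain name
    ElasticsearchVersion="2.3",
    ElasticsearchClusterConfig={
        "InstanceType": "m4.large.elasticsearch",        # illustrative data node type
        "InstanceCount": 4,                              # data nodes = storage needed / storage per node
        "DedicatedMasterEnabled": True,                  # 3 dedicated masters for production
        "DedicatedMasterType": "m4.large.elasticsearch",
        "DedicatedMasterCount": 3,
        "ZoneAwarenessEnabled": True,                    # spread shard copies across Availability Zones
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512},
    AdvancedOptions={"indices.fielddata.cache.size": "40"},
)
```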
16. Kinesis Firehose overview
Delivery Stream: Underlying AWS resource
Destination: Amazon ES, Amazon Redshift, or Amazon S3
Record: Put records in streams to deliver to destinations
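As a quick illustration, a single record can be put on a delivery stream with one API call (assuming boto3; the stream name and log line are made up):

```python
import boto3

firehose = boto3.client("firehose")

# One Apache-style access log line (illustrative)
log_line = '192.0.2.10 - - [29/Nov/2016:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024\n'

firehose.put_record(
    DeliveryStreamName="web-logs-stream",       # hypothetical delivery stream
    Record={"Data": log_line.encode("utf-8")},
)
```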
17. Firehose delivery architecture today
(Diagram: a data source sends source records to the Firehose delivery stream, which delivers them to Amazon Elasticsearch Service; source records are also written to an intermediate Amazon S3 bucket, and records that fail delivery go to a backup S3 bucket.)
18. Coming soon! Firehose delivery architecture with transformations
(Diagram: the same flow as above, with source records transformed within the Firehose delivery stream before the transformed records are delivered to Amazon Elasticsearch Service; transformation failures and delivery failures are written to the backup S3 bucket.)
21. Best practices
Use smaller buffer sizes to increase throughput, but be careful of concurrency
Use index rotation based on sizing
Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
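A sketch of creating a delivery stream with an Amazon ES destination that applies these settings; the ARNs, names, buffer values, and rotation period below are illustrative placeholders.

```python
# Hypothetical Firehose delivery stream with an Amazon ES destination.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="web-logs-stream",
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-es",
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/web-logs",
        "IndexName": "weblogs",
        "TypeName": "access",
        "IndexRotationPeriod": "OneDay",                   # rotate indexes based on sizing
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},  # smaller buffers, watch concurrency
        "RetryOptions": {"DurationInSeconds": 300},
        "S3BackupMode": "FailedDocumentsOnly",             # failed documents go to the backup bucket
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-es",
            "BucketARN": "arn:aws:s3:::web-logs-backup",
        },
    },
)
```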
23. Number of shards = index size/30GB
Define the number of shards when you create the index
Less is more
Writes occupy 1 shard, reads occupy all shards
(Diagram: the same three-instance Amazon ES cluster, with primary and replica shards spread across Instance 1 (the master), Instance 2, and Instance 3.)
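The slide's rule of thumb, written out as a trivial sketch:

```python
# Number of shards = index size / 30 GB, rounded up (at least 1).
import math

def shards_for(index_size_gb, target_shard_gb=30):
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(shards_for(60))    # a 60 GB daily index -> 2 shards
print(shards_for(500))   # a 500 GB index      -> 17 shards
```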
24. Mapping controls how data is indexed
not_analyzed text is best for Kibana visualizations
Define a _template to apply to all new indexes
The template also defines the number of shards
(Diagram: an index writer builds an inverted index, mapping terms such as delete, get, head, post, and put to the lists of document IDs that contain them.)
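A minimal sketch of such a template, assuming an Elasticsearch 2.x-style mapping (string fields marked not_analyzed), a hypothetical weblogs-* index pattern and field names, and a domain endpoint that accepts unsigned requests from this host.

```python
# Hypothetical index template for daily-rotated web log indexes.
import requests

endpoint = "https://search-web-logs-xxxxxxxx.us-east-1.es.amazonaws.com"  # placeholder

template = {
    "template": "weblogs-*",                 # applied to every new rotated index
    "settings": {"number_of_shards": 2},     # from index size / 30 GB
    "mappings": {
        "access": {
            "properties": {
                "host":      {"type": "string", "index": "not_analyzed"},
                "verb":      {"type": "string", "index": "not_analyzed"},
                "request":   {"type": "string", "index": "not_analyzed"},
                "response":  {"type": "integer"},
                "bytes":     {"type": "long"},
                "timestamp": {"type": "date"},
            }
        }
    },
}

resp = requests.put(endpoint + "/_template/weblogs", json=template)
resp.raise_for_status()
```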
28. Best practices
• Use a template for settings
• Set number of shards based on 30 GB per shard
• Best case, 1 active shard per node
• For analysis use cases, set not_analyzed on all fields
30. Amazon ES aggregations
Buckets – a collection of documents meeting some criterion
Metrics – calculations on the content of buckets
(Example: a time-based bucket with a count metric, i.e., document counts per time interval.)
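A sketch of a query in that shape: a date_histogram bucket over time, with the per-bucket document count (plus a sum metric) supplying the numbers behind a chart. The endpoint, index pattern, and field names reuse the assumptions from the earlier template sketch.

```python
# Bucket/metric aggregation: hourly buckets with doc counts and summed bytes.
import requests

endpoint = "https://search-web-logs-xxxxxxxx.us-east-1.es.amazonaws.com"  # placeholder

query = {
    "size": 0,                                               # aggregation results only, no hits
    "aggs": {
        "requests_over_time": {                              # bucket: time
            "date_histogram": {"field": "timestamp", "interval": "hour"},
            "aggs": {
                "total_bytes": {"sum": {"field": "bytes"}}   # metric on each bucket
            },
        }
    },
}

resp = requests.post(endpoint + "/weblogs-*/_search", json=query)
for bucket in resp.json()["aggregations"]["requests_over_time"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"], bucket["total_bytes"]["value"])
```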
31. Best practices
Make sure that your fields are not_analyzed
Visualizations are based on buckets/metrics
Use a histogram on the x-axis first, then sub-aggregate
32. Run Elasticsearch in the AWS Cloud with Amazon Elasticsearch Service
Use Kinesis Firehose to ingest data simply
Kibana for monitoring, Elasticsearch queries for deeper analysis
33. What to do next
Qwiklab: https://qwiklabs.com/searches/lab?keywords=introduction%20to%20amazon%20elasticsearch%20service
Centralized logging solution: https://aws.amazon.com/answers/logging/centralized-logging/
Our overview page on AWS: https://aws.amazon.com/elasticsearch-service/