Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Online Tech Talks

Learning Objectives:
- Understand key requirements for collecting, preparing, and loading streaming data into data lakes
- Get an overview of transmitting data using Amazon Kinesis Firehose
- Learn how to perform data transformations with Amazon Kinesis Firehose

Data lakes enable your employees across the organization to access and analyze massive amounts of unstructured and structured data from disparate data sources, many of which generate data continuously and rapidly. Making this data available in a timely fashion for analysis requires a streaming solution that can durably and cost-effectively ingest this data into your data lake. Amazon Kinesis Firehose is a fully managed service that makes it easy to prepare and load streaming data into AWS. In this tech talk, we will provide an overview of Amazon Kinesis Firehose and dive deep into how you can use the service to collect, transform, batch, compress, and load real-time streaming data into your Amazon S3 data lakes.

Published in: Technology

  1. Streaming ETL for Data Lakes using Amazon Kinesis Firehose. Ryan Nienhuis, Sr. Product Manager, Amazon Kinesis. 4/19/2017. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda
     • Data Lake Overview
     • Kinesis Firehose Overview
     • Demo and Walkthrough
     • Q & A
  3. Data Lake
  4. Data Lake Capabilities
     • Collecting and storing any type of data, at any scale and at low cost
     • Securing and protecting all of the data stored in the central repository
     • Searching and finding the relevant data in the central repository
     • Quickly and easily performing new types of data analysis on datasets
     • Querying the data by defining the data's structure at the time of use (schema on read)
  5. Data Lake Capabilities
     • Collecting and storing any type of data, at any scale and at low cost
     • Securing and protecting all of the data stored in the central repository
     • Searching and finding the relevant data in the central repository
     • Quickly and easily performing new types of data analysis on datasets
     • Querying the data by defining the data's structure at the time of use (schema on read)
  6. Data Lake Storage: Amazon S3. Highly scalable and durable object storage for any type of data, at any scale and at low cost.
  7. Data Lake Ingestion: Kinesis Firehose. Real-time streaming data ETL for any type of data, at any scale and at low cost.
  8. Time Value of Money Data
  9. Kinesis Firehose
  10. Kinesis Firehose
  11. Streaming ETL to S3 [diagram labels: data source, source records, Firehose delivery stream, transformed records, destination S3 bucket, backup S3 bucket, transformation failure]
  12. Streaming ETL to Redshift [diagram labels: data source, source records, Firehose delivery stream, transformed records, intermediate S3 bucket, Redshift cluster, backup S3 bucket, transformation failure, delivery failure]
  13. Streaming ETL to Elasticsearch [diagram labels: data source, source records, Firehose delivery stream, transformed records, Elasticsearch cluster, backup S3 bucket, transformation failure, delivery failure]
  14. Demo and Walkthrough
  15. Step 1: Set Up Firehose Delivery Stream and Configure Data Transformation
  16. Destination
  17. Configuration
  18. Configuration
  19. Review
  20. Step 2: Send Data to Firehose Delivery Stream
  21. Sample Data
      219.134.32.117 - - [16/Feb/2017:09:38:20 -0800] "GET /wp-content HTTP/1.1" 200 4521 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.1; .NET CLR 3.8.23015.5)"
      95.169.41.62 - - [16/Feb/2017:09:38:20 -0800] "PUT /app/main/posts HTTP/1.1" 200 3883 "-" "Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko"
      221.147.191.247 - - [16/Feb/2017:09:38:20 -0800] "GET /explore HTTP/1.1" 200 6579 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) AppleWebKit/538.0.1 (KHTML, like Gecko) Chrome/38.0.895.0 Safari/538.0.1"
      179.96.123.130 - - [16/Feb/2017:09:38:20 -0800] "GET /list HTTP/1.1" 200 560 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:5.4) Gecko/20100101 Firefox/5.4.6"
      132.119.12.76 - - [16/Feb/2017:09:38:20 -0800] "PUT /explore HTTP/1.1" 200 3131 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_0 rv:5.0; AZ) AppleWebKit/535.1.0 (KHTML, like Gecko) Version/4.0.3 Safari/535.1.0"
      74.113.56.92 - - [16/Feb/2017:09:38:20 -0800] "DELETE /app/main/posts HTTP/1.1" 200 7069 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_9) AppleWebKit/532.1.0 (KHTML, like Gecko) Chrome/15.0.877.0 Safari/532.1.0"
  22. After Data Transformation
      {"host":"26.56.11.130","ident":"-","authuser":"-","request":"GET /wp-content HTTP/1.1","response":200,"bytes":4582,"verb":"GET","@timestamp":"2017-04-04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
      {"host":"180.153.215.216","ident":"-","authuser":"-","request":"PUT /search/tag/list HTTP/1.1","response":200,"bytes":1461,"verb":"PUT","@timestamp":"2017-04-04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
      {"host":"155.233.163.37","ident":"-","authuser":"-","request":"GET /explore HTTP/1.1","response":500,"bytes":326,"verb":"GET","@timestamp":"2017-04-04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
      {"host":"189.176.106.5","ident":"-","authuser":"-","request":"POST /search/tag/list HTTP/1.1","response":200,"bytes":3059,"verb":"POST","@timestamp":"2017-04-04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
  23. Send Data
  24. Step 3: Check Results in S3
  25. Step 4: Monitor Streaming Data Pipeline
  26. Monitor with CloudWatch Metrics
  27. Monitor with CloudWatch Logs
  28. Firehose Pricing
  29. Pricing
  30. Q & A
  31. Thank you!
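
The slides do not include the transformation code used in the demo. As a rough illustration of the step between the "Sample Data" and "After Data Transformation" slides, the sketch below shows what a Lambda function for Firehose data transformation might look like in Python. It assumes the usual Firehose transformation contract (each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries `recordId`, `result`, and `data`). The regular expression, the helper name `transform_line`, and the output field `timestamp` are illustrative choices, not taken from the talk, and the sketch leaves the timestamp as a raw string instead of producing the `@timestamp`, `timezone`, and `@timestamp_utc` fields shown on the slide.

```python
import base64
import json
import re

# Leading fields of an Apache-style access log line, as in the sample data.
# The pattern and group names are assumptions made for this sketch.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def transform_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["response"] = int(fields["response"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    # The HTTP verb is the first token of the request string.
    fields["verb"] = fields["request"].split(" ", 1)[0]
    return fields


def lambda_handler(event, context):
    """Transform each Firehose record from a log line into a JSON document."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        parsed = transform_line(line)
        if parsed is None:
            # Unparseable input is reported as failed and returned unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        payload = json.dumps(parsed) + "\n"  # newline-delimited JSON
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Records marked `ProcessingFailed` correspond to the "transformation failure" path in the streaming ETL diagrams.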

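For Step 2 (sending data to the delivery stream), the slides show only the console demo. The following is a minimal sketch of how a producer might send log lines in batches, assuming a boto3 Firehose client is passed in as `client` and that `PutRecordBatch` accepts up to 500 records per call. The function names and the stream name are hypothetical, and a real producer would also retry the records reported as failed.

```python
def chunk_records(lines, batch_size=500):
    """Yield lists of Firehose record dicts, at most batch_size per list."""
    batch = []
    for line in lines:
        # Terminate each record with a newline so records stay separable in S3.
        batch.append({"Data": (line.rstrip("\n") + "\n").encode("utf-8")})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_lines(client, stream_name, lines):
    """Send lines with PutRecordBatch; return the number of failed records."""
    failed = 0
    for batch in chunk_records(lines):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=batch,
        )
        failed += response.get("FailedPutCount", 0)
    return failed


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   send_lines(boto3.client("firehose"), "my-delivery-stream", open("access.log"))
```

Passing the client in as a parameter keeps the batching logic testable without AWS access.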
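For Step 3 (checking results in S3), it helps to know where to look in the bucket. The sketch below assumes the delivery stream uses Firehose's default behavior of writing objects under a UTC time prefix of the form `YYYY/MM/DD/HH/`, optionally preceded by a custom prefix. The function name and the `logs/` prefix are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def firehose_hour_prefix(when, base_prefix=""):
    """Return the S3 key prefix for the UTC hour containing `when`.

    `when` must be timezone-aware; it is converted to UTC first.
    """
    utc = when.astimezone(timezone.utc)
    return base_prefix + utc.strftime("%Y/%m/%d/%H/")


# A record stamped 11:32:29 at UTC-7 falls in the 18:00 UTC hour.
local = datetime(2017, 4, 4, 11, 32, 29, tzinfo=timezone(timedelta(hours=-7)))
print(firehose_hour_prefix(local, "logs/"))  # logs/2017/04/04/18/
```

The resulting prefix can then be used to list objects for that hour in the S3 console or with any S3 client.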