SlideShare a Scribd company logo
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ryan Nienhuis, Sr. Product Manager, Amazon Kinesis
4/19/2017
Streaming ETL for Data
Lakes using Amazon Kinesis
Firehose
Agenda
• Data Lake Overview
• Kinesis Firehose Overview
• Demo and Walkthrough
• Q & A
Data Lake
Data Lake Capabilities
• Collecting and storing any type of data, at any scale and at low
costs
• Securing and protecting all of data stored in the central repository
• Searching and finding the relevant data in the central repository
• Quickly and easily performing new types of data analysis on
datasets
• Querying the data by defining the data’s structure at the time of
use (schema on read)
Data Lake Capabilities
• Collecting and storing any type of data, at any scale and at
low costs
• Securing and protecting all of data stored in the central repository
• Searching and finding the relevant data in the central repository
• Quickly and easily performing new types of data analysis on
datasets
• Querying the data by defining the data’s structure at the time of
use (schema on read)
Data Lake Storage
Amazon S3
highly scalable and durable object storage
for any type of data, at any scale
and at low costs
Data Lake Ingestion
Kinesis Firehose
Real-time streaming data ETL
for any type of data, at any scale
and at low costs
Time Value of Money Data
Kinesis Firehose
Kinesis Firehose
Firehose
delivery stream destination S3 bucket
backup S3 bucket
source records
data source
source records
transformed
records
transformation failure
Streaming ETL to S3
Streaming ETL to Redshift
intermediate
S3 bucket
backup S3 bucket
source records
data source
source records
Redshift cluster
Firehose
delivery stream
transformed
records transformed
records
transformation failure
delivery failure
Streaming ETL to Elasticsearch
Elasticsearch
cluster
backup S3 bucket
source records
data source
source records
Firehose
delivery stream
transformed
records
delivery failure
transformation failure
Demo and Walkthrough
Step 1 Set Up Firehose
Delivery Stream and Configure
Data Transformation
Destination
Configuration
Configuration
Review
Step 2 Send Data to Firehose
Delivery Stream
Sample Data
219.134.32.117 - - [16/Feb/2017:09:38:20 -0800] "GET /wp-content HTTP/1.1" 200 4521
"-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.1; .NET CLR
3.8.23015.5)"
95.169.41.62 - - [16/Feb/2017:09:38:20 -0800] "PUT /app/main/posts HTTP/1.1" 200
3883 "-" "Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko"
221.147.191.247 - - [16/Feb/2017:09:38:20 -0800] "GET /explore HTTP/1.1" 200 6579 "-"
"Mozilla/5.0 (Windows; U; Windows NT 5.1) AppleWebKit/538.0.1 (KHTML, like Gecko)
Chrome/38.0.895.0 Safari/538.0.1"
179.96.123.130 - - [16/Feb/2017:09:38:20 -0800] "GET /list HTTP/1.1" 200 560 "-"
"Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:5.4) Gecko/20100101 Firefox/5.4.6"
132.119.12.76 - - [16/Feb/2017:09:38:20 -0800] "PUT /explore HTTP/1.1" 200 3131 "-"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_0 rv:5.0; AZ) AppleWebKit/535.1.0
(KHTML, like Gecko) Version/4.0.3 Safari/535.1.0"
74.113.56.92 - - [16/Feb/2017:09:38:20 -0800] "DELETE /app/main/posts HTTP/1.1" 200
7069 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_9) AppleWebKit/532.1.0
(KHTML, like Gecko) Chrome/15.0.877.0 Safari/532.1.0"
After Data Transformation
{"host":"26.56.11.130","ident":"-","authuser":"-","request":"GET /wp-content
HTTP/1.1","response":200,"bytes":4582,"verb":"GET","@timestamp":"2017-04-
04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
{"host":"180.153.215.216","ident":"-","authuser":"-","request":"PUT /search/tag/list
HTTP/1.1","response":200,"bytes":1461,"verb":"PUT","@timestamp":"2017-04-
04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
{"host":"155.233.163.37","ident":"-","authuser":"-","request":"GET /explore
HTTP/1.1","response":500,"bytes":326,"verb":"GET","@timestamp":"2017-04-
04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
{"host":"189.176.106.5","ident":"-","authuser":"-","request":"POST /search/tag/list
HTTP/1.1","response":200,"bytes":3059,"verb":"POST","@timestamp":"2017-04-
04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
Send Data
Step 3 Check Results in S3
Step 4 Monitor Streaming Data
Pipeline
Monitor with CloudWatch Metrics
Monitor with CloudWatch Logs
Firehose Pricing
Pricing
Q & A
Thank you!

More Related Content

What's hot

Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
Eugene Fidelin
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
Amazon Web Services
 
Challenges in building a Data Pipeline
Challenges in building a Data PipelineChallenges in building a Data Pipeline
Challenges in building a Data Pipeline
Hevo Data Inc.
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
Amazon Web Services
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
Edureka!
 
Clickstream Analysis with Apache Spark
Clickstream Analysis with Apache SparkClickstream Analysis with Apache Spark
Clickstream Analysis with Apache Spark
QAware GmbH
 
OSMC 2021 | Introduction into OpenSearch
OSMC 2021 | Introduction into OpenSearchOSMC 2021 | Introduction into OpenSearch
OSMC 2021 | Introduction into OpenSearch
NETWAYS
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
Amazon Web Services
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
Vikrant Chauhan
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
Danny Yuan
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon Web Services
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
Rich Lee
 
Intro to the Alfresco Public API
Intro to the Alfresco Public APIIntro to the Alfresco Public API
Intro to the Alfresco Public API
Jeff Potts
 
A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudDeepak Rao
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
Amazon Web Services
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Amazon Web Services
 
Amazon EBS: Deep Dive
Amazon EBS: Deep DiveAmazon EBS: Deep Dive
Amazon EBS: Deep Dive
Amazon Web Services
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB
 

What's hot (20)

Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Challenges in building a Data Pipeline
Challenges in building a Data PipelineChallenges in building a Data Pipeline
Challenges in building a Data Pipeline
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
Clickstream Analysis with Apache Spark
Clickstream Analysis with Apache SparkClickstream Analysis with Apache Spark
Clickstream Analysis with Apache Spark
 
OSMC 2021 | Introduction into OpenSearch
OSMC 2021 | Introduction into OpenSearchOSMC 2021 | Introduction into OpenSearch
OSMC 2021 | Introduction into OpenSearch
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Intro to the Alfresco Public API
Intro to the Alfresco Public APIIntro to the Alfresco Public API
Intro to the Alfresco Public API
 
A Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon CloudA Step By Step Guide To Put DB2 On Amazon Cloud
A Step By Step Guide To Put DB2 On Amazon Cloud
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon RedshiftBuilding a Modern Data Warehouse - Deep Dive on Amazon Redshift
Building a Modern Data Warehouse - Deep Dive on Amazon Redshift
 
Amazon EBS: Deep Dive
Amazon EBS: Deep DiveAmazon EBS: Deep Dive
Amazon EBS: Deep Dive
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema Design
 

Similar to Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Online Tech Talks

Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
Amazon Web Services
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
Amazon Web Services
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
Amazon Web Services
 
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
Amazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
Amazon Web Services
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Amazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
Amazon Web Services
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
Jim Dowling
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
Amazon Web Services
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
Sridevi Murugayen
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Amazon Web Services
 
Amazon Kinesis Firehose - Pop-up Loft TLV 2017
Amazon Kinesis Firehose - Pop-up Loft TLV 2017Amazon Kinesis Firehose - Pop-up Loft TLV 2017
Amazon Kinesis Firehose - Pop-up Loft TLV 2017
Amazon Web Services
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
javier ramirez
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 

Similar to Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Online Tech Talks (20)

Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service...
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS AWS March 2016 Webinar Series Building Your Data Lake on AWS
AWS March 2016 Webinar Series Building Your Data Lake on AWS
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
 
Aws meetup 20190427
Aws meetup 20190427Aws meetup 20190427
Aws meetup 20190427
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
 
Amazon Kinesis Firehose - Pop-up Loft TLV 2017
Amazon Kinesis Firehose - Pop-up Loft TLV 2017Amazon Kinesis Firehose - Pop-up Loft TLV 2017
Amazon Kinesis Firehose - Pop-up Loft TLV 2017
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Online Tech Talks

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ryan Nienhuis, Sr. Product Manager, Amazon Kinesis 4/19/2017 Streaming ETL for Data Lakes using Amazon Kinesis Firehose
  • 2. Agenda • Data Lake Overview • Kinesis Firehose Overview • Demo and Walkthrough • Q & A
  • 4. Data Lake Capabilities • Collecting and storing any type of data, at any scale and at low costs • Securing and protecting all of data stored in the central repository • Searching and finding the relevant data in the central repository • Quickly and easily performing new types of data analysis on datasets • Querying the data by defining the data’s structure at the time of use (schema on read)
  • 5. Data Lake Capabilities • Collecting and storing any type of data, at any scale and at low costs • Securing and protecting all of data stored in the central repository • Searching and finding the relevant data in the central repository • Quickly and easily performing new types of data analysis on datasets • Querying the data by defining the data’s structure at the time of use (schema on read)
  • 6. Data Lake Storage Amazon S3 highly scalable and durable object storage for any type of data, at any scale and at low costs
  • 7. Data Lake Ingestion Kinesis Firehose Real-time streaming data ETL for any type of data, at any scale and at low costs
  • 8. Time Value of Money Data
  • 11. Firehose delivery stream destination S3 bucket backup S3 bucket source records data source source records transformed records transformation failure Streaming ETL to S3
  • 12. Streaming ETL to Redshift intermediate S3 bucket backup S3 bucket source records data source source records Redshift cluster Firehose delivery stream transformed records transformed records transformation failure delivery failure
  • 13. Streaming ETL to Elasticsearch Elasticsearch cluster backup S3 bucket source records data source source records Firehose delivery stream transformed records delivery failure transformation failure
  • 15. Step 1 Set Up Firehose Delivery Stream and Configure Data Transformation
  • 20. Step 2 Send Data to Firehose Delivery Stream
  • 21. Sample Data 219.134.32.117 - - [16/Feb/2017:09:38:20 -0800] "GET /wp-content HTTP/1.1" 200 4521 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.1; .NET CLR 3.8.23015.5)" 95.169.41.62 - - [16/Feb/2017:09:38:20 -0800] "PUT /app/main/posts HTTP/1.1" 200 3883 "-" "Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko" 221.147.191.247 - - [16/Feb/2017:09:38:20 -0800] "GET /explore HTTP/1.1" 200 6579 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) AppleWebKit/538.0.1 (KHTML, like Gecko) Chrome/38.0.895.0 Safari/538.0.1" 179.96.123.130 - - [16/Feb/2017:09:38:20 -0800] "GET /list HTTP/1.1" 200 560 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:5.4) Gecko/20100101 Firefox/5.4.6" 132.119.12.76 - - [16/Feb/2017:09:38:20 -0800] "PUT /explore HTTP/1.1" 200 3131 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_0 rv:5.0; AZ) AppleWebKit/535.1.0 (KHTML, like Gecko) Version/4.0.3 Safari/535.1.0" 74.113.56.92 - - [16/Feb/2017:09:38:20 -0800] "DELETE /app/main/posts HTTP/1.1" 200 7069 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_9) AppleWebKit/532.1.0 (KHTML, like Gecko) Chrome/15.0.877.0 Safari/532.1.0"
  • 22. After Data Transformation {"host":"26.56.11.130","ident":"-","authuser":"-","request":"GET /wp-content HTTP/1.1","response":200,"bytes":4582,"verb":"GET","@timestamp":"2017-04- 04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"} {"host":"180.153.215.216","ident":"-","authuser":"-","request":"PUT /search/tag/list HTTP/1.1","response":200,"bytes":1461,"verb":"PUT","@timestamp":"2017-04- 04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"} {"host":"155.233.163.37","ident":"-","authuser":"-","request":"GET /explore HTTP/1.1","response":500,"bytes":326,"verb":"GET","@timestamp":"2017-04- 04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"} {"host":"189.176.106.5","ident":"-","authuser":"-","request":"POST /search/tag/list HTTP/1.1","response":200,"bytes":3059,"verb":"POST","@timestamp":"2017-04- 04T11:32:29.000Z","timezone":"-0700","@timestamp_utc":"2017-04-04T18:32:29.000Z"}
  • 24. Step 3 Check Results in S3
  • 25.
  • 26. Step 4 Monitor Streaming Data Pipeline
  • 31. Q & A

Editor's Notes

  1. csv timeformat ‘auto’
  2. 'use strict'; console.log('Loading function'); /* Combined Apache Log format parser */ const parser = /^([\d.]+) (\S+) (\S+) \[([\w:\/]+\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\d+) \"([^\"]+)\" \"([^\"]+)\"/; exports.handler = (event, context, callback) => { let success = 0; // Number of valid entries found let failure = 0; // Number of invalid entries found /* Process the list of records and transform them */ const output = event.records.map((record) => { const entry = (new Buffer(record.data, 'base64')).toString('utf8'); const match = parser.exec(entry); if (match) { /* Prepare CSV version from Apache log data */ const requestParts = match[5].split(' '); const result = `${match[1]},${match[4]},${requestParts[0]},${requestParts[1]},${requestParts[2]},${match[6]},${match[7]},${match[8]},"${match[9]}"\n`; const payload = (new Buffer(result, 'utf8')).toString('base64'); success++; return { recordId: record.recordId, result: 'Ok', data: payload, }; } else { /* Failed event, notify the error and leave the record intact */ failure++; return { recordId: record.recordId, result: 'ProcessingFailed', data: record.data, }; } }); console.log(`Processing completed. Successful records ${success}, Failed records ${failure}.`); callback(null, { records: output }); };