© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ray Zhu, Sr. Product Manager, Amazon Kinesis
2/23/2017
Streaming Data Analytics with Amazon Kinesis Firehose and Redshift
Agenda
• Kinesis Firehose and Redshift
• Build a Streaming Solution for Log Analytics
o Step 1 Set Up Redshift DB and Table
o Step 2 Create Firehose Delivery Stream and Configure Data Transformation
o Step 3 Send Data to Firehose Delivery Stream
o Step 4 Query and Analyze the Data from Redshift
o Step 5 Monitor Streaming Data Pipeline
Kinesis Firehose: load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
Amazon Redshift: petabyte-scale data warehouse
Stream Data to Redshift
Data Flow Overview
Kinesis Producer UI: generate web logs
Amazon Kinesis Firehose: transform raw data to structured data, then deliver the processed web logs to Redshift
Amazon Redshift: run SQL queries on the processed web logs
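Firehose does not push each record to Redshift as it arrives. It buffers incoming data, writes each buffer to an intermediate Amazon S3 bucket, and then issues a Redshift COPY. A buffer is flushed when either its size hint or its interval hint is reached, whichever comes first. The sketch below models only that flush rule in memory. The class name `BufferSketch` and its parameters are hypothetical and are not part of any AWS API.

```javascript
'use strict';

// Hypothetical, simplified model of Firehose buffering hints: the buffer is
// flushed when EITHER its byte count reaches sizeLimitBytes OR intervalMs has
// elapsed since the first record in the current buffer arrived.
class BufferSketch {
  constructor(sizeLimitBytes, intervalMs) {
    this.sizeLimitBytes = sizeLimitBytes;
    this.intervalMs = intervalMs;
    this.records = [];
    this.bytes = 0;
    this.openedAt = null; // arrival time of the first record in the buffer
    this.flushed = [];    // batches that would be staged to S3, then COPY'd
  }

  add(record, nowMs) {
    if (this.openedAt === null) this.openedAt = nowMs;
    this.records.push(record);
    this.bytes += Buffer.byteLength(record, 'utf8');
    if (this.bytes >= this.sizeLimitBytes) this.flush();
  }

  // Called periodically; flushes if the buffer interval has elapsed.
  tick(nowMs) {
    if (this.openedAt !== null && nowMs - this.openedAt >= this.intervalMs) {
      this.flush();
    }
  }

  flush() {
    if (this.records.length === 0) return;
    this.flushed.push(this.records);
    this.records = [];
    this.bytes = 0;
    this.openedAt = null;
  }
}
```

Consider a 10-byte limit and a 60-second interval. Three 4-byte records trigger a size-based flush on the third `add`. A lone record is flushed by `tick` once 60 seconds have passed since it arrived.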
Step 1 Set Up Redshift DB and Table
Cluster Details
Node Configuration
Additional Configuration
Review
Configure VPC Security Group
Allow inbound access on the cluster's port from the Kinesis Firehose IP address range for your region:
• US East (N. Virginia): 52.70.63.192/27
• US West (Oregon): 52.89.255.224/27
• EU (Ireland): 52.19.239.192/27
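Each range above is a /27 block, which holds 2^(32-27) = 32 addresses. When you review security group rules, it can help to check whether a given address falls inside the block for your region. The following is a minimal IPv4-only sketch, and its helper names are hypothetical.

```javascript
'use strict';

// Converts a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside the CIDR block, e.g. '52.70.63.192/27'.
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split('/');
  const bits = Number(bitsStr);
  // Shifting by 32 is a no-op in JavaScript, so /0 is special-cased.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Number of addresses in a block, e.g. 32 for a /27.
function cidrSize(cidr) {
  return 2 ** (32 - Number(cidr.split('/')[1]));
}
```

For example, `inCidr('52.70.63.200', '52.70.63.192/27')` is true because that block spans 52.70.63.192 through 52.70.63.223. `inCidr('52.70.63.224', '52.70.63.192/27')` is false.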
Connect to Redshift DB
Create a Redshift Table
create table weblogs(
host_address varchar(512),
request_time timestamp,
request_method varchar(6),
request_path varchar(1024),
request_protocol varchar(10),
response_code integer,
response_size integer,
referrer_host varchar(1024),
user_agent varchar(1024)
);
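Redshift measures `VARCHAR(n)` in bytes, not characters. A multi-byte value can therefore exceed its column even when its character count fits, and COPY rejects such rows unless an option such as TRUNCATECOLUMNS is used. The sketch below checks a row's fields against the widths declared in the `weblogs` DDL before delivery. `WEBLOGS_COLUMNS` and `oversizedColumns` are hypothetical names, and non-VARCHAR columns are not validated.

```javascript
'use strict';

// Column widths copied from the weblogs DDL; null marks a non-VARCHAR column
// that this sketch does not length-check.
const WEBLOGS_COLUMNS = [
  ['host_address', 512],
  ['request_time', null],
  ['request_method', 6],
  ['request_path', 1024],
  ['request_protocol', 10],
  ['response_code', null],
  ['response_size', null],
  ['referrer_host', 1024],
  ['user_agent', 1024],
];

// Returns the names of columns whose value is wider than the declared
// VARCHAR size, comparing UTF-8 byte length as Redshift does.
function oversizedColumns(fields) {
  if (fields.length !== WEBLOGS_COLUMNS.length) {
    throw new Error(`Expected ${WEBLOGS_COLUMNS.length} fields, got ${fields.length}`);
  }
  return WEBLOGS_COLUMNS
    .filter(([, width], i) => width !== null && Buffer.byteLength(fields[i], 'utf8') > width)
    .map(([name]) => name);
}
```

For instance, `OPTIONS` (7 bytes) would not fit `request_method varchar(6)`. The sample data uses only GET, POST, DELETE, and PUT, so the DDL works for this demo.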
Step 2 Set Up Firehose Delivery Stream and Configure Data Transformation
Destination
Configuration
Review
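The transformation Lambda configured here must return one output record per input record. Each output record carries the original `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. You can run a small validator, like the hypothetical `checkTransformResponse` below, against a test event before attaching the function to the stream. It is a sketch, not an AWS-provided check.

```javascript
'use strict';

// Result values Firehose accepts from a transformation Lambda.
const VALID_RESULTS = new Set(['Ok', 'Dropped', 'ProcessingFailed']);
const BASE64 = /^[A-Za-z0-9+\/]*={0,2}$/;

// Compares a transformation response with the event that produced it and
// returns human-readable problems (an empty array means it looks valid).
function checkTransformResponse(event, response) {
  const problems = [];
  const expected = new Set(event.records.map((r) => r.recordId));
  const seen = new Set();
  for (const rec of response.records) {
    if (!expected.has(rec.recordId)) problems.push(`unknown recordId ${rec.recordId}`);
    if (seen.has(rec.recordId)) problems.push(`duplicate recordId ${rec.recordId}`);
    seen.add(rec.recordId);
    if (!VALID_RESULTS.has(rec.result)) problems.push(`bad result ${rec.result} for ${rec.recordId}`);
    if (rec.result === 'Ok' && !(typeof rec.data === 'string' && BASE64.test(rec.data))) {
      problems.push(`missing or non-base64 data for ${rec.recordId}`);
    }
  }
  // Every input record must be accounted for in the response.
  for (const id of expected) {
    if (!seen.has(id)) problems.push(`missing recordId ${id}`);
  }
  return problems;
}
```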
Step 3 Send Data to Firehose Delivery Stream
Sample Data
219.134.32.117 - - [16/Feb/2017:09:38:20 -0800] "GET /wp-content HTTP/1.1" 200 4521 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.1; .NET CLR 3.8.23015.5)"
95.169.41.62 - - [16/Feb/2017:09:38:20 -0800] "PUT /app/main/posts HTTP/1.1" 200 3883 "-" "Mozilla/5.0 (Windows NT 6.2; Trident/7.0; rv:11.0) like Gecko"
221.147.191.247 - - [16/Feb/2017:09:38:20 -0800] "GET /explore HTTP/1.1" 200 6579 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) AppleWebKit/538.0.1 (KHTML, like Gecko) Chrome/38.0.895.0 Safari/538.0.1"
179.96.123.130 - - [16/Feb/2017:09:38:20 -0800] "GET /list HTTP/1.1" 200 560 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:5.4) Gecko/20100101 Firefox/5.4.6"
132.119.12.76 - - [16/Feb/2017:09:38:20 -0800] "PUT /explore HTTP/1.1" 200 3131 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_0 rv:5.0; AZ) AppleWebKit/535.1.0 (KHTML, like Gecko) Version/4.0.3 Safari/535.1.0"
74.113.56.92 - - [16/Feb/2017:09:38:20 -0800] "DELETE /app/main/posts HTTP/1.1" 200 7069 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_9) AppleWebKit/532.1.0 (KHTML, like Gecko) Chrome/15.0.877.0 Safari/532.1.0"
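Each sample line is in Apache Combined Log Format. The sketch below reuses the regular expression from the transformation Lambda in the editor's notes and maps its capture groups onto the `weblogs` column names. `parseCombinedLog` is a hypothetical helper for local testing.

```javascript
'use strict';

// Combined Log Format regex, as used by the transformation Lambda.
const COMBINED_LOG = /^([\d.]+) (\S+) (\S+) \[([\w:\/]+\s[+\-]\d{4})\] "(.+?)" (\d{3}) (\d+) "([^"]+)" "([^"]+)"/;

// Returns an object keyed by weblogs column name, or null if the line does
// not match the format.
function parseCombinedLog(line) {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null;
  // The quoted request line is "METHOD PATH PROTOCOL".
  const [method, path, protocol] = m[5].split(' ');
  return {
    host_address: m[1],
    request_time: m[4],
    request_method: method,
    request_path: path,
    request_protocol: protocol,
    response_code: Number(m[6]),
    response_size: Number(m[7]),
    referrer_host: m[8],
    user_agent: m[9],
  };
}
```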
After Data Transformation
1.133.158.104,16/Feb/2017:10:26:46 -0800,GET,/search/tag/list,HTTP/1.1,200,9523,-,"Mozilla/5.0 (Windows; U; Windows NT 5.3) AppleWebKit/531.1.1 (KHTML, like Gecko) Chrome/24.0.827.0 Safari/531.1.1"
194.189.242.208,16/Feb/2017:10:26:46 -0800,GET,/explore,HTTP/1.1,200,8202,-,"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.0; Trident/5.1)"
210.104.234.68,16/Feb/2017:10:26:46 -0800,GET,/wp-content,HTTP/1.1,200,6523,-,"Mozilla/5.0 (Windows; U; Windows NT 5.0) AppleWebKit/538.0.2 (KHTML, like Gecko) Chrome/19.0.804.0 Safari/538.0.2"
12.140.32.105,16/Feb/2017:10:26:46 -0800,PUT,/wp-admin,HTTP/1.1,200,9273,-,"Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 6.0; Trident/6.0)"
208.53.124.37,16/Feb/2017:10:26:46 -0800,GET,/explore,HTTP/1.1,200,5187,-,"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/531.2.1 (KHTML, like Gecko) Chrome/36.0.842.0 Safari/531.2.1"
113.80.90.8,16/Feb/2017:10:26:46 -0800,PUT,/wp-content,HTTP/1.1,200,4431,-,"Mozilla/5.0 (Windows; U; Windows NT 5.2) AppleWebKit/534.1.1 (KHTML, like Gecko) Chrome/23.0.886.0 Safari/534.1.1"
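In the transformed rows, only the user agent is double-quoted, because it can contain commas (for example `KHTML, like Gecko`). Whatever loads these rows must honor the quotes, for example COPY's CSV option in Redshift. The minimal splitter below is hypothetical and handles single-line records only. It shows that each row yields the nine fields that `weblogs` expects.

```javascript
'use strict';

// Splits one CSV line into fields. Double-quoted fields may contain commas,
// and "" inside a quoted field is an escaped quote. Quoted fields spanning
// multiple lines are not supported.
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}
```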
Send Data
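Producers can send records one at a time with PutRecord or in batches with PutRecordBatch. A PutRecordBatch call accepts up to 500 records and 4 MiB, and each record is limited to 1,000 KiB. The sketch below packs log lines into batches within those limits. `toBatches` is a hypothetical helper, the limits should be checked against current Firehose documentation, and the SDK call itself is omitted. No newline is appended because the transformation Lambda already terminates each CSV row with `\n`.

```javascript
'use strict';

// Firehose PutRecordBatch limits (verify against current documentation).
const MAX_RECORDS_PER_BATCH = 500;
const MAX_BATCH_BYTES = 4 * 1024 * 1024; // 4 MiB per call
const MAX_RECORD_BYTES = 1000 * 1024;    // 1,000 KiB per record

// Greedily packs lines into batches that respect the limits above. Each batch
// is an array of { Data: Buffer } entries, the shape used for the Records
// parameter of PutRecordBatch.
function toBatches(lines) {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const line of lines) {
    const data = Buffer.from(line, 'utf8');
    if (data.length > MAX_RECORD_BYTES) {
      throw new Error(`Record of ${data.length} bytes exceeds the per-record limit`);
    }
    // Start a new batch if adding this record would break either limit.
    if (current.length === MAX_RECORDS_PER_BATCH || currentBytes + data.length > MAX_BATCH_BYTES) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push({ Data: data });
    currentBytes += data.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

A greedy split keeps the number of calls low. Because PutRecordBatch can partially fail, a real producer would also retry the individual records reported as failed.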
Step 4 Query and Analyze the Data from Redshift
Query Data
• Find the distribution of response codes per day
SELECT TRUNC(request_time), response_code, COUNT(*) FROM weblogs GROUP BY 1,2 ORDER BY 1,3 DESC;
• Count the number of 404 response codes
SELECT COUNT(*) FROM weblogs WHERE response_code = 404;
• Find the request path that most often returns status 404 (Page Not Found); drop TOP 1 to list every such path
SELECT TOP 1 request_path, COUNT(*) FROM weblogs WHERE response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
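To sanity-check query results on a small sample, you can mirror the first and third queries in memory. This sketch assumes rows shaped like the `weblogs` columns, with `request_time` rendered as `YYYY-MM-DD HH:MM:SS`. Both function names are hypothetical. SQL's `TOP 1` does not break ties deterministically; in this sketch the first path seen wins.

```javascript
'use strict';

// Mirrors: SELECT TRUNC(request_time), response_code, COUNT(*) FROM weblogs
//          GROUP BY 1,2 ORDER BY 1,3 DESC;
function responseCodesByDay(rows) {
  const counts = new Map();
  for (const { request_time, response_code } of rows) {
    const day = request_time.slice(0, 10); // date part, like TRUNC(timestamp)
    const key = `${day}|${response_code}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => {
      const [day, code] = key.split('|');
      return { day, response_code: Number(code), count };
    })
    // ORDER BY day ascending, then count descending.
    .sort((a, b) => (a.day < b.day ? -1 : a.day > b.day ? 1 : b.count - a.count));
}

// Mirrors: SELECT TOP 1 request_path, COUNT(*) FROM weblogs
//          WHERE response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
function topNotFoundPath(rows) {
  const counts = new Map();
  for (const { request_path, response_code } of rows) {
    if (response_code === 404) counts.set(request_path, (counts.get(request_path) || 0) + 1);
  }
  let best = null;
  for (const [path, count] of counts) {
    if (best === null || count > best.count) best = { request_path: path, count };
  }
  return best; // null when there are no 404s
}
```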
Step 5 Monitor Streaming Data Pipeline
Monitor with CloudWatch Metrics
Monitor with CloudWatch Logs
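Firehose publishes delivery metrics to CloudWatch, including `IncomingRecords`, `DeliveryToRedshift.Records`, `DeliveryToRedshift.Success`, and `ExecuteProcessing.Success`; the last two are success ratios. The hypothetical function below turns one period's values for those metrics into alarm messages. Fetching the values, for example with GetMetricData, is left out, and the threshold is an illustrative assumption. Because Firehose buffers data, delivered counts can lag incoming counts within a single period.

```javascript
'use strict';

// Hypothetical health check over Firehose metric values for one period,
// keyed by CloudWatch metric name. Missing success ratios count as healthy.
function pipelineAlarms(metrics, { minDeliveryRatio = 0.99 } = {}) {
  const alarms = [];
  const incoming = metrics['IncomingRecords'] || 0;
  const delivered = metrics['DeliveryToRedshift.Records'] || 0;
  // Success metrics are ratios of successful attempts to all attempts.
  if ((metrics['ExecuteProcessing.Success'] ?? 1) < 1) {
    alarms.push('Lambda transformation invocations are failing');
  }
  if ((metrics['DeliveryToRedshift.Success'] ?? 1) < 1) {
    alarms.push('Redshift COPY commands are failing');
  }
  if (incoming > 0 && delivered / incoming < minDeliveryRatio) {
    alarms.push(`Only ${((100 * delivered) / incoming).toFixed(1)}% of incoming records reached Redshift`);
  }
  return alarms;
}
```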
Q & A
Thank you!


Editor's Notes

  1. 'use strict';
     console.log('Loading function');

     /* Combined Apache Log format parser */
     const parser = /^([\d.]+) (\S+) (\S+) \[([\w:\/]+\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\d+) \"([^\"]+)\" \"([^\"]+)\"/;

     exports.handler = (event, context, callback) => {
         let success = 0; // Number of valid entries found
         let failure = 0; // Number of invalid entries found

         /* Process the list of records and transform them */
         const output = event.records.map((record) => {
             // Firehose passes each record's payload base64-encoded.
             const entry = Buffer.from(record.data, 'base64').toString('utf8');
             const match = parser.exec(entry);
             if (match) {
                 /* Prepare CSV version from Apache log data; match[5] is "METHOD PATH PROTOCOL" */
                 const requestParts = match[5].split(' ');
                 const result = `${match[1]},${match[4]},${requestParts[0]},${requestParts[1]},${requestParts[2]},${match[6]},${match[7]},${match[8]},"${match[9]}"\n`;
                 const payload = Buffer.from(result, 'utf8').toString('base64');
                 success++;
                 return { recordId: record.recordId, result: 'Ok', data: payload };
             } else {
                 /* Failed event, notify the error and leave the record intact */
                 failure++;
                 return { recordId: record.recordId, result: 'ProcessingFailed', data: record.data };
             }
         });
         console.log(`Processing completed. Successful records ${success}, Failed records ${failure}.`);
         callback(null, { records: output });
     };
  2. {{internet.ip}} - - [{{date.now("DD/MMM/YYYY:HH:mm:ss ZZ")}}] "{{random.weightedArrayElement({"weights":[0.6,0.1,0.1,0.2],"data":["GET","POST","DELETE","PUT"]})}} {{random.arrayElement(["/list","/wp-content","/wp-admin","/explore","/search/tag/list","/app/main/posts","/posts/posts/explore"])}} HTTP/1.1" {{random.weightedArrayElement({"weights": [0.9,0.04,0.02,0.04], "data":["200","404","500","301"]})}} {{random.number(10000)}} "-" "{{internet.userAgent}}"
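The second note appears to be a record template for the Amazon Kinesis Data Generator, which fills faker.js placeholders such as `{{internet.ip}}` and `{{random.weightedArrayElement(...)}}`. The sketch below produces similar lines without that tool. It reimplements the weighted choice with cumulative weights and injects the random source so output is reproducible. The fixed IP (a documentation address) and user agent stand in for faker values, timestamps are formatted in UTC for simplicity, and all names are hypothetical.

```javascript
'use strict';

// Picks an element using cumulative weights; `rand` returns a value in [0, 1).
function weightedPick(data, weights, rand) {
  const total = weights.reduce((a, b) => a + b, 0);
  let r = rand() * total;
  for (let i = 0; i < data.length; i++) {
    r -= weights[i];
    if (r < 0) return data[i];
  }
  return data[data.length - 1]; // guard against floating-point rounding
}

const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
const pad = (n) => String(n).padStart(2, '0');

// Formats a Date as DD/MMM/YYYY:HH:mm:ss +0000 (always UTC in this sketch).
function clfTimestamp(d) {
  return `${pad(d.getUTCDate())}/${MONTHS[d.getUTCMonth()]}/${d.getUTCFullYear()}:` +
    `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())} +0000`;
}

const PATHS = ['/list', '/wp-content', '/wp-admin', '/explore', '/search/tag/list', '/app/main/posts', '/posts/posts/explore'];

// One synthetic Combined Log Format line using the template's weights.
function fakeLogLine(rand, now = new Date()) {
  const method = weightedPick(['GET', 'POST', 'DELETE', 'PUT'], [0.6, 0.1, 0.1, 0.2], rand);
  const path = PATHS[Math.floor(rand() * PATHS.length)];
  const status = weightedPick(['200', '404', '500', '301'], [0.9, 0.04, 0.02, 0.04], rand);
  const size = Math.floor(rand() * 10000); // response size in [0, 10000)
  return `203.0.113.7 - - [${clfTimestamp(now)}] "${method} ${path} HTTP/1.1" ${status} ${size} "-" "ExampleAgent/1.0"`;
}
```

In production, pass `Math.random` as `rand`. In tests, pass a fixed sequence so the generated lines are deterministic.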