© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Matt Yanchyshyn, Sr. Mgr. Solutions Architecture
October 2015
BDT205
Building Your First Big Data Application
[Figure: Big data architecture on AWS, turning data into answers through Collect, Store, Process and Analyze stages. Stage labels: data collection and storage, data processing, event processing, data analysis. Services shown: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon RDS (Aurora), AWS Lambda, KCL apps, Amazon EMR, Amazon Redshift, Amazon Machine Learning.]
Your first big data application on AWS
[Figure: Data → Collect → Store → Process → Analyze → Answers; the final build of the slide adds SQL]
Set up with the AWS CLI
Amazon Kinesis
Create a single-shard Amazon Kinesis stream for incoming log data:
aws kinesis create-stream \
  --stream-name AccessLogStream \
  --shard-count 1
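Whether one shard is enough comes down to per-shard capacity. The sketch below is a hypothetical `shardsNeeded` helper, not an AWS API: it assumes the documented per-shard write limits of 1 MB per second and 1,000 records per second (treating 1 MB as 2^20 bytes) and returns the smallest shard count that covers both limits.

```scala
// Assumed per-shard write limits for an Amazon Kinesis stream:
// 1 MB per second and 1,000 records per second (1 MB taken as 2^20 bytes).
val maxBytesPerSecPerShard: Long = 1024L * 1024L
val maxRecordsPerSecPerShard: Long = 1000L

// Integer ceiling division for non-negative values.
def ceilDiv(a: Long, b: Long): Long = (a + b - 1) / b

// Hypothetical helper: smallest shard count whose write capacity covers the
// expected ingest rate, never less than one shard.
def shardsNeeded(bytesPerSec: Long, recordsPerSec: Long): Int = {
  val byBytes = ceilDiv(bytesPerSec, maxBytesPerSecPerShard)
  val byRecords = ceilDiv(recordsPerSec, maxRecordsPerSecPerShard)
  math.max(1L, math.max(byBytes, byRecords)).toInt
}
```

For example, 200 KB/s at 300 records/s fits in one shard, while 3 MB/s needs three, and 2,500 small records per second also needs three because the record limit binds first.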
Amazon S3
Create an Amazon S3 bucket (YOUR-BUCKET-NAME) to hold the pipeline's data.
Amazon EMR
Launch a 3-node Amazon EMR cluster with Spark and Hive, using m3.xlarge instances and your EC2 key pair (YOUR-AWS-SSH-KEY).
Amazon Redshift
Launch an Amazon Redshift cluster, replacing CHOOSE-A-REDSHIFT-PASSWORD with a password of your choice.
Your first big data application on AWS
1. COLLECT: Stream data into Amazon Kinesis with Log4J
2. PROCESS: Process data with Amazon EMR using Spark & Hive
3. ANALYZE: Analyze data in Amazon Redshift using SQL
STORE: Amazon S3 holds the data between steps
1. Collect
Amazon Kinesis Log4J Appender
In a separate terminal window on your local machine, download the Amazon Kinesis Log4J Appender:
Then download and save the sample Apache log file:
Create a file called AwsCredentials.properties with
credentials for an IAM user with permission to write to
Amazon Kinesis:
accessKey=YOUR-IAM-ACCESS-KEY
secretKey=YOUR-SECRET-KEY
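AwsCredentials.properties is a plain Java properties file, and the AWS SDK for Java's `PropertiesCredentials` class expects exactly these two keys, `accessKey` and `secretKey`. As a quick sanity check before starting the appender, a hypothetical `loadCredentials` helper (an illustration only, not part of the appender) can parse the file with `java.util.Properties` and report whether both keys are present.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper: parse properties text and return (accessKey, secretKey)
// only when both keys are present; otherwise None.
def loadCredentials(text: String): Option[(String, String)] = {
  val props = new Properties()
  props.load(new StringReader(text))
  for {
    accessKey <- Option(props.getProperty("accessKey"))
    secretKey <- Option(props.getProperty("secretKey"))
  } yield (accessKey, secretKey)
}
```

In practice the text would be read from the file on disk, for example with `scala.io.Source.fromFile`.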
Then start the Amazon Kinesis Log4J Appender:
Log file format
Spark
• Fast, general-purpose engine
for large-scale data processing
• Write applications quickly in
Java, Scala, or Python
• Combine SQL, streaming, and
complex analytics
Amazon Kinesis and Spark Streaming
[Figure: Log4J Appender → Amazon Kinesis → Spark Streaming on Amazon EMR → Amazon S3. Spark Streaming uses the Kinesis Client Library, which keeps its checkpoints in Amazon DynamoDB.]
Using Spark Streaming on Amazon EMR
SSH to your Amazon EMR cluster:
ssh -o TCPKeepAlive=yes -o ServerAliveInterval=30 -i YOUR-AWS-SSH-KEY hadoop@YOUR-EMR-HOSTNAME
On your cluster, download the Kinesis Client Library (amazon-kinesis-client-1.6.0.jar), which Spark's Kinesis integration depends on:
Cut down on console noise:
Start the Spark shell:
spark-shell --jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar,amazon-kinesis-client-1.6.0.jar \
  --driver-java-options "-Dlog4j.configuration=file:///etc/spark/conf/log4j.properties"
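/* import required libraries */
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.kinesis.AmazonKinesisClient
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream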
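/* Set up the variables as needed */
val streamName = "AccessLogStream"
val endpointUrl = "https://kinesis.YOUR-REGION.amazonaws.com"
val outputInterval = Seconds(60) // assumed batch interval; the original value is not shown
// output location for the raw logs: s3://YOUR-S3-BUCKET/access-log-raw
/* Reconfigure the spark-shell */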
Reading Amazon Kinesis with Spark Streaming
/* Set up the Kinesis client */
val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
kinesisClient.setEndpoint(endpointUrl)
/* Determine the number of shards from the stream */
val numShards = kinesisClient.describeStream(streamName).getStreamDescription().getShards().size()
/* Create one worker (receiver) per Kinesis shard */
val ssc = new StreamingContext(sc, outputInterval)
val kinesisStreams = (0 until numShards).map { i =>
  KinesisUtils.createStream(ssc, streamName, endpointUrl, outputInterval,
    InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY)
}
Writing to Amazon S3 with Spark Streaming
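/* Merge the worker DStreams and translate the byteArray to string */
val unionStreams = ssc.union(kinesisStreams)
val accessLogs = unionStreams.map(byteArray => new String(byteArray))
/* Write each RDD to Amazon S3 */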
View the output files in Amazon S3 (YOUR-S3-BUCKET), grouped by year (yyyy), month (mm), day (dd), and hour (HH)
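The slide does not show the exact directory layout the streaming job writes, only that the output is organized by date and hour. Because the access_log_raw table is partitioned by year, month, day, hour and min and registered with `msck repair table`, one layout that fits is Hive-style key=value directories. A sketch of a hypothetical `partitionPrefix` helper that builds such a prefix from a batch time, assuming times are bucketed in UTC:

```scala
import java.time.{Instant, ZoneOffset}

// Hypothetical helper: map a batch time to a Hive-style key=value prefix that
// uses the access_log_raw partition columns (year, month, day, hour, min).
// Times are interpreted in UTC; values are written without zero padding.
def partitionPrefix(baseDir: String, batchTime: Instant): String = {
  val t = batchTime.atZone(ZoneOffset.UTC)
  s"$baseDir/year=${t.getYear}/month=${t.getMonthValue}/day=${t.getDayOfMonth}" +
    s"/hour=${t.getHour}/min=${t.getMinute}"
}
```

Writing unpadded values (month=7 rather than month=07) is a choice made in this sketch so the directory values read naturally as INT partition values; the deck does not specify it.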
2. Process
Spark SQL
Spark's module for working with structured data using SQL
Run unmodified Hive queries on existing data
Using Spark SQL on Amazon EMR
SSH to your Amazon EMR cluster:
ssh -i YOUR-AWS-SSH-KEY hadoop@YOUR-EMR-HOSTNAME
Start the Spark SQL shell:
spark-sql --driver-java-options "-Dlog4j.configuration=file:///etc/spark/conf/log4j.properties"
Create a table that points to your Amazon S3 bucket
CREATE EXTERNAL TABLE access_log_raw(
host STRING, identity STRING,
user STRING, request_time STRING,
request STRING, status STRING,
size STRING, referrer STRING,
agent STRING
)
PARTITIONED BY (year INT, month INT, day INT, hour INT, min INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
)
LOCATION 's3://YOUR-S3-BUCKET/access-log-raw';
msck repair table access_log_raw;
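To see what the RegexSerDe extracts from each log line, the same pattern can be applied with Scala's standard regex support. The pattern below is the table's input.regex with the HiveQL string escaping removed; `AccessLogRow` and `parseLine` are hypothetical names, and the sample line is the well-known example from the Apache HTTP Server log documentation. Note that capture groups are returned verbatim, so request, referrer and agent keep their surrounding double quotes.

```scala
import scala.util.matching.Regex

// The table's input.regex with HiveQL string escaping removed.
val AccessLogPattern: Regex =
  """([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?""".r

// Hypothetical row type mirroring the access_log_raw columns; referrer and agent
// are optional because the last group in the pattern is optional.
final case class AccessLogRow(
    host: String, identity: String, user: String, requestTime: String,
    request: String, status: String, size: String,
    referrer: Option[String], agent: Option[String])

// Require the whole line to match; each capture group is kept as-is.
def parseLine(line: String): Option[AccessLogRow] = line match {
  case AccessLogPattern(host, identity, user, time, request, status, size, referrer, agent) =>
    Some(AccessLogRow(host, identity, user, time, request, status, size,
      Option(referrer), Option(agent)))
  case _ => None
}
```

A line in common log format (without referrer and agent) also matches; its last two fields come back as None.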
Query the data with Spark SQL
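-- return the first row in the stream
SELECT * FROM access_log_raw LIMIT 1;
-- count all items in the stream
SELECT count(*) FROM access_log_raw;
-- find the top 10 hosts
SELECT host, count(*) AS requests FROM access_log_raw GROUP BY host ORDER BY requests DESC LIMIT 10;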
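The top-10-hosts query is a group-by, count, sort and limit. The same steps on an in-memory Scala collection, using a hypothetical `topHosts` helper; this version breaks ties by host name so the output is deterministic.

```scala
// Hypothetical helper: count requests per host and keep the n busiest hosts.
// Sorted by descending count, then by host name to make ties deterministic.
def topHosts(hosts: Seq[String], n: Int): Seq[(String, Int)] =
  hosts
    .groupBy(host => host)
    .map { case (host, requests) => (host, requests.size) }
    .toSeq
    .sortBy { case (host, count) => (-count, host) }
    .take(n)
```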
Preparing the data for Amazon Redshift import
We will transform the data returned by the query before writing it to our external Hive table stored in Amazon S3.
Hive user-defined functions (UDFs) used for the text transformations: from_unixtime, unix_timestamp, and hour.
The “hour” value is important: it is used to split and organize the output files before they are written to Amazon S3. These splits let us load the data into Amazon Redshift more efficiently later in the lab, using the parallel “COPY” command.
Create an external table, access_log_processed, in Amazon S3 (YOUR-S3-BUCKET) to hold the processed data
Configure partition and compression
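-- set up Hive's "dynamic partitioning"
-- this will split output files when writing to Amazon S3
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- compress output files on Amazon S3 using Gzip
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;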
Write output to Amazon S3
-- convert the Apache log timestamp to a UNIX timestamp
-- split files in Amazon S3 by the hour in the log lines
INSERT OVERWRITE TABLE access_log_processed PARTITION (hour)
SELECT
from_unixtime(unix_timestamp(request_time,
'[dd/MMM/yyyy:HH:mm:ss Z]')),
host,
request,
status,
referrer,
agent,
hour(from_unixtime(unix_timestamp(request_time,
'[dd/MMM/yyyy:HH:mm:ss Z]'))) as hour
FROM access_log_raw;
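The two Hive expressions in the INSERT can be mirrored with `java.time` to check what a given request_time turns into. This sketch normalizes to UTC; Hive's `from_unixtime` renders in the session's local time zone, so the results agree only when the cluster runs in UTC. Another difference: `unix_timestamp` returns NULL for a value it cannot parse, whereas `OffsetDateTime.parse` throws. The helper names are hypothetical.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Equivalent of Hive's '[dd/MMM/yyyy:HH:mm:ss Z]' pattern. In java.time, square
// brackets mark optional sections, so the literal brackets are quoted here.
val apacheTimeFormat: DateTimeFormatter =
  DateTimeFormatter.ofPattern("'['dd/MMM/yyyy:HH:mm:ss Z']'", Locale.ENGLISH)

// Same shape as the default output of Hive's from_unixtime.
val hiveOutputFormat: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Parse the raw request_time field and convert it to UTC.
def toUtc(requestTime: String): OffsetDateTime =
  OffsetDateTime.parse(requestTime, apacheTimeFormat).withOffsetSameInstant(ZoneOffset.UTC)

// Counterpart of from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]')).
def requestTimeUtc(requestTime: String): String = toUtc(requestTime).format(hiveOutputFormat)

// Counterpart of hour(...): the value used as the dynamic partition key.
def requestHourUtc(requestTime: String): Int = toUtc(requestTime).getHour
```

The offset matters: a request logged at 23:30 in a -0500 zone lands in hour 4 of the next day in UTC, so it is written to a different hour partition than its local timestamp suggests.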
View the output files in Amazon S3 (YOUR-S3-BUCKET)
3. Analyze
Connect to Amazon Redshift
# using the PostgreSQL CLI (psql), connecting to YOUR-REDSHIFT-ENDPOINT
Or use any JDBC or ODBC SQL client with the PostgreSQL
8.x drivers or native Amazon Redshift support
• Aginity Workbench for Amazon Redshift
• SQL Workbench/J
Create an Amazon Redshift table to hold your data
Loading data into Amazon Redshift
The “COPY” command loads files in parallel:
COPY accesslogs
FROM 's3://YOUR-S3-BUCKET/access-log-processed'
CREDENTIALS
'aws_access_key_id=YOUR-IAM-ACCESS-KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY'
DELIMITER '\t' IGNOREHEADER 0
MAXERROR 0
GZIP;
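Assuming access_log_processed writes tab-delimited text, as the COPY command's tab delimiter expects, and Gzip compression is enabled, each file that COPY reads is a gzip stream of newline-separated rows with tab-separated fields. A sketch that writes and reads back that shape in memory, with hypothetical helper names; it does not escape tabs or newlines inside field values, which would split a row.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.Source

// Write rows as tab-separated, newline-terminated UTF-8 text inside a gzip stream.
def gzipTsv(rows: Seq[Seq[String]]): Array[Byte] = {
  val buffer = new ByteArrayOutputStream()
  val gzip = new GZIPOutputStream(buffer)
  try rows.foreach { row =>
    gzip.write((row.mkString("\t") + "\n").getBytes(StandardCharsets.UTF_8))
  } finally gzip.close()
  buffer.toByteArray
}

// Read the rows back; split limit -1 keeps empty trailing fields.
def readGzipTsv(data: Array[Byte]): List[List[String]] = {
  val source = Source.fromInputStream(new GZIPInputStream(new ByteArrayInputStream(data)), "UTF-8")
  try source.getLines().map(_.split("\t", -1).toList).toList
  finally source.close()
}
```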
Amazon Redshift test queries
-- find distribution of status codes over days
-- find the 404 status codes
-- show all requests for status as PAGE NOT FOUND
Your first big data application on AWS
A favicon would fix 398 of the total 977 PAGE NOT FOUND (404) errors
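The favicon finding boils down to filtering the 404s and checking each request path; with the numbers above, 398 of 977 is roughly 41% of all PAGE NOT FOUND errors. A sketch over in-memory (request, status) pairs, assuming request fields keep their surrounding double quotes as the regex capture groups produce them; `requestPath` and `faviconNotFound` are hypothetical names.

```scala
// Extract the path from a quoted request field such as "GET /favicon.ico HTTP/1.1".
def requestPath(request: String): Option[String] = {
  val parts = request.stripPrefix("\"").stripSuffix("\"").split(" ")
  if (parts.length >= 2) Some(parts(1)) else None
}

// Returns (404s for /favicon.ico, all 404s) for a sequence of (request, status) pairs.
def faviconNotFound(rows: Seq[(String, String)]): (Int, Int) = {
  val notFound = rows.filter { case (_, status) => status == "404" }
  val favicon = notFound.count { case (request, _) => requestPath(request).contains("/favicon.ico") }
  (favicon, notFound.size)
}
```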
Visualize the results
• Client-side JavaScript example using Plottable, a library built on D3
• Hosted on Amazon S3 for pennies a month
• AWS Lambda function used to query Amazon Redshift
Try it yourself on the AWS cloud… for around the same cost as a cup of coffee ($3.50)
Service                  Est. Cost*
Amazon Kinesis           $1.00
Amazon S3 (free tier)    $0.00
Amazon EMR               $0.44
Amazon Redshift          $1.00
Est. Total               $2.44
*Estimated costs assume use of the free tier where available, lower-cost instances, a dataset no bigger than 10 MB, and instances running for less than 4 hours. Costs may vary depending on the options selected, the size of the dataset, and usage.
Learn from AWS big data experts
blogs.aws.amazon.com/bigdata
Remember to complete
your evaluations!
Thank you!

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

(BDT205) Your First Big Data Application On AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Matt Yanchyshyn, Sr. Mgr. Solutions Architecture October 2015 BDT205 Building Your First Big Data Application
  • 2. Stages: Collect, Process, Analyze, and Store, turning Data into Answers. Categories: Data Collection and Storage, Event Processing, Data Processing, Data Analysis. Services shown: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon RDS (Aurora), AWS Lambda, KCL Apps, Amazon EMR, Amazon Redshift, Amazon Machine Learning
  • 3. Your first big data application on AWS
  • 7. Set up with the AWS CLI
  • 8. Amazon Kinesis Create a single-shard Amazon Kinesis stream for incoming log data: aws kinesis create-stream --stream-name AccessLogStream --shard-count 1
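A single shard is plenty for one sample log file. For sizing a stream for real traffic, here is a minimal Python sketch. It is an illustration, not part of the lab. It computes a shard count from the per-shard write limits of roughly 1 MB/s or 1,000 records/s. The function name `shards_needed` is hypothetical, and the limits should be checked against the current Kinesis quotas.

```python
import math

# Per-shard write limits for a Kinesis stream (verify against current AWS quotas).
MAX_BYTES_PER_SEC_PER_SHARD = 1024 * 1024  # about 1 MB/s
MAX_RECORDS_PER_SEC_PER_SHARD = 1000


def shards_needed(bytes_per_sec: float, records_per_sec: float) -> int:
    """Return the smallest shard count that satisfies both write limits (at least 1)."""
    by_bytes = math.ceil(bytes_per_sec / MAX_BYTES_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_records)
```

For example, 3 MiB/s of log data at 500 records/s needs three shards, because the byte limit dominates.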
  • 10. Amazon EMR Launch a 3-node Amazon EMR cluster with Spark and Hive (instance type m3.xlarge, EC2 key pair YOUR-AWS-SSH-KEY):
  • 12. Your first big data application on AWS: 1. COLLECT: Stream data into Amazon Kinesis with Log4J. 2. PROCESS: Process data with Amazon EMR using Spark & Hive. 3. ANALYZE: Analyze data in Amazon Redshift using SQL. (STORE sits between PROCESS and ANALYZE.)
  • 14. Amazon Kinesis Log4J Appender In a separate terminal window on your local machine, download the Amazon Kinesis Log4J Appender: Then download and save the sample Apache log file:
  • 15. Amazon Kinesis Log4J Appender Create a file called AwsCredentials.properties with credentials for an IAM user with permission to write to Amazon Kinesis: accessKey=YOUR-IAM-ACCESS-KEY secretKey=YOUR-SECRET-KEY Then start the Amazon Kinesis Log4J Appender:
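If you prefer to script this step, here is a minimal Python sketch. It writes the two properties shown above to a file that, when newly created, only its owner can read, and then parses the file back. The helper names are hypothetical. The parser handles only simple `key=value` lines, not the full Java `.properties` syntax.

```python
import os
from pathlib import Path


def write_aws_credentials(path: str, access_key: str, secret_key: str) -> None:
    """Write accessKey/secretKey lines in .properties style."""
    content = f"accessKey={access_key}\nsecretKey={secret_key}\n"
    # A newly created file gets mode 0600, so other local users cannot read the secret.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(content)


def read_aws_credentials(path: str) -> dict:
    """Parse key=value lines, skipping blanks and # or ! comment lines."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so secrets containing '=' survive intact.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
```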
  • 17. Spark • Fast, general-purpose engine for large-scale data processing • Write applications quickly in Java, Scala, or Python • Combine SQL, streaming, and complex analytics
  • 18. Amazon Kinesis and Spark Streaming Log4J Appender → Amazon Kinesis → Spark Streaming on Amazon EMR → Amazon S3. Spark Streaming uses the Kinesis Client Library, which keeps its shard leases and checkpoints in an Amazon DynamoDB table.
  • 19. Using Spark Streaming on Amazon EMR Connect to the cluster with SSH (options -o TCPKeepAlive=yes -o ServerAliveInterval=30, key YOUR-AWS-SSH-KEY, host YOUR-EMR-HOSTNAME). On your cluster, download the Amazon Kinesis client for Spark:
  • 20. Using Spark Streaming on Amazon EMR Cut down on console noise: Start the Spark shell: spark-shell --jars /usr/lib/spark/extras/lib/spark-streaming-kinesis-asl.jar,amazon-kinesis-client-1.6.0.jar --driver-java-options "-Dlog4j.configuration=file:///etc/spark/conf/log4j.properties"
  • 21. Using Spark Streaming on Amazon EMR /* import required libraries */
  • 22. Using Spark Streaming on Amazon EMR /* Set up the variables as needed */ YOUR-REGION YOUR-S3-BUCKET /* Reconfigure the spark-shell */
  • 23. Reading Amazon Kinesis with Spark Streaming /* Set up the Kinesis client */ val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain()) kinesisClient.setEndpoint(endpointUrl) /* Determine the number of shards from the stream */ val numShards = kinesisClient.describeStream(streamName).getStreamDescription().getShards().size() /* Create one worker (receiver) per Kinesis shard */ val ssc = new StreamingContext(sc, outputInterval) val kinesisStreams = (0 until numShards).map { i => KinesisUtils.createStream(ssc, streamName, endpointUrl, outputInterval, InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY) }
  • 24. Writing to Amazon S3 with Spark Streaming /* Merge the worker DStreams and translate the byteArray to a string */ /* Write each RDD to Amazon S3 */
  • 25. View the output files in Amazon S3 (bucket YOUR-S3-BUCKET), grouped by yyyy, mm, dd, and HH
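The Hive table defined in the next step is PARTITIONED BY (year, month, day, hour, min). For MSCK REPAIR TABLE to discover those partitions, each batch has to land under a Hive-style key=value prefix. Below is a minimal sketch of that mapping. It assumes batch times in UTC, unpadded integer values, and the access-log-raw prefix from the table's LOCATION. The function name is hypothetical, and this is not the lab's Scala code.

```python
from datetime import datetime, timezone


def partition_prefix(batch_time_ms: int, base: str = "access-log-raw") -> str:
    """Map a streaming batch time (epoch milliseconds) to a Hive-style S3 prefix."""
    t = datetime.fromtimestamp(batch_time_ms / 1000, tz=timezone.utc)
    # key=value folder names are what MSCK REPAIR TABLE recognizes as partitions.
    return (f"{base}/year={t.year}/month={t.month}/day={t.day}"
            f"/hour={t.hour}/min={t.minute}")
```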
  • 27. Spark SQL • Spark's module for working with structured data using SQL • Run unmodified Hive queries on existing data
  • 28. Using Spark SQL on Amazon EMR Connect to the cluster with SSH (key YOUR-AWS-SSH-KEY, host YOUR-EMR-HOSTNAME). Start the Spark SQL shell: spark-sql --driver-java-options "-Dlog4j.configuration=file:///etc/spark/conf/log4j.properties"
  • 29. Create a table that points to your Amazon S3 bucket CREATE EXTERNAL TABLE access_log_raw( host STRING, identity STRING, user STRING, request_time STRING, request STRING, status STRING, size STRING, referrer STRING, agent STRING ) PARTITIONED BY (year INT, month INT, day INT, hour INT, min INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?" ) LOCATION 's3://YOUR-S3-BUCKET/access-log-raw'; MSCK REPAIR TABLE access_log_raw;
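Before pointing Hive at the bucket, it can help to test input.regex against a sample line. This Python sketch uses the same pattern with the Hive string escaping removed. Like RegexSerDe, it requires the whole line to match. The names `parse_line`, `APACHE_LOG_REGEX`, and `COLUMNS` are illustrative only.

```python
import re

# input.regex from the CREATE EXTERNAL TABLE statement, with Hive string escaping removed.
APACHE_LOG_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)
# Column order matches the table definition: one capture group per column.
COLUMNS = ("host", "identity", "user", "request_time", "request",
           "status", "size", "referrer", "agent")


def parse_line(line: str):
    """Return a column->value dict, or None when the whole line does not match."""
    m = APACHE_LOG_REGEX.fullmatch(line)
    return dict(zip(COLUMNS, m.groups())) if m else None
```

A line in common log format, with no referrer or agent, still matches. Its referrer and agent come back as None because the last two fields sit in an optional group.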
  • 30. Query the data with Spark SQL -- return the first row in the stream -- count all items in the stream -- find the top 10 hosts
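The SQL for these three queries is not shown in this transcript. For comparison only, the following computes the same three results in plain Python. It assumes rows have already been parsed into dictionaries with a host field (a hypothetical input shape), and the function name `summarize` is illustrative.

```python
from collections import Counter


def summarize(rows, n=10):
    """Mirror the three queries: first row, total count, and top-n hosts."""
    # Without ORDER BY, SQL makes no promise about which row is "first";
    # here it is simply the first element of the list.
    first = rows[0] if rows else None
    total = len(rows)
    # most_common sorts by count, descending; equal counts keep first-seen order.
    top = Counter(row["host"] for row in rows).most_common(n)
    return first, total, top
```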
  • 31. Preparing the data for Amazon Redshift import We will transform the data returned by the query before writing it to our external Hive table stored in Amazon S3. Hive user-defined functions (UDFs) used for the text transformations: from_unixtime, unix_timestamp, and hour. The "hour" value is important: it is used to split and organize the output files before they are written to Amazon S3. These splits let us load the data into Amazon Redshift more efficiently later in the lab, using the parallel COPY command.
  • 32. Create an external table in Amazon S3 (bucket YOUR-S3-BUCKET)
  • 33. Configure partition and compression -- set up Hive's "dynamic partitioning" -- this will split output files when writing to Amazon S3 -- compress output files on Amazon S3 using Gzip
  • 34. Write output to Amazon S3 -- convert the Apache log timestamp to a UNIX timestamp -- split files in Amazon S3 by the hour in the log lines INSERT OVERWRITE TABLE access_log_processed PARTITION (hour) SELECT from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]')), host, request, status, referrer, agent, hour(from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]'))) as hour FROM access_log_raw;
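The timestamp logic in this INSERT can be checked outside Hive. Below is a minimal Python equivalent of from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]')) and hour(...). It assumes the cluster's local time zone is UTC. Hive renders from_unixtime in the local zone, so the hour partition depends on that setting. The function name `to_processed` is hypothetical.

```python
from datetime import datetime, timezone

# strptime equivalent of Hive's '[dd/MMM/yyyy:HH:mm:ss Z]' pattern.
APACHE_TS_FORMAT = "[%d/%b/%Y:%H:%M:%S %z]"


def to_processed(request_time: str):
    """Return (timestamp as 'yyyy-MM-dd HH:mm:ss' in UTC, hour in UTC)."""
    ts = datetime.strptime(request_time, APACHE_TS_FORMAT).astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%d %H:%M:%S"), ts.hour
```

Under this assumption, a request logged at 13:55:36 -0700 lands in the hour=20 partition.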
  • 35. View the output files in Amazon S3 (bucket YOUR-S3-BUCKET)
  • 37. Connect to Amazon Redshift using the PostgreSQL CLI (psql) and your cluster endpoint, YOUR-REDSHIFT-ENDPOINT. Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers or native Amazon Redshift support: • Aginity Workbench for Amazon Redshift • SQL Workbench/J
  • 38. Create an Amazon Redshift table to hold your data
  • 39. Loading data into Amazon Redshift The COPY command loads files in parallel: COPY accesslogs FROM 's3://YOUR-S3-BUCKET/access-log-processed' CREDENTIALS 'aws_access_key_id=YOUR-IAM-ACCESS-KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY' DELIMITER '\t' IGNOREHEADER 0 MAXERROR 0 GZIP;
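With MAXERROR 0, a single malformed line aborts the whole load. The function below is a hypothetical pre-flight check in Python. It scans one of the gzip files and reports every line whose tab-separated field count differs from the target table. You supply the expected count from your own table definition.

```python
import gzip


def check_copy_file(path: str, expected_fields: int):
    """Return (line_number, field_count) for each line with the wrong number of tab-separated fields."""
    bad = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != expected_fields:
                bad.append((lineno, len(fields)))
    return bad
```

An unescaped tab inside a field, for instance in a user-agent string, would show up here as an extra field.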
  • 40. Amazon Redshift test queries -- find the distribution of status codes over days -- find the 404 status codes -- show all requests with status PAGE NOT FOUND
  • 41. Your first big data application on AWS A favicon would fix 398 of the total 977 PAGE NOT FOUND (404) errors
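The favicon finding is a simple ratio: 398 of 977 is about 40.7% of all 404 responses. Below is a minimal Python sketch of the same tally over parsed rows. It assumes hypothetical status and request fields, and that favicon requests contain /favicon.ico in the request line.

```python
def favicon_404_share(rows):
    """Return (favicon 404 count, total 404 count) from rows with 'status' and 'request' keys."""
    not_found = [r for r in rows if r["status"] == "404"]
    favicon = [r for r in not_found if "/favicon.ico" in r["request"]]
    return len(favicon), len(not_found)
```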
  • 42. Visualize the results • Client-side JavaScript example using Plottable, a library built on D3 • Hosted on Amazon S3 for pennies a month • AWS Lambda function used to query Amazon Redshift
  • 43. Try it yourself on the AWS cloud… around the same cost as a cup of coffee ($3.50). Service and estimated cost*: Amazon Kinesis $1.00; Amazon S3 (free tier) $0; Amazon EMR $0.44; Amazon Redshift $1.00; estimated total $2.44. *Estimated costs assume use of the free tier where available, lower-cost instances, a dataset no bigger than 10 MB, and instances running for less than 4 hours. Costs may vary depending on the options selected, the size of the dataset, and usage.
  • 44. Learn from AWS big data experts blogs.aws.amazon.com/bigdata