SlideShare a Scribd company logo
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rahul Bhartia, Partner Solution Architect, AWS Partner Program
July 13, 2016
Building Your First Big Data Application on AWS
Your first big data application on AWS
PROCESS
STORE
ANALYZE & VISUALIZE
COLLECT
Your first big data application on AWS
PROCESS:
Amazon EMR with Spark & Hive
STORE
ANALYZE & VISUALIZE:
Amazon Redshift and Amazon QuickSight
COLLECT:
Amazon Kinesis Firehose
http://aws.amazon.com/big-data/use-cases/
A modern take on the classic data warehouse
Setting up the environment
Data storage with Amazon S3
Download all the CLI steps: http://bit.ly/aws-big-data-steps
Create an Amazon S3 bucket to store the data collected
with Amazon Kinesis Firehose.
aws s3 mb s3://YOUR-S3-BUCKET-NAME
Access control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket.
firehose-policy.json:
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Principal": {"Service": "firehose.amazonaws.com"},
"Action": "sts:AssumeRole"
}
}
Access control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket.
s3-rw-policy.json:
{ "Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::YOUR-S3-BUCKET-NAME",
"arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
]
} }
Access control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket.
aws iam create-role --role-name firehose-demo 
--assume-role-policy-document file://firehose-policy.json
Copy the value in “Arn” in the output
(e.g., arn:aws:iam::123456789:role/firehose-demo).
aws iam put-role-policy --role-name firehose-demo 
--policy-name firehose-s3-rw 
--policy-document file://s3-rw-policy.json
Data collection with Amazon Kinesis Firehose
Create a Firehose stream for incoming log data.
aws firehose create-delivery-stream 
--delivery-stream-name demo-firehose-stream 
--s3-destination-configuration 
RoleARN=YOUR-FIREHOSE-ARN,
BucketARN="arn:aws:s3:::YOUR-S3-BUCKET-NAME",
Prefix=firehose/,
BufferingHints={IntervalInSeconds=60},
CompressionFormat=GZIP
Data processing with Amazon EMR
Launch an Amazon EMR cluster with Spark and Hive.
aws emr create-cluster 
--name "demo" 
--release-label emr-4.5.0 
--instance-type m3.xlarge 
--instance-count 2 
--ec2-attributes KeyName=YOUR-AWS-SSH-KEY 
--use-default-roles 
--applications Name=Hive Name=Spark Name=Zeppelin-Sandbox
Record your ClusterId from the output.
Data analysis with Amazon Redshift
Create a single-node Amazon Redshift data warehouse.
aws redshift create-cluster 
--cluster-identifier demo 
--db-name demo 
--node-type dc1.large 
--cluster-type single-node 
--master-username master 
--master-user-password YOUR-REDSHIFT-PASSWORD 
--publicly-accessible 
--port 8192
Collect
Weblogs – Common Log Format (CLF)
75.35.230.210 - - [20/Jul/2009:22:22:42 -0700]
"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
"http://www.swivel.com/graphs/show/1163466"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11)
Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
Writing into Amazon Kinesis Firehose
Download the demo weblog: http://bit.ly/aws-big-data
Open Python and run the following code to import the log
into the stream.
import boto3
iam = boto3.client('iam')
firehose = boto3.client('firehose')
with open('weblog', 'r') as f:
for line in f:
firehose.put_record(
DeliveryStreamName='demo-firehose-stream',
Record={'Data': line}
)
print 'Record added'
Process
Apache Spark
• Fast, general-purpose engine
for large-scale data processing
• Write applications quickly in
Java, Scala, or Python
• Combine SQL, streaming, and
complex analytics
Spark SQL
Spark's module for working with structured data using SQL
Run unmodified Hive queries on existing data
Apache Zeppelin
• Web-based notebook for interactive analytics
• Multiple language back end
• Apache Spark integration
• Data visualization
• Collaboration
https://zeppelin.incubator.apache.org/
View the output files in Amazon S3
After about 1 minute, you should see files in your S3
bucket.
aws s3 ls s3://YOUR-S3-BUCKET-NAME/firehose/ --recursive
Connect to your EMR cluster and Zeppelin
aws emr describe-cluster --cluster-id YOUR-EMR-CLUSTER-ID
Copy the MasterPublicDnsName. Use port forwarding so you can
access Zeppelin at http://localhost:18890 on your local machine.
ssh -i PATH-TO-YOUR-SSH-KEY -L 18890:localhost:8890 
hadoop@YOUR-EMR-DNS-NAME
Open Zeppelin with your local web browser and create a new “Note”.
http://localhost:18890
Exploring the data in Amazon S3 using Spark
Download the Zeppelin notebook: http://bit.ly/aws-big-data-zeppelin
// Load all the files from S3 into a RDD
val accessLogLines = sc.textFile("s3://YOUR-S3-BUCKET-NAME/firehose/*/*/*/*")
// Count the lines
accessLogLines.count
// Print one line as a string
accessLogLines.first
// Delimited by space so split them into fields
var accessLogFields = accessLogLines.map(_.split(" ").map(_.trim))
// Print the fields of a line
accessLogFields.first
Combine fields: “A, B, C”  “A B C”
var accessLogColumns = accessLogFields
.map( arrayOfFields => { var temp1 =""; for (field <- arrayOfFields) yield {
var temp2 = ""
if (temp1.replaceAll("[",""").startsWith(""") && !temp1.endsWith("""))
temp1 = temp1 + " " + field.replaceAll("[|]",""")
else temp1 = field.replaceAll("[|]",""")
temp2 = temp1
if (temp1.endsWith(""")) temp1 = ""
temp2
}})
.map( fields => fields.filter(field => (field.startsWith(""") &&
field.endsWith(""")) || !field.startsWith(""") ))
.map(fields => fields.map(_.replaceAll(""","")))
Create a data frame and transform the data
import java.sql.Timestamp
import java.net.URL
case class accessLogs(
ipAddress: String,
requestTime: Timestamp,
requestMethod: String,
requestPath: String,
requestProtocol: String,
responseCode: String,
responseSize: String,
referrerHost: String,
userAgent: String
)
Create a data frame and transform the data
val accessLogsDF = accessLogColumns.map(line => {
var ipAddress = line(0)
var requestTime = new Timestamp(new
java.text.SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z").parse(line(3)).getTime())
var requestString = line(4).split(" ").map(_.trim())
var requestMethod = if (line(4).toString() != "-") requestString(0) else ""
var requestPath = if (line(4).toString() != "-") requestString(1) else ""
var requestProtocol = if (line(4).toString() != "-") requestString(2) else ""
var responseCode = line(5).replaceAll("-","")
var responseSize = line(6).replaceAll("-","")
var referrerHost = line(7)
var userAgent = line(8)
accessLogs(ipAddress, requestTime, requestMethod, requestPath,
requestProtocol,responseCode, responseSize, referrerHost, userAgent)
}).toDF()
Create an external table backed by Amazon S3
%sql
CREATE EXTERNAL TABLE access_logs
(
ip_address String,
request_time Timestamp,
request_method String,
request_path String,
request_protocol String,
response_code String,
response_size String,
referrer_host String,
user_agent String
)
PARTITIONED BY (year STRING,month STRING, day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
STORED AS TEXTFILE
LOCATION 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
Configure Hive partitioning and compression
// Set up Hive's "dynamic partitioning”
%sql
SET hive.exec.dynamic.partition=true
// Compress output files on Amazon S3 using Gzip
%sql
SET hive.exec.compress.output=true
%sql
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
%sql
SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec
%sql
SET hive.exec.dynamic.partition.mode=nonstrict;
Write output to Amazon S3
import org.apache.spark.sql.SaveMode
accessLogsDF
.withColumn("year", year(accessLogsDF("requestTime")))
.withColumn("month", month(accessLogsDF("requestTime")))
.withColumn("day", dayofmonth(accessLogsDF("requestTime")))
.write
.partitionBy("year","month","day")
.mode(SaveMode.Overwrite)
.insertInto("access_logs")
Query the data using Spark SQL
// Check the count of records
%sql
select count(*) from access_log_processed
// Fetch the first 10 records
%sql
select * from access_log_processed limit 10
View the output files in Amazon S3
Leave Zeppelin and go back to the console…
List the partition prefixes and output files.
aws s3 ls s3://YOUR-S3-BUCKET-NAME/access-log-processed/ 
--recursive
Analyze
Connect to Amazon Redshift
Using the PostgreSQL CLI.
psql -h YOUR-REDSHIFT-ENDPOINT 
-p 8192 -U master demo
Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers
or native Amazon Redshift support:
• Aginity Workbench for Amazon Redshift
• SQL Workbench/J
Create an Amazon Redshift table to hold your data
CREATE TABLE accesslogs
(
host_address varchar(512),
request_time timestamp,
request_method varchar(5),
request_path varchar(1024),
request_protocol varchar(10),
response_code Int,
response_size Int,
referrer_host varchar(1024),
user_agent varchar(512)
)
DISTKEY(host_address)
SORTKEY(request_time);
Loading data into Amazon Redshift
“COPY” command loads files in parallel from Amazon S3.
COPY accesslogs
FROM 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
CREDENTIALS
'aws_access_key_id=YOUR-IAM-
ACCESS_KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY'
DELIMITER 't'
MAXERROR 0
GZIP;
Amazon Redshift test queries
-- Find distribution of response codes over days
SELECT TRUNC(request_time), response_code, COUNT(1) FROM
accesslogs GROUP BY 1,2 ORDER BY 1,3 DESC;
-- Find the 404 status codes
SELECT COUNT(1) FROM accessLogs WHERE response_code = 404;
-- Show all requests for status as PAGE NOT FOUND
SELECT TOP 1 request_path,COUNT(1) FROM accesslogs WHERE
response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
Visualize the results
DEMO
Amazon QuickSight
Automating the big data application
Amazon
Kinesis
Firehose
Amazon
EMR
Amazon
S3
Amazon
Redshift
Amazon
QuickSight
Amazon
S3
Event
notification
Spark job
List of objects from AWS Lambda
Write to Amazon Redshift using Spark-Redshift
Learn from big data experts
blogs.aws.amazon.com/bigdata
Thank you!

More Related Content

What's hot

Amazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database EngineAmazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database Engine
Amazon Web Services
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Amazon Web Services
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
Amazon Web Services
 
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech TalksDeep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute ServicesGetting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute Services
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Amazon Web Services
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
Amazon Web Services
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
Amazon Web Services
 
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Amazon Web Services
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
Amazon Web Services
 
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon AuroraNEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
Amazon Web Services
 
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
Amazon Web Services
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Amazon Web Services
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
Amazon Web Services
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
 
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Amazon Web Services
 
AWS RDS Migration Tool
AWS RDS Migration Tool AWS RDS Migration Tool

What's hot (20)

Amazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database EngineAmazon Aurora: Amazon’s New Relational Database Engine
Amazon Aurora: Amazon’s New Relational Database Engine
 
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
Data Replication Options in AWS (ARC302) | AWS re:Invent 2013
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech TalksDeep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Getting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute ServicesGetting Started with Amazon EC2 and AWS Compute Services
Getting Started with Amazon EC2 and AWS Compute Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Onl...
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Getting started with amazon aurora - Toronto
Getting started with amazon aurora - TorontoGetting started with amazon aurora - Toronto
Getting started with amazon aurora - Toronto
 
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform MigrationDatabase Migration – Simple, Cross-Engine and Cross-Platform Migration
Database Migration – Simple, Cross-Engine and Cross-Platform Migration
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon AuroraNEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
NEW LAUNCH! Introducing PostgreSQL compatibility for Amazon Aurora
 
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
 
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseStreaming Data Analytics with Amazon Redshift and Kinesis Firehose
Streaming Data Analytics with Amazon Redshift and Kinesis Firehose
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
Optimize MySQL Workloads with Amazon Elastic Block Store - February 2017 AWS ...
 
AWS RDS Migration Tool
AWS RDS Migration Tool AWS RDS Migration Tool
AWS RDS Migration Tool
 

Viewers also liked

My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
Amazon Web Services
 
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
Amazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Amazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Amazon Web Services
 
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Amazon Web Services
 
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Amazon Web Services
 
Hack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 ThreatsHack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 Threats
Amazon Web Services
 
Sony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS CloudSony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS Cloud
Amazon Web Services
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
Amazon Web Services
 
Grow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS CloudGrow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS Cloud
Amazon Web Services
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
Amazon Web Services
 
AWS Summit Auckland- Developing Applications for IoT
AWS Summit Auckland-  Developing Applications for IoTAWS Summit Auckland-  Developing Applications for IoT
AWS Summit Auckland- Developing Applications for IoT
Amazon Web Services
 
Expanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud InfrastructureExpanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud Infrastructure
Amazon Web Services
 
S'étendre à l'international
S'étendre à l'internationalS'étendre à l'international
S'étendre à l'international
Amazon Web Services
 
Another Day, Another Billion Packets
Another Day, Another Billion PacketsAnother Day, Another Billion Packets
Another Day, Another Billion Packets
Amazon Web Services
 
Next-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC IntegrationNext-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC Integration
Amazon Web Services
 
AWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - VocusAWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - Vocus
Amazon Web Services
 
Cloud: The Commercial Silver Lining for Partners
Cloud: The Commercial Silver Lining for PartnersCloud: The Commercial Silver Lining for Partners
Cloud: The Commercial Silver Lining for Partners
Amazon Web Services
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
Amazon Web Services
 
AWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS CloudAWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS Cloud
Amazon Web Services
 

Viewers also liked (20)

My First Big Data Application
My First Big Data ApplicationMy First Big Data Application
My First Big Data Application
 
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
 Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
Creating Your Virtual Data Center: VPC Fundamentals and Connectivity Options
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
Deep Dive: Developing, Deploying & Operating Mobile Apps with AWS
 
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
Session Sponsored by Trend Micro: 3 Secrets to Becoming a Cloud Security Supe...
 
Hack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 ThreatsHack-Proof Your Cloud: Responding to 2016 Threats
Hack-Proof Your Cloud: Responding to 2016 Threats
 
Sony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS CloudSony DAD NMS & Our Migration to the AWS Cloud
Sony DAD NMS & Our Migration to the AWS Cloud
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
Grow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS CloudGrow Your SMB Infrastructure on the AWS Cloud
Grow Your SMB Infrastructure on the AWS Cloud
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
AWS Summit Auckland- Developing Applications for IoT
AWS Summit Auckland-  Developing Applications for IoTAWS Summit Auckland-  Developing Applications for IoT
AWS Summit Auckland- Developing Applications for IoT
 
Expanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud InfrastructureExpanding Your Data Center with Hybrid Cloud Infrastructure
Expanding Your Data Center with Hybrid Cloud Infrastructure
 
S'étendre à l'international
S'étendre à l'internationalS'étendre à l'international
S'étendre à l'international
 
Another Day, Another Billion Packets
Another Day, Another Billion PacketsAnother Day, Another Billion Packets
Another Day, Another Billion Packets
 
Next-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC IntegrationNext-Generation Firewall Services VPC Integration
Next-Generation Firewall Services VPC Integration
 
AWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - VocusAWS Summit Auckland Sponsor Presentation - Vocus
AWS Summit Auckland Sponsor Presentation - Vocus
 
Cloud: The Commercial Silver Lining for Partners
Cloud: The Commercial Silver Lining for PartnersCloud: The Commercial Silver Lining for Partners
Cloud: The Commercial Silver Lining for Partners
 
Introduction to AWS X-Ray
Introduction to AWS X-RayIntroduction to AWS X-Ray
Introduction to AWS X-Ray
 
AWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS CloudAWS Webcast - Explore the AWS Cloud
AWS Webcast - Explore the AWS Cloud
 

Similar to Building Your First Big Data Application on AWS

Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
Amazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
Amazon Web Services
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
Amazon Web Services
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS
Amazon Web Services
 
5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services
Simone Brunozzi
 
5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud
Amazon Web Services
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
Amazon Web Services
 
AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings
Adam Book
 
AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS
Amazon Web Services
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
Ian Massingham
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
Amazon Web Services
 
Taking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) FamilyTaking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) Family
Ben Hall
 
Amazed by AWS Series #4
Amazed by AWS Series #4Amazed by AWS Series #4
Amazed by AWS Series #4
Amazon Web Services Korea
 
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps_Fest
 
AWS Serverless Workshop
AWS Serverless WorkshopAWS Serverless Workshop
AWS Serverless Workshop
Mikael Puittinen
 
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
Amazon Web Services
 
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Amazon Web Services
 
OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL Suraj Bang
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiectures
Iegor Fadieiev
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
Amazon Web Services Korea
 

Similar to Building Your First Big Data Application on AWS (20)

Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS
 
5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services
 
5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
 
AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings
 
AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Taking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) FamilyTaking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) Family
 
Amazed by AWS Series #4
Amazed by AWS Series #4Amazed by AWS Series #4
Amazed by AWS Series #4
 
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
 
AWS Serverless Workshop
AWS Serverless WorkshopAWS Serverless Workshop
AWS Serverless Workshop
 
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
 
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
 
OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL OWB11gR2 - Extending ETL
OWB11gR2 - Extending ETL
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiectures
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Building Your First Big Data Application on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rahul Bhartia, Partner Solution Architect, AWS Partner Program July 13, 2016 Building Your First Big Data Application on AWS
  • 2. Your first big data application on AWS PROCESS STORE ANALYZE & VISUALIZE COLLECT
  • 3. Your first big data application on AWS PROCESS: Amazon EMR with Spark & Hive STORE ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight COLLECT: Amazon Kinesis Firehose
  • 5. Setting up the environment
  • 6. Data storage with Amazon S3 Download all the CLI steps: http://bit.ly/aws-big-data-steps Create an Amazon S3 bucket to store the data collected with Amazon Kinesis Firehose. aws s3 mb s3://YOUR-S3-BUCKET-NAME
  • 7. Access control with IAM Create an IAM role to allow Firehose to write to the S3 bucket. firehose-policy.json: { "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Principal": {"Service": "firehose.amazonaws.com"}, "Action": "sts:AssumeRole" } }
  • 8. Access control with IAM Create an IAM role to allow Firehose to write to the S3 bucket. s3-rw-policy.json: { "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Action": "s3:*", "Resource": [ "arn:aws:s3:::YOUR-S3-BUCKET-NAME", "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*" ] } }
  • 9. Access control with IAM Create an IAM role to allow Firehose to write to the S3 bucket. aws iam create-role --role-name firehose-demo --assume-role-policy-document file://firehose-policy.json Copy the value in “Arn” in the output (e.g., arn:aws:iam::123456789:role/firehose-demo). aws iam put-role-policy --role-name firehose-demo --policy-name firehose-s3-rw --policy-document file://s3-rw-policy.json
  • 10. Data collection with Amazon Kinesis Firehose Create a Firehose stream for incoming log data. aws firehose create-delivery-stream --delivery-stream-name demo-firehose-stream --s3-destination-configuration RoleARN=YOUR-FIREHOSE-ARN, BucketARN="arn:aws:s3:::YOUR-S3-BUCKET-NAME", Prefix=firehose/, BufferingHints={IntervalInSeconds=60}, CompressionFormat=GZIP
  • 11. Data processing with Amazon EMR Launch an Amazon EMR cluster with Spark and Hive. aws emr create-cluster --name "demo" --release-label emr-4.5.0 --instance-type m3.xlarge --instance-count 2 --ec2-attributes KeyName=YOUR-AWS-SSH-KEY --use-default-roles --applications Name=Hive Name=Spark Name=Zeppelin-Sandbox Record your ClusterId from the output.
  • 12. Data analysis with Amazon Redshift Create a single-node Amazon Redshift data warehouse. aws redshift create-cluster --cluster-identifier demo --db-name demo --node-type dc1.large --cluster-type single-node --master-username master --master-user-password YOUR-REDSHIFT-PASSWORD --publicly-accessible --port 8192
  • 14. Weblogs – Common Log Format (CLF) 75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 "http://www.swivel.com/graphs/show/1163466" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
  • 15. Writing into Amazon Kinesis Firehose Download the demo weblog: http://bit.ly/aws-big-data Open Python and run the following code to import the log into the stream. import boto3 iam = boto3.client('iam') firehose = boto3.client('firehose') with open('weblog', 'r') as f: for line in f: firehose.put_record( DeliveryStreamName='demo-firehose-stream', Record={'Data': line} ) print 'Record added'
  • 17. Apache Spark • Fast, general-purpose engine for large-scale data processing • Write applications quickly in Java, Scala, or Python • Combine SQL, streaming, and complex analytics
  • 18. Spark SQL Spark's module for working with structured data using SQL Run unmodified Hive queries on existing data
  • 19. Apache Zeppelin • Web-based notebook for interactive analytics • Multiple language back end • Apache Spark integration • Data visualization • Collaboration https://zeppelin.incubator.apache.org/
  • 20. View the output files in Amazon S3 After about 1 minute, you should see files in your S3 bucket. aws s3 ls s3://YOUR-S3-BUCKET-NAME/firehose/ --recursive
  • 21. Connect to your EMR cluster and Zeppelin aws emr describe-cluster --cluster-id YOUR-EMR-CLUSTER-ID Copy the MasterPublicDnsName. Use port forwarding so you can access Zeppelin at http://localhost:18890 on your local machine. ssh -i PATH-TO-YOUR-SSH-KEY -L 18890:localhost:8890 hadoop@YOUR-EMR-DNS-NAME Open Zeppelin with your local web browser and create a new “Note”. http://localhost:18890
  • 22. Exploring the data in Amazon S3 using Spark Download the Zeppelin notebook: http://bit.ly/aws-big-data-zeppelin // Load all the files from S3 into a RDD val accessLogLines = sc.textFile("s3://YOUR-S3-BUCKET-NAME/firehose/*/*/*/*") // Count the lines accessLogLines.count // Print one line as a string accessLogLines.first // Delimited by space so split them into fields var accessLogFields = accessLogLines.map(_.split(" ").map(_.trim)) // Print the fields of a line accessLogFields.first
  • 23. Combine fields: “A, B, C”  “A B C” var accessLogColumns = accessLogFields .map( arrayOfFields => { var temp1 =""; for (field <- arrayOfFields) yield { var temp2 = "" if (temp1.replaceAll("[",""").startsWith(""") && !temp1.endsWith(""")) temp1 = temp1 + " " + field.replaceAll("[|]",""") else temp1 = field.replaceAll("[|]",""") temp2 = temp1 if (temp1.endsWith(""")) temp1 = "" temp2 }}) .map( fields => fields.filter(field => (field.startsWith(""") && field.endsWith(""")) || !field.startsWith(""") )) .map(fields => fields.map(_.replaceAll(""","")))
  • 24. Create a data frame and transform the data import java.sql.Timestamp import java.net.URL case class accessLogs( ipAddress: String, requestTime: Timestamp, requestMethod: String, requestPath: String, requestProtocol: String, responseCode: String, responseSize: String, referrerHost: String, userAgent: String )
  • 25. Create a data frame and transform the data val accessLogsDF = accessLogColumns.map(line => { var ipAddress = line(0) var requestTime = new Timestamp(new java.text.SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z").parse(line(3)).getTime()) var requestString = line(4).split(" ").map(_.trim()) var requestMethod = if (line(4).toString() != "-") requestString(0) else "" var requestPath = if (line(4).toString() != "-") requestString(1) else "" var requestProtocol = if (line(4).toString() != "-") requestString(2) else "" var responseCode = line(5).replaceAll("-","") var responseSize = line(6).replaceAll("-","") var referrerHost = line(7) var userAgent = line(8) accessLogs(ipAddress, requestTime, requestMethod, requestPath, requestProtocol,responseCode, responseSize, referrerHost, userAgent) }).toDF()
  • 26. Create an external table backed by Amazon S3 %sql CREATE EXTERNAL TABLE access_logs ( ip_address String, request_time Timestamp, request_method String, request_path String, request_protocol String, response_code String, response_size String, referrer_host String, user_agent String ) PARTITIONED BY (year STRING,month STRING, day STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY 't' STORED AS TEXTFILE LOCATION 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
  • 27. Configure Hive partitioning and compression // Set up Hive's "dynamic partitioning” %sql SET hive.exec.dynamic.partition=true // Compress output files on Amazon S3 using Gzip %sql SET hive.exec.compress.output=true %sql SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec %sql SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec %sql SET hive.exec.dynamic.partition.mode=nonstrict;
  • 28. Write output to Amazon S3 import org.apache.spark.sql.SaveMode accessLogsDF .withColumn("year", year(accessLogsDF("requestTime"))) .withColumn("month", month(accessLogsDF("requestTime"))) .withColumn("day", dayofmonth(accessLogsDF("requestTime"))) .write .partitionBy("year","month","day") .mode(SaveMode.Overwrite) .insertInto("access_logs")
  • 29. Query the data using Spark SQL // Check the count of records %sql select count(*) from access_log_processed // Fetch the first 10 records %sql select * from access_log_processed limit 10
  • 30. View the output files in Amazon S3 Leave Zeppelin and go back to the console… List the partition prefixes and output files. aws s3 ls s3://YOUR-S3-BUCKET-NAME/access-log-processed/ --recursive
  • 32. Connect to Amazon Redshift Using the PostgreSQL CLI. psql -h YOUR-REDSHIFT-ENDPOINT -p 8192 -U master demo Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers or native Amazon Redshift support: • Aginity Workbench for Amazon Redshift • SQL Workbench/J
  • 33. Create an Amazon Redshift table to hold your data CREATE TABLE accesslogs ( host_address varchar(512), request_time timestamp, request_method varchar(5), request_path varchar(1024), request_protocol varchar(10), response_code Int, response_size Int, referrer_host varchar(1024), user_agent varchar(512) ) DISTKEY(host_address) SORTKEY(request_time);
  • 34. Loading data into Amazon Redshift “COPY” command loads files in parallel from Amazon S3. COPY accesslogs FROM 's3://YOUR-S3-BUCKET-NAME/access-log-processed' CREDENTIALS 'aws_access_key_id=YOUR-IAM- ACCESS_KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY' DELIMITER 't' MAXERROR 0 GZIP;
  • 35. Amazon Redshift test queries -- Find distribution of response codes over days SELECT TRUNC(request_time), response_code, COUNT(1) FROM accesslogs GROUP BY 1,2 ORDER BY 1,3 DESC; -- Find the 404 status codes SELECT COUNT(1) FROM accessLogs WHERE response_code = 404; -- Show all requests for status as PAGE NOT FOUND SELECT TOP 1 request_path,COUNT(1) FROM accesslogs WHERE response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
  • 37. Automating the big data application Amazon Kinesis Firehose Amazon EMR Amazon S3 Amazon Redshift Amazon QuickSight Amazon S3 Event notification Spark job List of objects from AWS Lambda Write to Amazon Redshift using Spark-Redshift
  • 38. Learn from big data experts blogs.aws.amazon.com/bigdata