© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Allan MacInnis
Solutions Architect, Amazon Web Services
Ryan Nienhuis
Sr. PM, Amazon Kinesis, Amazon Web Services
BDA309: Building Your First Big Data Application on AWS
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Workshop Objectives
 Build an end-to-end analytics application on AWS
 Implement batch and real-time layers in a single
application
 Explore AWS analytics portfolio for other use cases
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Why Analytics on AWS?
 Unmatched durability and availability at EB scale
 Best security, compliance, and audit capabilities
 Object-level controls for fine-grained access
 Fastest performance by retrieving subsets of data
 The most ways to bring data in
 2x as many integrations with partners
 Analyze with the broadest set of analytics & ML services
(Diagram: a data lake on AWS, fed by real-time and on-premises data movement, analyzed with analytics and machine learning services)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
• Kinesis Producer UI – Generate web logs
• Amazon Kinesis Data Firehose – Collect web logs and deliver to Amazon S3
• Amazon S3 Bucket – Raw web logs from Kinesis Data Firehose
• Amazon Kinesis Data Analytics – Process & compute aggregate web log metrics
• Amazon Kinesis Data Firehose – Deliver processed web logs to Amazon Redshift
• Amazon Redshift – Run SQL queries on processed web logs
• Amazon QuickSight – Visualize web logs to discover insights
• Amazon EMR – Interactive analysis of web logs
• AWS Glue – Extract metadata, create tables, and transform web logs from CSV to parquet
• Amazon S3 Bucket – Transformed web logs from AWS Glue
• Amazon Athena – Interactive querying of web logs
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
What Is qwikLABS?
 Provides access to AWS services for this workshop
 No need to provide a credit card
 Resources are automatically deleted when you’re finished
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Sign in and Start the Lab
Once the lab is started, you will see a lab setup progress bar. It takes ~10 minutes for the lab to be set up.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Navigating qwikLABS
• Student resources: Scripts for your labs
• Open console: Opens the AWS Management Console
• Additional connection details: Links to different interfaces
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Everything You Need for the Lab
 Open the AWS Management Console, log in, and verify that the following AWS
resources are created:
 One Amazon Kinesis Data Analytics application
 One Kinesis Data Analytics preprocessing Lambda function
 Two Amazon Kinesis Data Firehose delivery streams
 One Amazon EMR cluster
 One Amazon Redshift cluster
 Sign up (later) for:
 Amazon QuickSight
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Kinesis
Amazon Kinesis Data Streams – Build custom applications that process and analyze streaming data
Amazon Kinesis Data Analytics – Easily process and analyze streaming data with standard SQL
Amazon Kinesis Data Firehose – Easily load streaming data into AWS
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Kinesis Data Streams
• Easy administration and low cost
• Build real-time applications with framework of choice
• Secure, durable storage
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Kinesis Data Firehose
 Zero administration and seamless elasticity
 Direct-to-data store integration
 Serverless, continuous data transformations
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Kinesis – Kinesis Data Firehose vs. Kinesis Data Streams
Amazon Kinesis Data Streams is for use cases that require custom processing of each incoming record, with sub-1 second processing latency and a choice of stream processing frameworks.
Amazon Kinesis Data Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, and a data latency of 60 seconds or higher.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1
Collect Logs Using a Kinesis Data
Firehose Delivery Stream
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
 Time: 5 min.
We are going to:
 Write to a Kinesis Data Firehose delivery stream. We’ll simulate writing transformed Apache web logs to a Firehose delivery stream that is configured to deliver data into an S3 bucket.
 There are many different libraries that can be used to write data to a Kinesis Data Firehose delivery stream. One popular option is the Amazon Kinesis Agent.
Collect Logs with a Kinesis Data Firehose Delivery
Stream
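For reference, writing a record to the delivery stream programmatically is a single API call. A minimal sketch with boto3 (the delivery stream name below is a placeholder; in the lab it is the qls-...-FirehoseDeliveryStream-... stream):

```python
import boto3

# Sketch: write one Apache-style web log line to a Firehose delivery stream.
# The stream name is a placeholder; use the qls-...-FirehoseDeliveryStream-...
# name from your lab account.
firehose = boto3.client("firehose", region_name="us-west-2")

log_line = ('125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] '
            '"GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0)"\n')

firehose.put_record(
    DeliveryStreamName="my-firehose-delivery-stream",  # placeholder
    Record={"Data": log_line.encode("utf-8")},
)
```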
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
 Amazon Kinesis Agent
 Standalone Java application to collect and send data to Kinesis Data Firehose
 Continuously monitors a set of files
 Handles file rotation, checkpointing, and retry on failures
 Emits CloudWatch metrics
 Pre-processes records parsed from monitored files
Collect Logs with a Kinesis Data Firehose Delivery
Stream
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Collect Logs with a Kinesis Data Firehose Delivery
Stream
For example, the agent can transform an Apache web log to JSON.
From:
125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] "GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"
To:
{"HOST" : "125.166.52.103",
"IDENT" : null,
"AUTHUSER" : null,
"DATETIME" : "08/Mar/2017:17:06:44 -08:00",
"REQUEST" : "GET /explore",
"RESPONSE" : 200,
"BYTES" : 2503,
"REFERRER" : null,
"AGENT" : "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"}
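For illustration only, a rough Python equivalent of this LOGTOJSON-style transform, assuming the standard Apache combined log format (the regex and field handling are a sketch, not the agent's actual implementation):

```python
import json
import re

# Sketch of a LOGTOJSON-style transform for the Apache combined log format.
# The regex and normalization rules here are illustrative assumptions.
COMBINED_LOG = re.compile(
    r'(?P<HOST>\S+) (?P<IDENT>\S+) (?P<AUTHUSER>\S+) \[(?P<DATETIME>[^\]]+)\] '
    r'"(?P<REQUEST>[^"]*)" (?P<RESPONSE>\d{3}) (?P<BYTES>\S+)'
    r'(?: "(?P<REFERRER>[^"]*)" "(?P<AGENT>[^"]*)")?'
)

def log_to_json(line: str) -> str:
    match = COMBINED_LOG.match(line)
    if not match:
        raise ValueError("not a combined-format log line")
    record = match.groupdict()
    # Convert numeric fields and turn "-" placeholders into nulls, as in the slide's example.
    record["RESPONSE"] = int(record["RESPONSE"])
    record["BYTES"] = None if record["BYTES"] == "-" else int(record["BYTES"])
    for key in ("IDENT", "AUTHUSER", "REFERRER"):
        if record.get(key) in ("-", "", None):
            record[key] = None
    return json.dumps(record)

print(log_to_json('125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] '
                  '"GET /explore" 200 2503 "-" "Mozilla/5.0 (compatible; MSIE 9.0)"'))
```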
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
So that you don’t have to install or set up software on your machine, we are going to use a utility called the Kinesis Data Generator (KDG) to simulate the Amazon Kinesis Agent. The KDG can populate a Firehose delivery stream using a template and is simple to set up.
Let’s get started!
Collect Logs with a Kinesis Data Firehose Delivery
Stream
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
Qwiklabs has already created and set up the Kinesis Data Firehose delivery
stream for us. All we have to do is start writing data to it.
Go to the Kinesis Data Generator (KDG) help section at http://tinyurl.com/kinesispublisher, which redirects to:
https://s3.amazonaws.com/kinesis-data-producer-test/help.html
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
 Choose Create Amazon Cognito User with AWS CloudFormation.
 This link takes you to a service called AWS CloudFormation. AWS
CloudFormation gives developers and systems administrators an easy way
to create and manage a collection of related AWS resources, provisioning
and updating them in an orderly and predictable fashion.
 We use AWS CloudFormation to create the necessary user credentials to use
the Kinesis Data Generator.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
On the next screen, choose Next. Here, we are using a template stored in
an Amazon S3 bucket to create the necessary credentials.
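For reference, that console link simply launches an AWS CloudFormation stack from the template stored in S3. A hedged boto3 equivalent is sketched below; the template URL and parameter names are placeholders, not the actual values from the KDG help page:

```python
import boto3

# Sketch only: launch a CloudFormation stack that creates the Cognito user for the KDG.
# TemplateURL and the parameter names are placeholders/assumptions; use the values
# shown on the KDG help page for your lab.
cloudformation = boto3.client("cloudformation", region_name="us-west-2")

cloudformation.create_stack(
    StackName="Kinesis-Data-Generator-Cognito-User",
    TemplateURL="https://s3.amazonaws.com/your-template-bucket/cognito-setup.template",  # placeholder
    Parameters=[
        {"ParameterKey": "Username", "ParameterValue": "kdg-user"},      # assumed parameter name
        {"ParameterKey": "Password", "ParameterValue": "Passw0rd123"},   # assumed parameter name
    ],
    Capabilities=["CAPABILITY_IAM"],  # the stack creates IAM/Cognito resources
)
```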
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
Specify a user name and password (and remember them!), and choose Next. This user name and password will be used to sign in to the Kinesis Data Generator. The password must be at least 6 alphanumeric characters and contain at least one number.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
 We use a service called Amazon Cognito to create these credentials.
 Amazon Cognito lets you easily add user sign-up and sign-in to your mobile and web apps. With Amazon Cognito, you also have the option to authenticate users through social identity providers such as Facebook, Twitter, or Amazon, with SAML identity solutions, or by using your own identity system.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
The next screen has some additional options for the AWS CloudFormation stack,
which are not needed. Choose Next.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
This screen is a review screen so you can verify that you have selected the
correct options. When you are ready, choose Acknowledge, and choose
Create.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
You are taken to a screen that shows the stack creation process. (You may have to refresh the page.) In approximately one minute, the stack will complete, and, under Status, you will see CREATE_COMPLETE. Once this occurs, (a) select the stack, (b) select the Outputs tab, and (c) choose the link to navigate to your very own Kinesis Data Generator (hosted on Amazon S3).
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
 Log in using your user name and
password.
 Select the region us-west-2 (Oregon).
 Select the delivery stream with name (qls-<somerandomnumber>-
FirehoseDeliveryStream-<somerandomnumber>-11111111111).
 Specify a data rate (Records per second). Choose a number less than 10.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
We chose a number less than 10 because everyone is on the same Wi-Fi network, and we want to be sure we don’t use all the bandwidth. Additionally, if you are not plugged in, you may run into battery issues at a higher rate.
For the Record Template, use the Apache Combined Log Template in the Student Resources section. The template should look like the following:
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
Choose Send Data to Kinesis.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: Monitoring Your Delivery Stream
 Go to the Amazon CloudWatch Metrics console and search for IncomingRecords. Choose this metric for your Firehose delivery stream, and choose the Sum statistic with a 1 Minute period.
 What are the most important metrics to monitor?
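A minimal sketch of pulling the same metric with boto3 instead of the console (the delivery stream name is a placeholder for your lab's stream):

```python
import boto3
from datetime import datetime, timedelta

# Sketch: fetch the IncomingRecords metric for a Firehose delivery stream in
# 1-minute Sum buckets, matching the console view. The stream name is a placeholder.
cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Firehose",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "DeliveryStreamName", "Value": "my-firehose-delivery-stream"}],
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=60,                 # 1-minute buckets
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```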
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2
Real-Time Data Processing
Using Kinesis Data Analytics
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Kinesis Data Analytics
• Powerful real-time applications
• Easy to use, fully managed
• Automatic elasticity
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Kinesis Data Analytics Applications
Easily write SQL code to process streaming data
Connect to a streaming source
Continuously deliver SQL results
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Process Data Using Kinesis Data Analytics
 Time: 20 min.
We are going to:
 Write a SQL query to compute aggregate metrics for an interesting statistic on
the incoming data.
 Write a SQL query using an anomaly detection function.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2A: Start Kinesis Data Analytics App
 Navigate to the Amazon
Kinesis dashboard.
 Choose the Kinesis analytics
application.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2A: Start Kinesis App
Choose Go to SQL editor. On the next screen, choose Yes, start
application.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
View Sample Records in Kinesis App
Review sample records delivered to the source stream
(SOURCE_SQL_STREAM_001).
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Kinesis Analytics Application Metadata
Note that Kinesis Data Analytics adds metadata to each record, which is shown in the formatted record sample:
- The ROWTIME represents the time the application read the record and is a special column used for time-series analytics. This is also known as the processing time.
- The APPROXIMATE_ARRIVAL_TIME is the time the delivery stream received the record. This is also known as the ingest time.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Kinesis Data Analytics: How Streaming App Works
 Writing SQL over streaming data using Kinesis Data Analytics follows a two-part model:
1. Create an in-application stream for storing intermediate SQL results. An in-application stream is like a SQL table, but is continuously updated.
2. Create a PUMP that will continuously read FROM one in-application stream and INSERT INTO a target in-application stream.
(Diagram: the source stream SOURCE_SQL_STREAM_001 feeds OUTPUT_PUMP and AGGREGATE_PUMP, which insert into DESTINATION_STREAM and AGGREGATE_STREAM; the results are sent to Amazon Redshift.)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2B: Calculate an Aggregate Metric
Calculate a count using a tumbling window and a GROUP BY clause.
A tumbling window is similar to a periodic report, where you specify your query and a time range, and results are emitted at the end of the range (e.g., COUNT the number of items by key for 10 seconds).
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2B: Calculate an Aggregate Metric
• Tumbling
  - Fixed size and non-overlapping
  - Use the FLOOR() or STEP() function (coming soon) in a GROUP BY statement
• Sliding
  - Fixed size and overlapping; row boundaries are determined when new rows enter the window
  - Use the standard OVER and WINDOW clause (e.g., COUNT(col) OVER (RANGE INTERVAL '5' MIN))
• Custom
  - Not fixed size and overlapping; row boundaries are determined by conditions
  - Implementations vary, but typically require two steps (step 1: identify boundaries; step 2: perform computation)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2B: Calculate an Aggregate Metric
 The window is defined by the following statement in the SELECT query. Note that the ROWTIME column is implicitly included in every stream query and represents the processing time of the application.
 This is known as a tumbling window. Tumbling windows are always included in a GROUP BY clause and use a STEP function. The STEP function takes an interval to produce the periodic reports.
 You can also use the SQL FLOOR() function to achieve the same thing.
STEP(source_sql_stream_001.ROWTIME BY INTERVAL '10' SECOND)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2B: Calculate an Aggregate Metric
Create an aggregate_stream using the SQL command found in the Kinesis Data Analytics SQL file in the Student Resources section of your lab. Copy and paste the SQL into the SQL editor underneath the DESTINATION_SQL_STREAM DDL. Choose Save and run SQL.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2C: Anomaly Detection
 Kinesis Data Analytics includes some advanced algorithms that are extensions
to the SQL language. These include approximate count distinct
(hyperloglog), approximate top K (space saving), and anomaly detection
(random cut forest).
 The random cut forest algorithm detects anomalies in real time on multi-dimensional datasets. You pass the algorithm n numeric fields, and it produces an anomaly score for your stream data. Higher scores are more anomalous.
 The minimum anomaly score is 0 and the maximum is log2(s), where s is the subsample size parameter passed to random cut forest (the third parameter). You will have to try the algorithm on your data to get a feel for the anomaly score, as the score is data-dependent.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 2C: Anomaly Detection
Create an anomaly_stream using the SQL command found in the
Kinesis Data Analytics SQL file in the Student Resources section of
your lab. Append the SQL in your SQL editor. Choose Save and run
SQL.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: In-Application SQL streams
 Your application has multiple in-application SQL streams, including
DESTINATION_SQL_STREAM, AGGREGATE_STREAM, and
ANOMALY_STREAM. These in-application streams are like SQL tables that are
continuously updated.
 What else is unique about an in-application stream aside from its continuous
nature?
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3
Deliver Streaming Results to
Amazon Redshift
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3: Deliver Data to Amazon Redshift using
Amazon Kinesis Data Firehose
 Time: 5 min.
 We are going to:
A. Connect to the Amazon Redshift cluster and create a table to hold the web log data.
B. Update the Kinesis Data Analytics application to send data to Amazon Redshift via the Firehose delivery stream.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3A: Connect to Amazon Redshift
 You can connect with pgweb.
- pgweb is already installed and configured for the Amazon Redshift cluster.
- Just navigate to pgweb and start interacting.
Note: From the qwikLABS console, open the pgWeb link in a new window.
 Or use any JDBC/ODBC/libpq client:
- Aginity Workbench for Amazon Redshift
- SQL Workbench/J
- DBeaver
- DataGrip
 If you use one of these SQL clients, the user name and password are at the bottom of your qwikLABS console. You will also find the cluster endpoint and database name (logs) in the qwikLABS console.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3B: Create a Table in Amazon Redshift
 Create a table named weblogs to capture the incoming data from the Firehose delivery stream.
Note: You can download the Amazon Redshift SQL code from the qwikLabs Student Resources section (choose Open Redshift SQL).
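For illustration, a sketch of creating such a table from Python with psycopg2. The exact DDL is in the lab's Redshift SQL file; the column names and types below are assumptions based on the web log fields used in this workshop, and the connection details come from the qwikLABS console:

```python
import psycopg2

# Illustrative only -- the lab's Student Resources file has the official DDL.
# Connection details (endpoint, password) come from the qwikLABS console; the
# column list is an assumption based on the web log fields.
conn = psycopg2.connect(
    host="<your-redshift-endpoint>", port=5439,
    dbname="logs", user="dbadmin", password="<your-password>",
)

create_weblogs = """
CREATE TABLE IF NOT EXISTS weblogs (
    host        VARCHAR(50),
    datetime    VARCHAR(50),
    request     VARCHAR(1024),
    response    INTEGER,
    bytes       INTEGER,
    agent       VARCHAR(1024)
);
"""

with conn, conn.cursor() as cur:
    cur.execute(create_weblogs)
```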
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3C: Deliver Data to Amazon Redshift Using
Firehose
Update the Kinesis Data Analytics application to send data to the Firehose delivery stream. Amazon Kinesis Data Firehose then delivers the streaming data to Amazon Redshift.
1. Go to the Kinesis Data Analytics console.
2. Choose the Amazon Redshift delivery stream as the destination, and choose the edit (pencil) icon.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 3C: Deliver Data to Amazon Redshift Using
Kinesis Data Firehose
 Validate your destination
1. Choose the Kinesis Data Firehose qls-xxxxxxx-
RedshiftDeliveryStream-xxxxxxxx delivery stream.
2. Keep the default for Connect in-application stream.
3. Choose CSV as the Output format.
4. Select Choose from IAM roles that Kinesis Analytics can assume.
5. Click Save and continue.
 It will take about 1–2 minutes for everything to be updated and for data to start
appearing in Amazon Redshift.
Activity 3C: Deliver Data to Amazon Redshift using
Kinesis Data Firehose
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: Amazon Redshift Test Queries
 Find distribution of response codes over days (copy SQL from Amazon Redshift
SQL file)
 Count the number of 404 response codes
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: Amazon Redshift Test Queries
Show all request paths with status "PAGE NOT FOUND"
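A sketch of what such test queries might look like when run from Python against the weblogs table assumed above (the lab's Amazon Redshift SQL file contains the official versions):

```python
import psycopg2

# Example test queries, assuming the weblogs table and connection details sketched
# in Activity 3B. The lab's Redshift SQL file has the official query text.
conn = psycopg2.connect(host="<your-redshift-endpoint>", port=5439,
                        dbname="logs", user="dbadmin", password="<your-password>")

queries = {
    "Distribution of response codes":
        "SELECT response, COUNT(*) FROM weblogs GROUP BY response ORDER BY 2 DESC;",
    "Number of 404 responses":
        "SELECT COUNT(*) FROM weblogs WHERE response = 404;",
    "Request paths that returned 404":
        "SELECT request, COUNT(*) FROM weblogs WHERE response = 404 GROUP BY request;",
}

with conn, conn.cursor() as cur:
    for title, sql in queries.items():
        cur.execute(sql)
        print(title, cur.fetchall()[:10])
```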
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Extract, Transform, and Load (ETL)
with AWS Glue
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Fully managed ETL (Extract, Transform, and Load)
Service
AWS Glue
• Categorize your data, clean it, enrich it, and reliably
move it between various data stores
• Once cataloged, your data is immediately
searchable and queryable across your data silos
• Simple, flexible, and cost-effective
• Serverless; runs on a fully managed, scale-out Spark environment
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Glue Components
• Data Catalog – Discover and organize your data in various databases, data warehouses, and data lakes.
• Job Authoring – Focus on writing the transformations. Generate code through a wizard, or write your own code.
• Job Execution – Runs jobs in Spark containers, with automatic scaling based on SLA. AWS Glue is serverless; you only pay for the resources you consume.
AWS Glue – How It Works
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4
Transform Web Logs to Parquet
Format Using AWS Glue
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity: Catalog and Perform ETL on Weblogs
Time: 40 min.
We are going to:
A. Discover and catalog the web log data deposited into the S3 bucket using the AWS Glue crawler.
B. Transform the web logs to parquet format with the AWS Glue ETL job authoring tool.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Discover Dataset with AWS Glue
We use the AWS Glue crawler to extract data and metadata. From the
AWS Management Console, select AWS Glue Service, and on the
next screen choose Get Started.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler Using AWS Glue
On the left, choose Crawlers, and choose Add crawler.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
 Specify a name for the crawler. Choose the folder icon to select the
data store on Amazon S3.
 Choose Next.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
 Provide the S3 path location where the weblogs are deposited.
Navigate to the S3 path that contains the word logs, and select the
raw folder.
 Choose Select.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
Choose Next on the next screen to not add another data store.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
In the IAM Role section, select Create an IAM role, add default to the
IAM role, and choose Next.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler in AWS Glue
Choose Run on demand to run the crawler now, and choose Next.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler with AWS Glue
On the next screen, choose Add database to add a database (use weblogdb for
the database name).
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler with AWS Glue
Review and choose Finish to create a crawler.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler with AWS Glue
Choose the Run it now link to run the crawler.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Add Crawler with AWS Glue
Crawler shows a Ready status when it is finished running.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Table Creation in AWS Glue
 Observe that the crawler has created a table for your dataset.
 The crawler automatically classified the dataset as combined Apache log format.
 Choose the table to look at its properties.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Table Creation in AWS Glue
 AWS Glue used the GrokSerDe (Serializer/Deserializer) to correctly
interpret the web logs.
 You can choose the View partitions link to look at the partitions in the
dataset.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4A: Create ETL Job in AWS Glue
 With the dataset cataloged and the table created, we are now ready to convert the web logs from Apache combined log format to a more optimal parquet format for querying.
 Choose Add job to begin creating the ETL job.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
 In the Job properties window, specify
the job name.
 In this example, we will create a new ETL
script.
 AWS Glue automatically chooses the name
of the script file and the path where the
script will be persisted.
 For the Temporary directory, specify an S3
temporary directory in your lab account (use
the s3://<stack>-logs-<account>-us-west-
2/ bucket, and append the folder name,
temp).
 The path should look like:
s3://<stack>-logs-<account>-us-west-
2/temp/
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
 Do not choose Next yet.
 Copy the temporary path to a text editor, modify the path as follows, and
save it in a file (we will need this path for storing parquet files).
s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet
E.g., s3://qls-108881-f75177905b7b5b0e-logs-XXXXXXXX3638-us-west-2/weblogs/processed/parquet
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
 Expand the Script libraries and job
parameters section, and increase the DPUs to
20.
 Let’s pass a job parameter to send the S3 path
where parquet files will be deposited.
Specify the following values for Key and
Value:
 Key: --parquet_path (notice the 2 hyphens
at the beginning)
 Value: s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/
Note: The value is the S3 path we stored in
the previous section.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
Choose Next in the following screen, and choose Finish to complete the job
creation.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
 Close the Script Editor tips window (if it appears).
 In the Glue script editor, copy in the ETL code by choosing the Open Glue ETL Code link in Student Resources.
 Ensure that the database name (db_name) and table name reflect the database and table created by the AWS Glue crawler.
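For orientation, a sketch of what a CSV-to-parquet Glue job of this shape typically looks like. The actual script is provided through the Open Glue ETL Code link; the table name below is an assumption to replace with the one your crawler created, and the --parquet_path job parameter is read with getResolvedOptions:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Sketch of a Glue ETL job that rewrites the crawled web logs as parquet.
# The real script comes from the "Open Glue ETL Code" link; db_name/table_name
# below are assumptions -- set them to the values your crawler created.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "parquet_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

db_name = "weblogdb"   # database created for the crawler
table_name = "raw"     # assumed: table name created by the crawler

# Read the cataloged web logs and write them back out as parquet.
weblogs = glue_context.create_dynamic_frame.from_catalog(
    database=db_name, table_name=table_name)

glue_context.write_dynamic_frame.from_options(
    frame=weblogs,
    connection_type="s3",
    connection_options={"path": args["parquet_path"]},
    format="parquet",
)

job.commit()
```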
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
Choose Save and then Run job to execute your ETL. Choose Save and run
job in the next window.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
 Choose Run job to continue. This might take a few minutes. When the job finishes, the web logs will have been transformed to parquet format.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Interactive Querying with Amazon
Athena
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Interactive Query Service
 Query directly from Amazon S3
 Use ANSI SQL
 Serverless
 Multiple data formats
 Cost effective
Amazon Athena
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Familiar Technologies Under the Covers
Used for SQL queries (Presto):
- In-memory distributed query engine
- ANSI-SQL compatible, with extensions
Used for DDL functionality (Apache Hive):
- Complex data types
- Multitude of formats
- Supports data partitioning
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Comparing Performance and Cost Savings for
Compression and Columnar Format
Data stored as text files – Size on Amazon S3: 1 TB; Query run time: 236 seconds; Data scanned: 1.15 TB; Cost: $5.75
Data stored in Apache Parquet format* – Size on Amazon S3: 130 GB; Query run time: 6.78 seconds; Data scanned: 2.51 GB; Cost: $0.013
Savings / speed-up – 87% less storage with Parquet; 34x faster; 99% less data scanned; 99.7% cost savings
(*compressed using Snappy compression)
https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5
Interactive Querying with Amazon
Athena
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity: Interactive Querying with Amazon Athena
Time: 15 min.
We are going to:
A. Create a table over the processed web logs in Amazon S3. These are the parquet files created by the AWS Glue ETL job in the previous section.
B. Run interactive queries on the parquet web logs.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
 From the AWS Management Console, search for Athena, and choose the
service.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
 In the Analytics section, choose Amazon Athena, and on the next page,
choose Get Started.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
 Close the Athena tutorial.
 Close any other open tutorial windows.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
 We are now ready to create a table in Amazon Athena. But before we do that,
we need to get the S3 bucket location where the AWS Glue job delivered parquet
files. Choose Services, and in the Storage section, choose S3.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
Locate the bucket that looks like qls-<stackname>-logs-#####-us-west-2.
Navigate to the parquet folder. Copy the name of the bucket into a text editor.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
 Go back to the Athena console.
 Let’s create a table in Athena on the parquet dataset created by AWS Glue.
 In the Athena console, in the database drop-down menu, choose weblogdb.
 Type the SQL command (found by choosing the Open Athena SQL link in the
Student Resources section of qwiklabs) to create a table.
 Make sure to replace the <your-parquet-path> with the bucket location for the
parquet files you copied in the previous step.
Screen shot on next slide.
Activity 5B: Working with Amazon Athena
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5B: Working with Amazon Athena
 The SQL DDL in the previous step creates a table in Athena based on the parquet files we created with the AWS Glue ETL job.
 Select weblogdb in the database section, and choose the three stacked dots icon to sample a few rows of the Amazon S3 data.
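As an illustration, the kind of DDL involved, submitted through boto3 rather than the console; the table name, column list, and output location are assumptions, and <your-parquet-path> must be replaced as described above:

```python
import boto3

# Sketch of the kind of CREATE EXTERNAL TABLE statement the Athena SQL file contains,
# submitted with boto3. Table name and columns are assumptions; replace
# <your-parquet-path> and the query-results bucket with your own values.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_parquet (
    host      string,
    datetime  string,
    request   string,
    response  int,
    bytes     int,
    agent     string
)
STORED AS PARQUET
LOCATION 's3://<your-parquet-path>/';
"""

athena = boto3.client("athena", region_name="us-west-2")
athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "weblogdb"},
    ResultConfiguration={"OutputLocation": "s3://aws-athena-query-results-<account>-us-west-2/"},
)
```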
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5C: Interactive Querying with Amazon Athena
Run interactive queries (copy SQL queries from Athena SQL in Student
Resources), and see the results on the console.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 5C: Interactive Querying with Athena
 Optionally, you can save the results of a query to CSV by choosing the file icon
on the Results pane.
 You can also view the results of previous queries or queries that may take some
time to complete. Choose History. Then, either search for your query or
choose View or Download to view or download the results of previous
completed queries. This also displays the status of queries that are currently
running.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: Amazon Athena Interactive Queries
 Query results are also stored in Amazon S3 in a bucket called aws-athena-
query-results-ACCOUNTID-REGION.
 Where can you change the default location in the console?
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Processing with Amazon EMR
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon EMR Service
Applications: Hive, Pig, Spark SQL/Streaming/ML, Flink, Mahout, Sqoop, HBase/Phoenix, Presto, MXNet
Execution engines: MapReduce (batch), Tez (interactive), Spark (in memory), Flink (streaming)
Resource management: YARN
Storage: Amazon S3 (EMRFS), HDFS
On-cluster tooling: Ganglia (monitoring), Hue (SQL), Zeppelin (notebooks), Livy (job server)
Amazon EMR service, with connectors to AWS services
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
On-Cluster UIs
• Manage applications
• Notebooks
• SQL editor, workflow designer, metastore browser
• Design and execute queries and workloads
• And more, using bootstrap actions!
The Hadoop Ecosystem Can Run in Amazon EMR
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Deep Learning with GPU
Instances
Use P3 instances
with GPUs
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Easy to Use Spot Instances
• Meet SLA at a predictable cost: On-Demand Instances for core nodes, at standard Amazon EC2 on-demand pricing
• Exceed SLA at a lower cost: Spot Instances for task nodes, at up to 90% off Amazon EC2 on-demand pricing
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon S3 as Your Persistent Data Store
 Separate compute and storage
 Resize and shut down Amazon EMR
clusters with no data loss
 Point multiple Amazon EMR clusters
at the same data in Amazon S3
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
EMRFS Makes It Easier to Use Amazon S3
 Better performance and error-handling options
 Transparent to applications – Use “s3://”
 Consistent view
- For consistent list and read-after-write for new puts
 Support for Amazon S3 server-side and client-side encryption
 Faster listing using EMRFS metadata
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Apache Spark
 Fast, general-purpose engine for large-
scale data processing
 Write applications quickly in Java, Scala,
or Python
 Combine SQL, streaming, and complex
analytics
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Apache Zeppelin
 Web-based notebook for interactive
analytics
 Multiple language backend
 Apache Spark integration
 Data visualization
 Collaboration
 https://zeppelin.apache.org/
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6
Interactive Analysis Using Amazon
EMR
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6: Process and Query Data with Amazon EMR
 Time: 20 minutes
 We are going to:
A. Use a Zeppelin notebook to interact with an Amazon EMR cluster.
B. Process the data in Amazon S3 using Apache Spark.
C. Query the data processed in the earlier stage, and create simple
charts.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6A: Open the Zeppelin Interface
 Copy the Zeppelin endpoint in Student
Resources section in qwiklabs.
 In the Student Resources section,
select the Open Zeppelin Notebook
link to open the Zeppelin link in a new
window.
 Download the file (or copy and save it
to file with .json extension).
 Import the notebook using the Import
note link on the Zeppelin interface.
Note: Disconnect from VPN, or the
page will not load.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6A: Open the Zeppelin Interface
 Use the s3://<stack>-logs-<account>-us-west-2/processed/parquet bucket where the processed parquet files were deposited.
 Run the paragraph.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6B: Run the Notebook
 Enter the S3 bucket name where the parquet files are stored. The bucket
name begins with <stack>-*-logs-#####-region.
 Execute step 1
- Enter the bucket name (<stack>-*-logs-##########-us-west-2).
 Execute step 2
- Create a data frame with the parquet files from the Glue ETL job.
 Execute step 3
- Sample a few rows.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6B: Run the Notebook
 Execute step 4 to process the data
• Notice how the AGENT field consists of the BROWSER at the
beginning of the column value. Let’s extract the browser from that.
• Create a UDF that will extract the browser part and add it to the
dataframe.
• Print the new data frame.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 6B: Run the Notebook
 Execute step 6
- Register the data frame as a temporary table.
- Now you can run SQL queries on the temporary tables.
 Execute the next three steps, and observe the charts
created.
- What did you learn about the dataset?
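For orientation, a PySpark sketch of roughly what these notebook paragraphs do; the bucket path, column names, and view name are placeholders/assumptions rather than the imported notebook's exact code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Sketch of the notebook's steps. In Zeppelin on EMR a SparkSession is already
# provided as `spark`; bucket path, column names, and view name are placeholders.
spark = SparkSession.builder.getOrCreate()

# Step 2: build a DataFrame from the parquet files written by the Glue ETL job.
weblogs = spark.read.parquet("s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/")

# Step 3: sample a few rows.
weblogs.show(5, truncate=False)

# Step 4: the AGENT field starts with the browser name; extract it with a UDF
# and add it as a new column (column name "agent" is an assumption).
extract_browser = udf(lambda agent: agent.split("/")[0] if agent else None, StringType())
weblogs = weblogs.withColumn("browser", extract_browser(weblogs["agent"]))
weblogs.show(5, truncate=False)

# Step 6: register a temporary view so the remaining paragraphs can run SQL over it.
weblogs.createOrReplaceTempView("weblogs")
spark.sql("SELECT browser, COUNT(*) AS requests FROM weblogs "
          "GROUP BY browser ORDER BY requests DESC").show()
```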
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review: Interactive Analysis Using Amazon EMR
 You just learned how to process and query data using Amazon EMR
with Apache Spark.
 Amazon EMR has many other frameworks available for you to use:
- Hive, Presto, Flink, Pig, MapReduce
- Hue, Oozie, HBase
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Optional Exercise:
Data Visualization with Amazon
QuickSight
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon QuickSight
 Fast, easy interactive analytics
for anyone, everywhere
 Ease of use targeted at business users
 Blazing fast performance powered by
SPICE
 Broad connectivity with AWS data
services, on-premises data, files,
and business applications
 Cloud-native solution that scales
automatically
 1/10th the cost of traditional BI solutions
 Create, share, and collaborate with
anyone in your organization, on the web
or on mobile
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Connect, SPICE, Analyze
Amazon QuickSight enables you to connect to data from a wide variety of AWS,
third-party, and on-premises sources and import it to SPICE or query directly. Users
can then easily explore, analyze, and share their insights with anyone.
Amazon RDS
Amazon S3
Amazon
Redshift
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7
Visualize Results in Amazon
QuickSight
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Your Application Architecture
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7: Visualization with Amazon QuickSight
We are going to:
A. Register for an Amazon QuickSight account.
B. Connect to the Amazon Redshift cluster.
C. Create visualizations for analysis to answer questions like:
- What are the most common HTTP requests, and how successful (response code of 200) are they?
- Which are the most requested URIs?
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7A: Amazon QuickSight Registration
 Go to the AWS Management
Console, and in the
Analytics section, choose
QuickSight.
 On the next screen, choose Sign up.
 Make sure the subscription
type is Standard, and on the
next screen, choose
Continue.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7A: Amazon QuickSight Registration
 On the Subscription type page, type
in the account name. The QuickSight
account name is the AWS account
number on the qwikLabs console.
 Type in your email address.
 Select US West region.
 Select S3 (all buckets).
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7A: QuickSight Registration
 If a pop-up box to choose S3 buckets appears, choose Select buckets.
 Choose Go to Amazon
QuickSight.
 Dismiss the welcome screen.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7B: Connect to the Data Source
 Choose Manage data and
then choose New dataset to
create a new dataset in
Amazon QuickSight.
 Choose Redshift Auto-
discovered as the data source.
Amazon QuickSight
autodiscovers databases
associated with your AWS
account, in this case the
Amazon Redshift database.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7B: Connect to Amazon Redshift
Note: Use dbadmin as the user
name. You can get the Amazon
Redshift database password from
qwikLABS by navigating to the
Connection details section (see
below).
Activity 7C: Choose Your Weblogs Amazon Redshift
Table
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7D: Ingest Data into SPICE
 SPICE is Amazon QuickSight's in-
memory optimized calculation
engine, designed specifically for
fast, interactive data visualization.
 You can improve the performance of
database datasets by importing the
data into SPICE instead of using a
direct query to the database.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Activity 7E: Creating Your First Analysis
 What are the most requested
HTTP request types and their
corresponding response codes
for this site?
 Simply select request and response, and let AutoGraph create the optimal visualization.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Review – Creating Your Analysis
 Exercise: Add a visual to demonstrate which URI is the most requested.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Please complete the session survey in
the summit mobile app.
Submit Session Feedback
1. Tap the Schedule icon. 2. Select the session
you attended.
3. Tap Session
Evaluation to submit your
feedback.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Thank you!
Serverless Architectural Patterns - GOTO AmsterdamServerless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO AmsterdamBoaz Ziniman
 
善用 GraphQL 與 AWS AppSync 讓您的 Progressive Web App (PWA) 加速進化 (Level 200)
善用  GraphQL 與 AWS AppSync 讓您的  Progressive Web App (PWA) 加速進化 (Level 200)善用  GraphQL 與 AWS AppSync 讓您的  Progressive Web App (PWA) 加速進化 (Level 200)
善用 GraphQL 與 AWS AppSync 讓您的 Progressive Web App (PWA) 加速進化 (Level 200)Amazon Web Services
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Amazon Web Services
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
Developing Serverless Application on AWS
Developing Serverless Application on AWSDeveloping Serverless Application on AWS
Developing Serverless Application on AWSAmazon Web Services
 
Visualise and Voice-Enable Your Security
Visualise and Voice-Enable Your SecurityVisualise and Voice-Enable Your Security
Visualise and Voice-Enable Your SecurityAmazon Web Services
 

Similar to BDA309 Build Your First Big Data Application on AWS (20)

ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317ABD317_Building Your First Big Data Application on AWS - ABD317
ABD317_Building Your First Big Data Application on AWS - ABD317
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
 
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
Running Your SQL Server Database on Amazon RDS (DAT329) - AWS re:Invent 2018
 
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...Considerations for Building Your First Streaming Application (ANT359) - AWS r...
Considerations for Building Your First Streaming Application (ANT359) - AWS r...
 
WildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing WorkshopWildRydes Serverless Data Processing Workshop
WildRydes Serverless Data Processing Workshop
 
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
Big Data Meets AI - Driving Insights and Adding Intelligence to Your Solutions
 
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
AWS Floor28 - WildRydes Serverless Data Processsing workshop (Ver2)
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Get Started with Real-Time Streaming Data in Under 5 Minutes - AWS Online Tec...
Get Started with Real-Time Streaming Data in Under 5 Minutes - AWS Online Tec...Get Started with Real-Time Streaming Data in Under 5 Minutes - AWS Online Tec...
Get Started with Real-Time Streaming Data in Under 5 Minutes - AWS Online Tec...
 
Building your first GraphQL API with AWS AppSync
Building your first GraphQL API with AWS AppSyncBuilding your first GraphQL API with AWS AppSync
Building your first GraphQL API with AWS AppSync
 
Building your First GraphQL API with AWS AppSync
Building your First GraphQL API with AWS AppSyncBuilding your First GraphQL API with AWS AppSync
Building your First GraphQL API with AWS AppSync
 
Nader Dabit - Intro to AWS AppSync.pdf
Nader Dabit - Intro to AWS AppSync.pdfNader Dabit - Intro to AWS AppSync.pdf
Nader Dabit - Intro to AWS AppSync.pdf
 
Serverless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO AmsterdamServerless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO Amsterdam
 
善用 GraphQL 與 AWS AppSync 讓您的 Progressive Web App (PWA) 加速進化 (Level 200)
善用  GraphQL 與 AWS AppSync 讓您的  Progressive Web App (PWA) 加速進化 (Level 200)善用  GraphQL 與 AWS AppSync 讓您的  Progressive Web App (PWA) 加速進化 (Level 200)
善用 GraphQL 與 AWS AppSync 讓您的 Progressive Web App (PWA) 加速進化 (Level 200)
 
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
Real-Time Web Analytics with Amazon Kinesis Data Analytics (ADT401) - AWS re:...
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Developing Serverless Application on AWS
Developing Serverless Application on AWSDeveloping Serverless Application on AWS
Developing Serverless Application on AWS
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Visualise and Voice-Enable Your Security
Visualise and Voice-Enable Your SecurityVisualise and Voice-Enable Your Security
Visualise and Voice-Enable Your Security
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

BDA309 Build Your First Big Data Application on AWS

  • 11. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon Kinesis Data Firehose  Zero administration and seamless elasticity  Direct-to-data store integration  Serverless, continuous data transformations
  • 12. Amazon Kinesis: Kinesis Data Firehose vs. Kinesis Data Streams
  Amazon Kinesis Data Streams is for use cases that require custom, per-record processing with sub-second latency and a choice of stream processing frameworks.
  Amazon Kinesis Data Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
  • 13. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1 Collect Logs Using a Kinesis Data Firehose Delivery Stream
  • 14. Your Application Architecture (diagram): the Kinesis Producer UI generates web logs; Amazon Kinesis Data Firehose collects them and delivers the raw web logs to an Amazon S3 bucket; Amazon Kinesis Data Analytics processes and computes aggregate web log metrics; a second Kinesis Data Firehose delivery stream delivers the processed web logs to Amazon Redshift for SQL queries; AWS Glue extracts metadata, creates tables, and transforms the web logs from CSV to Parquet in another S3 bucket; Amazon Athena provides interactive querying of the transformed logs; Amazon EMR provides interactive analysis; Amazon QuickSight visualizes the web logs to discover insights.
  • 15. Collect Logs with a Kinesis Data Firehose Delivery Stream
  Time: 5 min. We are going to:
   Write to a Kinesis Data Firehose delivery stream. We'll simulate writing transformed Apache web logs to a Firehose delivery stream that is configured to deliver data into an S3 bucket.
   There are many different libraries that can be used to write data to a Kinesis Data Firehose delivery stream. One popular option is the Amazon Kinesis Agent.
  • 16. Collect Logs with a Kinesis Data Firehose Delivery Stream
  Amazon Kinesis Agent:
   Standalone Java application to collect and send data to Kinesis Data Firehose
   Continuously monitors a set of files
   Handles file rotation, check-pointing, and retry on failures
   Emits CloudWatch metrics
   Pre-processes records parsed from monitored files
  • 17. Collect Logs with a Kinesis Data Firehose Delivery Stream
  For example, the agent can transform an Apache web log to JSON.
  From: 125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] "GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"
  To: {"HOST" : "125.166.52.103", "IDENT" : null, "AUTHUSER" : null, "DATETIME" : "08/Mar/2017:17:06:44 -08:00", "REQUEST" : "GET /explore", "RESPONSE" : 200, "BYTES" : 2503, "REFERRER" : null, "AGENT" : "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"}
  • 18. Collect Logs with a Kinesis Data Firehose Delivery Stream
  So that we don't have to install or set up software on your machine, we are going to use a utility called the Kinesis Data Generator to simulate using the Amazon Kinesis Agent. The Kinesis Data Generator can populate a Firehose delivery stream using a template and is simple to set up. Let's get started!
  • 19. Activity 1A: Working with the KDG
  Qwiklabs has already created and set up the Kinesis Data Firehose delivery stream for us. All we have to do is start writing data to it.
  Go to the Kinesis Data Generator (KDG) Help section at http://tinyurl.com/kinesispublisher, which redirects to: https://s3.amazonaws.com/kinesis-data-producer-test/help.html
  • 20. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG  Choose Create Amazon Cognito User with AWS CloudFormation.  This link takes you to a service called AWS CloudFormation. AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.  We use AWS CloudFormation to create the necessary user credentials to use the Kinesis Data Generator.
  • 21. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG On the next screen, choose Next. Here, we are using a template stored in an Amazon S3 bucket to create the necessary credentials.
  • 22. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG Specify a user name and password (and remember them!), and choose Next. This user name and password will be used to sign into the Kinesis Data Generator. The password must be at least 6 alpha-numeric characters, and contain at least one number.
  • 23. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG  We use a service called Amazon Cognito to create these credentials.  Amazon Cognito lets you easily add a user sign-up and sign-in to your mobile and web apps. With Amazon Cognito, you also have the options to authenticate users through social identity providers such as Facebook, Twitter, or Amazon, with SAML identity solutions, or by using your own identity system.
  • 24. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG The next screen has some additional options for the AWS CloudFormation stack, which are not needed. Choose Next.
  • 25. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG This screen is a review screen so you can verify that you have selected the correct options. When you are ready, choose Acknowledge, and choose Create.
  • 26. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG You are taken to a screen that shows the stack creation process. (You may have to refresh the page.) In approximately one minute, the stack will complete, and, under Status, you will see CREATE_COMPLETE. Once this occurs, a) select the template, b) select the outputs tab, and c) choose the link to navigate to your very own Kinesis Data Generator (hosted on Amazon S3).
  • 27. Activity 1A: Working with the KDG
   Log in using your user name and password.
   Select the region us-west-2 (Oregon).
   Select the delivery stream with a name like qls-<somerandomnumber>-FirehoseDeliveryStream-<somerandomnumber>-11111111111.
   Specify a data rate (Records per second). Choose a number less than 10.
  • 28. Activity 1A: Working with the KDG
  We chose a number less than 10 because everyone is on the same Wi-Fi network, and we want to be sure we don't use all the bandwidth. Additionally, if your machine is not plugged in, you may run into battery issues at a higher rate.
  For the Record Template, use the Apache Combined Log Template in the Student Resources section. The template should look like the following:
  • 29. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 1A: Working with the KDG Choose Send Data to Kinesis.
  • 30. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Review: Monitoring Your Delivery Stream  Go to the Amazon CloudWatch Metrics Console and search IncomingRecords. Choose this metric for your Firehose delivery stream, and for Sum, choose 1 Minute.  What are the most important metrics to monitor?
  • 31. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 2 Real-Time Data Processing Using Kinesis Data Analytics
  • 32. Your Application Architecture (diagram repeated from slide 14).
  • 33. Amazon Kinesis Data Analytics
  • Powerful real-time applications
  • Easy to use, fully managed
  • Automatic elasticity
  • 34. Kinesis Data Analytics Applications
  Easily write SQL code to process streaming data
  Connect to a streaming source
  Continuously deliver SQL results
  • 35. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Process Data Using Kinesis Data Analytics  Time: 20 min. We are going to:  Write a SQL query to compute aggregate metrics for an interesting statistic on the incoming data.  Write a SQL query using an anomaly detection function.
  • 36. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 2A: Start Kinesis Data Analytics App  Navigate to the Amazon Kinesis dashboard.  Choose the Kinesis analytics application.
  • 37. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 2A: Start Kinesis App Choose Go to SQL editor. On the next screen, choose Yes, start application.
  • 38. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. View Sample Records in Kinesis App Review sample records delivered to the source stream (SOURCE_SQL_STREAM_001).
  • 39. Kinesis Analytics Application Metadata
  Note that Kinesis Data Analytics adds metadata to each record, as shown in the formatted record sample:
  - ROWTIME represents the time the application read the record and is a special column used for time-series analytics. This is also known as processing time.
  - APPROXIMATE_ARRIVAL_TIME is the time the delivery stream received the record. This is also known as ingest time.
  • 40. Kinesis Data Analytics: How a Streaming App Works
  Writing SQL over streaming data using Kinesis Data Analytics follows a two-part model:
  1. Create an in-application stream for storing intermediate SQL results. An in-application stream is like a SQL table, but it is continuously updated.
  2. Create a PUMP that will continuously read FROM one in-application stream and INSERT INTO a target in-application stream.
  (Diagram: SOURCE_SQL_STREAM_001, the source stream, is read by OUTPUT_PUMP into DESTINATION_STREAM and by AGGREGATE_PUMP into AGGREGATE_STREAM, which is sent to Amazon Redshift.)
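  Concretely, a pair of DDL statements like the following illustrates that two-part model. This is a minimal sketch; the column names are illustrative assumptions, not the workshop's exact schema, which comes from the provided Kinesis Data Analytics SQL file.

    -- Part 1: an in-application stream (like a SQL table, but continuously updated)
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        "host"     VARCHAR(32),
        "request"  VARCHAR(64),
        "response" INTEGER);

    -- Part 2: a pump that continuously reads FROM the source stream
    -- and INSERTs INTO the target in-application stream
    CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "host", "request", "response"
    FROM "SOURCE_SQL_STREAM_001";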
  • 41. Activity 2B: Calculate an Aggregate Metric
  Calculate a count using a tumbling window and a GROUP BY clause. A tumbling window is similar to a periodic report, where you specify your query and a time range, and results are emitted at the end of the range (e.g., COUNT the number of items by key every 10 seconds).
  • 42. Activity 2B: Calculate an Aggregate Metric
  Tumbling: fixed size and non-overlapping; use the FLOOR() or STEP() function (coming soon) in a GROUP BY statement.
  Sliding: fixed size and overlapping; row boundaries are determined when new rows enter the window; use the standard OVER and WINDOW clause (e.g., COUNT(col) OVER (RANGE INTERVAL '5' MINUTE)).
  Custom: not fixed size and overlapping; row boundaries are determined by conditions; implementations vary, but typically require two steps (step 1: identify boundaries; step 2: perform the computation).
  • 43. Activity 2B: Calculate an Aggregate Metric
  The window is defined by the following statement in the SELECT statement. Note that the ROWTIME column is implicitly included in every stream query and represents the processing time of the application:
  STEP(source_sql_stream_001.ROWTIME BY INTERVAL '10' SECOND)
  This is known as a tumbling window. Tumbling windows are always included in a GROUP BY clause and use a STEP function. The STEP function takes an interval to produce the periodic reports. You can also use the SQL FLOOR() function to achieve the same thing.
  • 44. Activity 2B: Calculate an Aggregate Metric
  Create an AGGREGATE_STREAM using the SQL command found in the Kinesis Data Analytics SQL file in the Student Resources section of your lab. Copy and paste the SQL into the SQL editor underneath the DESTINATION_SQL_STREAM DDL. Choose Save and run SQL.
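  The aggregate query in that file follows the tumbling-window pattern described above. A sketch of what it might look like (the exact schema comes from the provided SQL file; the column names below are illustrative):

    CREATE OR REPLACE STREAM "AGGREGATE_STREAM" (
        "response"      INTEGER,
        "request_count" INTEGER);

    CREATE OR REPLACE PUMP "AGGREGATE_PUMP" AS
    INSERT INTO "AGGREGATE_STREAM"
    SELECT STREAM "response", COUNT(*) AS "request_count"
    FROM "SOURCE_SQL_STREAM_001"
    -- STEP truncates ROWTIME into 10-second buckets; FLOOR(... TO ...) would work as well
    GROUP BY "response",
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);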
  • 45. Activity 2C: Anomaly Detection
   Kinesis Data Analytics includes some advanced algorithms that are extensions to the SQL language. These include approximate count distinct (HyperLogLog), approximate top-K (Space-Saving), and anomaly detection (random cut forest).
   The random cut forest algorithm detects anomalies in real time on multi-dimensional datasets. You pass the algorithm n numeric fields, and it produces an anomaly score for your stream data. Higher scores are more anomalous.
   The minimum anomaly score is 0 and the maximum is log2(s), where s is the subsample size parameter passed to random cut forest (the third parameter). You will have to try the algorithm on your data to get a feel for the anomaly score, as the score is data-dependent.
  • 46. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 2C: Anomaly Detection Create an anomaly_stream using the SQL command found in the Kinesis Data Analytics SQL file in the Student Resources section of your lab. Append the SQL in your SQL editor. Choose Save and run SQL.
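  For reference, a RANDOM_CUT_FOREST query over the numeric web-log fields might be sketched like this. The authoritative version is in the provided SQL file; the column names here are assumptions.

    CREATE OR REPLACE STREAM "ANOMALY_STREAM" (
        "response"      INTEGER,
        "bytes"         INTEGER,
        "ANOMALY_SCORE" DOUBLE);

    CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
    INSERT INTO "ANOMALY_STREAM"
    -- RANDOM_CUT_FOREST returns the cursor's columns plus an ANOMALY_SCORE column
    SELECT STREAM "response", "bytes", ANOMALY_SCORE
    FROM TABLE(RANDOM_CUT_FOREST(
             CURSOR(SELECT STREAM "response", "bytes" FROM "SOURCE_SQL_STREAM_001")));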
  • 47. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Review: In-Application SQL streams  Your application has multiple in-application SQL streams, including DESTINATION_SQL_STREAM, AGGREGATE_STREAM, and ANOMALY_STREAM. These in-application streams are like SQL tables that are continuously updated.  What else is unique about an in-application stream aside from its continuous nature?
  • 48. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 3 Deliver Streaming Results to Amazon Redshift
  • 49. Your Application Architecture (diagram repeated from slide 14).
  • 50. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 3: Deliver Data to Amazon Redshift using Amazon Kinesis Data Firehose  Time: 5 min.  We are going to: A. Connect to Amazon Redshift cluster and create a table to hold web logs data. B. Update the Kinesis Analytics application to send data to Amazon Redshift, via the Firehose delivery stream.
  • 51. Activity 3A: Connect to Amazon Redshift
   You can connect with pgweb, which is already installed and configured for the Amazon Redshift cluster. Just navigate to pgweb and start interacting. Note: from the qwikLABS console, open the pgWeb link in a new window.
   Or use any JDBC/ODBC/libpq client: Aginity Workbench for Amazon Redshift, SQL Workbench/J, DBeaver, or DataGrip. If you use one of these SQL clients, the user name and password are at the bottom of your qwikLABS console; you will find the cluster endpoint and database name (logs) there as well.
  • 52. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 3B: Create a Table in Amazon Redshift  Create table weblogs to capture the incoming data from a Firehose delivery stream. Note: You can download Amazon Redshift SQL code from qwikLabs Student Resource section (choose Open Redshift SQL).
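  The table definition lives in that Redshift SQL file. A minimal sketch of what it might look like, assuming columns that mirror the JSON fields produced by the agent transform on slide 17 (the provided file is authoritative):

    CREATE TABLE weblogs (
        host     VARCHAR(50),
        ident    VARCHAR(20),
        authuser VARCHAR(20),
        datetime VARCHAR(50),
        request  VARCHAR(200),
        response INTEGER,
        bytes    INTEGER,
        referrer VARCHAR(500),
        agent    VARCHAR(500)
    );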
  • 53. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 3C: Deliver Data to Amazon Redshift Using Firehose Update Kinesis Analytics application to send data to Firehose delivery stream. Amazon Kinesis Data Firehose delivers the streaming data to Amazon Redshift. 1. Go to the Kinesis Data Analytics console. 2. Choose the Amazon Redshift delivery stream as destination, and choose the edit icon (the pencil icon in the figure below).
  • 54. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 3C: Deliver Data to Amazon Redshift Using Kinesis Data Firehose  Validate your destination 1. Choose the Kinesis Data Firehose qls-xxxxxxx- RedshiftDeliveryStream-xxxxxxxx delivery stream. 2. Keep the default for Connect in-application stream. 3. Choose CSV as the Output format. 4. Select Choose from IAM roles that Kinesis Analytics can assume. 5. Click Save and continue.  It will take about 1–2 minutes for everything to be updated and for data to start appearing in Amazon Redshift.
  • 55. Activity 3C: Deliver Data to Amazon Redshift using Kinesis Data Firehose
  • 56. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Review: Amazon Redshift Test Queries  Find distribution of response codes over days (copy SQL from Amazon Redshift SQL file)  Count the number of 404 response codes
  • 57. Review: Amazon Redshift Test Queries
  Show all request paths with status "PAGE NOT FOUND"
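  The review queries are provided in the Redshift SQL file; against the weblogs table sketched earlier they might look roughly like this (column names assumed as before, with datetime stored as text):

    -- Distribution of response codes over days (day taken from the leading "DD/Mon/YYYY" part)
    SELECT SUBSTRING(datetime, 1, 11) AS day, response, COUNT(*) AS requests
    FROM weblogs
    GROUP BY 1, 2
    ORDER BY 1, 2;

    -- Count the number of 404 response codes
    SELECT COUNT(*) AS not_found FROM weblogs WHERE response = 404;

    -- Show all request paths that returned "PAGE NOT FOUND"
    SELECT DISTINCT request FROM weblogs WHERE response = 404;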
  • 58. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Extract, Transform, and Load (ETL) with AWS Glue
  • 59. AWS Glue: Fully Managed ETL (Extract, Transform, and Load) Service
  • Categorize your data, clean it, enrich it, and reliably move it between various data stores
  • Once cataloged, your data is immediately searchable and queryable across your data silos
  • Simple, flexible, and cost-effective
  • Serverless; runs on a fully managed, scale-out Spark environment
  • 60. AWS Glue Components
  • Data Catalog: discover and organize your data in various databases, data warehouses, and data lakes.
  • Job Authoring: focus on writing the transformations; generate code through a wizard, or write your own code.
  • Job Execution: runs jobs in Spark containers, with automatic scaling based on SLA. AWS Glue is serverless; you only pay for the resources you consume.
  • 61. AWS Glue – How It Works
  • 62. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4 Transform Web Logs to Parquet Format Using AWS Glue
  • 63. Your Application Architecture (diagram repeated from slide 14).
  • 64. Activity: Catalog and Perform ETL on Web Logs
  Time: 40 min. We are going to:
  A. Discover and catalog the web log data deposited into the S3 bucket using an AWS Glue crawler.
  B. Transform the web logs to Parquet format with the AWS Glue ETL job authoring tool.
  • 65. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Discover Dataset with AWS Glue We use the AWS Glue crawler to extract data and metadata. From the AWS Management Console, select AWS Glue Service, and on the next screen choose Get Started.
  • 66. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler Using AWS Glue On the left, choose Crawlers, and choose Add crawler.
  • 67. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue  Specify a name for the crawler. Choose the folder icon to select the data store on Amazon S3.  Choose Next.
  • 68. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue  Provide the S3 path location where the weblogs are deposited. Navigate to the S3 path that contains the word logs, and select the raw folder.  Choose Select.
  • 69. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue Choose Next on the next screen to not add another data store.
  • 70. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue In the IAM Role section, select Create an IAM role, add default to the IAM role, and choose Next.
  • 71. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler in AWS Glue Choose Run on demand to run the crawler now, and choose Next.
  • 72. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler with AWS Glue On the next screen, choose Add database to add a database (use weblogdb for the database name).
  • 73. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler with AWS Glue Review and choose Finish to create a crawler.
  • 74. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler with AWS Glue Choose the Run it now link to run the crawler.
  • 75. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Add Crawler with AWS Glue Crawler shows a Ready status when it is finished running.
  • 76. Activity 4A: Table Creation in AWS Glue
   Observe that the crawler has created a table for your dataset.
   The crawler automatically classified the dataset as the Apache combined log format.
   Click the table to look at its properties.
  • 77. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4A: Table Creation in AWS Glue  AWS Glue used the GrokSerDe (Serializer/Deserializer) to correctly interpret the web logs.  You can choose the View partitions link to look at the partitions in the dataset.
  • 78. Activity 4A: Create an ETL Job in AWS Glue
   With the dataset cataloged and the table created, we are now ready to convert the web logs from the Apache combined log format to the more query-efficient Parquet format.
   Choose Add job to begin creating the ETL job.
  • 79. Activity 4B: ETL Job in AWS Glue
   In the Job properties window, specify the job name.
   In this example, we will create a new ETL script. AWS Glue automatically chooses the name of the script file and the path where the script will be persisted.
   For the Temporary directory, specify an S3 temporary directory in your lab account: use the s3://<stack>-logs-<account>-us-west-2/ bucket and append the folder name temp. The path should look like: s3://<stack>-logs-<account>-us-west-2/temp/
  • 80. Activity 4B: ETL Job in AWS Glue
   Do not choose Next yet.
   Copy the temporary path to a text editor, modify the path as follows, and save it in a file (we will need this path for storing Parquet files): s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet
   E.g., s3://qls-108881-f75177905b7b5b0e-logs-XXXXXXXX3638-us-west-2/weblogs/processed/parquet
  • 81. Activity 4B: ETL Job in AWS Glue
   Expand the Script libraries and job parameters section, and increase the DPUs to 20.
   Let's pass a job parameter to send the S3 path where Parquet files will be deposited. Specify the following values for Key and Value:
   Key: --parquet_path (notice the 2 hyphens at the beginning)
   Value: s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/
  Note: The value is the S3 path we stored in the previous section.
  • 82. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4B: ETL Job in AWS Glue Choose Next in the following screen, and choose Finish to complete the job creation.
  • 83. Activity 4B: ETL Job in AWS Glue
   Close the Script Editor tips window (if it appears).
   In the Glue script editor, copy in the ETL code by choosing the Open Glue ETL Code link in Student Resources.
   Ensure that the database name (db_name) and table name reflect the database and table name created by the AWS Glue crawler.
  • 84. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 4B: ETL Job in AWS Glue Choose Save and then Run job to execute your ETL. Choose Save and run job in the next window.
  • 85. Activity 4B: ETL Job in AWS Glue
  Choose Run job to continue. This might take a few minutes. When the job finishes, the web logs will have been transformed to Parquet format.
  • 86. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Interactive Querying with Amazon Athena
  • 87. Amazon Athena: Interactive Query Service
   Query directly from Amazon S3
   Use ANSI SQL
   Serverless
   Multiple data formats
   Cost effective
  • 88. Familiar Technologies Under the Covers
  Used for SQL queries: an in-memory distributed query engine, ANSI-SQL compatible with extensions.
  Used for DDL functionality: complex data types, a multitude of formats, and support for data partitioning.
  • 89. Comparing Performance and Cost Savings for Compression and Columnar Format
  Data stored as text files: 1 TB on Amazon S3, query run time 236 seconds, data scanned 1.15 TB, cost $5.75.
  Data stored in Apache Parquet format*: 130 GB on Amazon S3, query run time 6.78 seconds, data scanned 2.51 GB, cost $0.013.
  Savings / speed-up with Parquet: 87% less storage, 34x faster, 99% less data scanned, 99.7% cost savings.
  (*compressed using Snappy compression) https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
  • 90. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5 Interactive Querying with Amazon Athena
  • 91. Your Application Architecture (diagram repeated from slide 14).
  • 92. Activity: Interactive Querying with Amazon Athena
  Time: 15 min. We are going to:
  A. Create a table over the processed web logs in Amazon S3. These are the Parquet files created by the AWS Glue ETL job in the previous section.
  B. Run interactive queries on the Parquet web logs.
  • 93. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5A: Set Up Amazon Athena  From the AWS Management Console, search for Athena, and choose the service.
  • 94. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5A: Set Up Amazon Athena  In the Analytics section, choose Amazon Athena, and on the next page, choose Get Started.
  • 95. Activity 5A: Set Up Amazon Athena
   Close the Athena tutorial if it is running.
   Close any other open tutorial windows.
  • 96. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5A: Set Up Amazon Athena  We are now ready to create a table in Amazon Athena. But before we do that, we need to get the S3 bucket location where the AWS Glue job delivered parquet files. Choose Services, and in the Storage section, choose S3.
  • 97. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5A: Set Up Amazon Athena Locate the bucket that looks like qls-<stackname>-logs-#####-us-west-2. Navigate to the parquet folder. Copy the name of the bucket into a text editor.
  • 98. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5A: Set Up Amazon Athena  Go back to the Athena console.  Let’s create a table in Athena on the parquet dataset created by AWS Glue.  In the Athena console, in the database drop-down menu, choose weblogdb.  Type the SQL command (found by choosing the Open Athena SQL link in the Student Resources section of qwiklabs) to create a table.  Make sure to replace the <your-parquet-path> with the bucket location for the parquet files you copied in the previous step. Screen shot on next slide.
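  The DDL in the Athena SQL file creates an external table over the Parquet files. A hedged sketch of its shape follows; the table name (weblogs_parquet) and column list are assumptions for illustration, and <your-parquet-path> is the bucket location copied in the previous step.

    CREATE EXTERNAL TABLE IF NOT EXISTS weblogdb.weblogs_parquet (
        host     STRING,
        ident    STRING,
        authuser STRING,
        datetime STRING,
        request  STRING,
        response INT,
        bytes    INT,
        referrer STRING,
        agent    STRING
    )
    STORED AS PARQUET
    LOCATION 's3://<your-parquet-path>/';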
  • 99. Activity 5B: Working with Amazon Athena
  • 100. Activity 5B: Working with Amazon Athena
   The SQL DDL in the previous step creates a table in Athena based on the Parquet files we created with the AWS Glue ETL job.
   Select weblogdb from the database section, and click the three stacked dots icon to sample a few rows of the Amazon S3 data.
  • 101. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5C: Interactive Querying with Amazon Athena Run interactive queries (copy SQL queries from Athena SQL in Student Resources), and see the results on the console.
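  The queries themselves come from the Athena SQL file in Student Resources; against the table sketched above, typical interactive queries might look like the following (table and column names are the assumptions from the earlier sketch):

    -- Top 10 most-requested paths
    SELECT request, COUNT(*) AS hits
    FROM weblogdb.weblogs_parquet
    GROUP BY request
    ORDER BY hits DESC
    LIMIT 10;

    -- Total bytes served per response code
    SELECT response, SUM(bytes) AS total_bytes
    FROM weblogdb.weblogs_parquet
    GROUP BY response
    ORDER BY total_bytes DESC;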
  • 102. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 5C: Interactive Querying with Athena  Optionally, you can save the results of a query to CSV by choosing the file icon on the Results pane.  You can also view the results of previous queries or queries that may take some time to complete. Choose History. Then, either search for your query or choose View or Download to view or download the results of previous completed queries. This also displays the status of queries that are currently running.
  • 103. Review: Amazon Athena Interactive Queries
   Query results are also stored in Amazon S3 in a bucket called aws-athena-query-results-ACCOUNTID-REGION.
   Where can you change the default location in the console?
  • 104. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Processing with Amazon EMR
  • 105. Amazon EMR Service
  Applications: Hive, Pig, Spark SQL/Streaming/ML, Flink, Mahout, Sqoop, HBase/Phoenix, Presto, MXNet
  Processing engines: MapReduce (batch), Tez (interactive), Spark (in-memory), Flink (streaming)
  Cluster resource management: YARN
  Storage: Amazon S3 (EMRFS), HDFS
  On-cluster tools: Ganglia (monitoring), Hue (SQL), Zeppelin (notebooks), Livy (job server), connectors to AWS services
  • 106. On-Cluster UIs
  Manage applications
  Notebooks
  SQL editor, workflow designer, metastore browser
  Design and execute queries and workloads
  And more using bootstrap actions!
  • 107. The Hadoop Ecosystem Can Run in Amazon EMR
  • 108. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Deep Learning with GPU Instances Use P3 instances with GPUs
  • 109. Easy to Use Spot Instances
  Meet SLA at a predictable cost: on-demand for core nodes, at standard Amazon EC2 pricing for on-demand capacity.
  Exceed SLA at a lower cost: Spot Instances for task nodes, at up to 90% off Amazon EC2 on-demand pricing.
  • 110. Amazon S3 as Your Persistent Data Store
   Separate compute and storage
   Resize and shut down Amazon EMR clusters with no data loss
   Point multiple Amazon EMR clusters at the same data in Amazon S3
  • 111. EMRFS Makes It Easier to Use Amazon S3
   Better performance and error-handling options
   Transparent to applications: use "s3://"
   Consistent view: consistent list and read-after-write for new puts
   Support for Amazon S3 server-side and client-side encryption
   Faster listing using EMRFS metadata
  • 112. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Apache Spark  Fast, general-purpose engine for large- scale data processing  Write applications quickly in Java, Scala, or Python  Combine SQL, streaming, and complex analytics
  • 113. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Apache Zeppelin  Web-based notebook for interactive analytics  Multiple language backend  Apache Spark integration  Data visualization  Collaboration  https://zeppelin.apache.org/
  • 114. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6 Interactive Analysis Using Amazon EMR
  • 115. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Your Application Architecture Kinesis Producer UI Amazon Kinesis Data Firehose Amazon Kinesis Data Analytics Amazon Kinesis Data Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to Amazon S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Kinesis Data Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from AWS Glue
  • 116. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6: Process and Query Data with Amazon EMR  Time: 20 minutes  We are going to: A. Use a Zeppelin notebook to interact with an Amazon EMR cluster. B. Process the data in Amazon S3 using Apache Spark. C. Query the data processed in the earlier stage, and create simple charts.
  • 117. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6A: Open the Zeppelin Interface  Copy the Zeppelin endpoint from the Student Resources section in qwikLABS.  In the Student Resources section, select the Open Zeppelin Notebook link to open the notebook in a new window.  Download the file (or copy and save it to a file with a .json extension).  Import the notebook using the Import note link on the Zeppelin interface. Note: Disconnect from your VPN, or the page will not load.
  • 118. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6A: Open the Zeppelin Interface  Use the s3://<stack>-logs-<account>-us-west-2/processed/parquet bucket where the processed Parquet files were deposited.  Run the paragraph.
  • 119. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6B: Run the Notebook  Enter the S3 bucket name where the parquet files are stored. The bucket name begins with <stack>-*-logs-#####-region.  Execute step 1 - Enter the bucket name (<stack>-*-logs-##########-us-west-2).  Execute step 2 - Create a data frame with the parquet files from the Glue ETL job.  Execute step 3 - Sample a few rows.
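The notebook performs steps 2 and 3 with Spark DataFrames; as a rough equivalent that you could run in a Zeppelin %sql paragraph instead (the view name is made up, and the path placeholder mirrors the bucket naming above), Spark SQL can mount the Parquet files directly:

    -- Create a temporary view over the Parquet files written by the AWS Glue job
    CREATE TEMPORARY VIEW weblogs_view
    USING parquet
    OPTIONS (path 's3://<stack>-logs-<account>-us-west-2/processed/parquet');

    -- Sample a few rows
    SELECT * FROM weblogs_view LIMIT 10;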
  • 120. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6B: Run the Notebook  Execute step 4 to process the data • Notice how the AGENT field consists of the BROWSER at the beginning of the column value. Let’s extract the browser from that. • Create a UDF that will extract the browser part and add it to the dataframe. • Print the new data frame.
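The lab's notebook does this with a user-defined function on the DataFrame; purely as an illustration of the same idea in Spark SQL, the built-in regexp_extract function can pull a leading browser token out of the agent field (the view name, column name, and regex below are simplifications, not the lab's actual logic):

    -- Extract the token before the first '/' or space in the agent string (a simplification)
    SELECT agent,
           regexp_extract(agent, '^([^/ ]+)', 1) AS browser
    FROM weblogs_view
    LIMIT 10;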
  • 121. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 6B: Run the Notebook  Execute step 6 - Register the data frame as a temporary table. - Now you can run SQL queries on the temporary tables.  Execute the next three steps, and observe the charts created. - What did you learn about the dataset?
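The charting paragraphs that follow are plain SQL against the registered temporary table; one of them might resemble the sketch below (the table and column names are assumptions about the notebook, not its actual contents):

    -- Requests per browser, suitable for a Zeppelin bar chart (names are assumptions)
    SELECT browser, COUNT(*) AS request_count
    FROM weblogs_processed
    GROUP BY browser
    ORDER BY request_count DESC;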
  • 122. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Review: Interactive Analysis Using Amazon EMR  You just learned how to process and query data using Amazon EMR with Apache Spark.  Amazon EMR has many other frameworks available for you to use: - Hive, Presto, Flink, Pig, MapReduce - Hue, Oozie, HBase
  • 123. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Optional Exercise: Data Visualization with Amazon QuickSight
  • 124. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon QuickSight  Fast, easy interactive analytics for anyone, everywhere  Ease of use targeted at business users  Blazing fast performance powered by SPICE  Broad connectivity with AWS data services, on-premises data, files, and business applications  Cloud-native solution that scales automatically  1/10th the cost of traditional BI solutions  Create, share, and collaborate with anyone in your organization, on the web or on mobile
  • 125. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Connect, SPICE, Analyze Amazon QuickSight enables you to connect to data from a wide variety of AWS, third-party, and on-premises sources and import it to SPICE or query directly. Users can then easily explore, analyze, and share their insights with anyone. Amazon RDS Amazon S3 Amazon Redshift
  • 126. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7 Visualize Results in Amazon QuickSight
  • 127. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Your Application Architecture Kinesis Producer UI Kinesis Data Firehose Kinesis Data Analytics Kinesis Data Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to Amazon S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Kinesis Data Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from AWS Glue
  • 128. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7: Visualization with Amazon QuickSight We are going to: A. Register for an Amazon QuickSight account. B. Connect to the Amazon Redshift cluster. C. Create visualizations for analysis to answer questions like: A. What are the most common HTTP requests, and how successful (response code of 200) are they? B. Which are the most requested URIs?
  • 129. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7A: Amazon QuickSight Registration  Go to the AWS Management Console, and in the Analytics section, choose QuickSight.  On the next screen, choose Sign up.  Make sure the subscription type is Standard, and on the next screen, choose Continue.
  • 130. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7A: Amazon QuickSight Registration  On the Subscription type page, type in the account name. The QuickSight account name is the AWS account number shown on the qwikLABS console.  Type in your email address.  Select the US West Region.  Select S3 (all buckets).
  • 131. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7A: QuickSight Registration  If a pop-up box to choose S3 buckets appears, choose Select buckets.  Choose Go to Amazon QuickSight.  Dismiss the welcome screen.
  • 132. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7B: Connect to the Data Source  Choose Manage data and then choose New dataset to create a new dataset in Amazon QuickSight.  Choose Redshift Auto-discovered as the data source. Amazon QuickSight autodiscovers databases associated with your AWS account, in this case the Amazon Redshift database.
  • 133. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7B: Connect to Amazon Redshift Note: Use dbadmin as the user name. You can get the Amazon Redshift database password from qwikLABS by navigating to the Connection details section (see below).
  • 134. Activity 7C: Choose Your Weblogs Amazon Redshift Table
  • 135. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7D: Ingest Data into SPICE  SPICE is Amazon QuickSight's in-memory optimized calculation engine, designed specifically for fast, interactive data visualization.  You can improve the performance of database datasets by importing the data into SPICE instead of using a direct query to the database.
  • 136. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Activity 7E: Creating Your First Analysis  What are the most requested HTTP request types and their corresponding response codes for this site?  Simply select the request and response fields, and let AutoGraph create the optimal visualization.
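AutoGraph builds this visual directly from the fields you select, but the underlying aggregation is the same as a SQL query you could run against the Amazon Redshift table yourself (the table and column names below are assumptions, not the lab's actual schema):

    -- Most common HTTP request types and their response codes (names are assumptions)
    SELECT request, response, COUNT(*) AS request_count
    FROM weblogs
    GROUP BY request, response
    ORDER BY request_count DESC
    LIMIT 20;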
  • 137. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Review – Creating Your Analysis  Exercise: Add a visual to demonstrate which URI is the most requested.
  • 138. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Please complete the session survey in the summit mobile app.
  • 139. Submit Session Feedback 1. Tap the Schedule icon. 2. Select the session you attended. 3. Tap Session Evaluation to submit your feedback.
  • 140. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Thank you!