Redshift Introduction

Amazon Redshift
Saturday, December 6, 2014

Agenda
08:30 AM Breakfast
09:00 AM Introduction and Strengths of Technologies
10:00 AM Break + set up query tool
10:20 AM Hadoop hands-on
10:55 AM Break
11:10 AM Redshift hands-on
11:40 AM Operationalizing your code
12:00 PM Adjourn
12/6/2014 2

Session Goals
• Understand:
  • Why an Analytic Database?
  • What is Amazon Redshift?
• Do:
  • ‘Fire Up’ a Redshift Database
  • Load Data
  • Do a few queries
  • Shut it down
• Have fun!

Why an Analytic Database?
Why use one?
• It’s a database optimized for read-only queries
• It’s fast
• It can handle a lot of data
Why not use one?
• Poor transaction processing (aka OLTP)
  • Rollback, multi-phase commits, etc.

Under the Hood
Analytic databases typically have features like:
• Compression
• Column (as opposed to row) storage
• Parallel queries across clusters of machines
• Support for partitioning
• Other cool stuff to make your queries fast
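As a rough sketch of how some of these features surface in Redshift's own SQL, the table definition below sets a per-column compression encoding, a distribution key (which controls how rows are spread across the nodes of a cluster) and a sort key. The table `page_hits`, its columns, and the encodings chosen are hypothetical and only for illustration; they are not part of this session's data set.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE page_hits
(
  hit_date DATE,                        -- sort key column, left with the default encoding
  page_url VARCHAR(300) ENCODE lzo,     -- general-purpose compression for long strings
  country  CHAR(3)      ENCODE bytedict, -- dictionary encoding for a small set of values
  duration INT          ENCODE mostly16  -- most values assumed to fit in 16 bits
)
DISTKEY (page_url)   -- rows with the same page_url land on the same node slice
SORTKEY (hit_date);  -- rows are stored ordered by date
```

Which encodings and keys work best depends on the data and the queries, so treat these choices as assumptions rather than recommendations.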

Column vs Row Storage

Amazon Redshift is an Example of an Analytic Database

Amazon Redshift uses standard SQL to query the database

Let’s Get Started!
The basics:
• You will need an AWS account
  • AWS Secret Key
  • AWS Access Key
• Install SQL Workbench
  • http://www.sql-workbench.net/manual/install.html
• Install Postgres JDBC drivers:
  • http://jdbc.postgresql.org/

Let’s Get Started!: https://aws.amazon.com/
Click Here

Redshift: https://console.aws.amazon.com/redshift/
Click Here

Launch: http://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-launch-sample-cluster.html
Fill these out

Single Node: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
Single Node

Security: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
East, not in VPC, default, no alarms (below)

Review: https://console.aws.amazon.com/redshift/home?region=us-east-1#launch-cluster:
Review

Wait:
Wait, then click

When Active:
You’ll need these details

Connect with SQL Workbench:
Select Connect Window

Fill this out

Get the JDBC URL
Copy this

Paste and Fill this out

New SQL Tab
Add Tab
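Once a SQL tab is open, a quick sanity check can confirm that the connection reached the cluster before any tables are created. This is a minimal sketch; the values returned depend on the database name and master user chosen at launch.

```sql
-- Shows which database and user the session is connected as,
-- plus the version string reported by the cluster.
SELECT current_database(), current_user, version();
```

If this returns a single row, the JDBC URL, user name, and password were accepted.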


Make Tables
Create Some Tables
CREATE TABLE rankings
(
  pageURL     VARCHAR(300),
  pageRank    INT,
  avgDuration INT
);
CREATE TABLE uservisits
(
  sourceIP       VARCHAR(116),
  destinationURL VARCHAR(100),
  visitDate      DATE,
  adRevenue      FLOAT,
  UserAgent      VARCHAR(256),
  cCode          CHAR(3),
  lCode          CHAR(6),
  searchWord     VARCHAR(32),
  duration       INT
);
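To confirm that both tables were created as intended, one option is to query Redshift's `PG_TABLE_DEF` catalog view. This sketch assumes the tables were created in a schema on the search path (the default `public` schema is); table names are stored in lowercase.

```sql
-- Lists each column of the two new tables with its type
-- and the compression encoding Redshift assigned to it.
SELECT tablename, "column", type, encoding
FROM pg_table_def
WHERE tablename IN ('rankings', 'uservisits');
```

The query should return three rows for rankings and nine for uservisits, one per column.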

Load Data
Load Data from S3
COPY uservisits FROM 's3://big-data-benchmark/pavlo/text/tiny/uservisits/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
DELIMITER ',';
COPY rankings FROM 's3://big-data-benchmark/pavlo/text/tiny/rankings/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
DELIMITER ',';
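If a COPY fails or loads fewer rows than expected, Redshift records the rejected rows in the `STL_LOAD_ERRORS` system table. The query below is a sketch of how to look at the most recent entries; it returns no rows when every load succeeded.

```sql
-- Most recent load errors: which file and line failed, on which column, and why.
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```

A common cause worth checking first is a mistyped credentials string or delimiter.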

Load Bigger Data
Load Data from S3
's3://big-data-benchmark/pavlo/text/tiny/uservisits/'
-- replace "tiny" with one of: "tiny", "1node", "5nodes", "10nodes"
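Following the options listed above, this sketch reloads uservisits from the "1node" set. It assumes the table already holds the tiny data, so it empties the table first to avoid mixing the two sets. The key placeholders stand for your own AWS credentials.

```sql
-- Remove the rows loaded earlier (TRUNCATE commits immediately in Redshift).
TRUNCATE uservisits;

-- Same COPY as before, with "tiny" swapped for "1node" in the S3 path.
COPY uservisits FROM 's3://big-data-benchmark/pavlo/text/1node/uservisits/'
CREDENTIALS 'aws_access_key_id=<your access key>;aws_secret_access_key=<your secret key>'
DELIMITER ',';
```

The same path change applies to the rankings table.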

Simple Queries
Query
SELECT * FROM uservisits LIMIT 100;
SELECT COUNT(*) FROM uservisits;
SELECT * FROM rankings LIMIT 100;
SELECT COUNT(*) FROM rankings;

Complex Queries
Query
SELECT pageURL, pageRank FROM rankings WHERE pageRank > 10;
SELECT sourceIP,
       SPLIT_PART(sourceIP, '.', 1) AS fn,
       SPLIT_PART(sourceIP, '.', 2) AS sn
FROM uservisits
LIMIT 100;
SELECT sourceIP,
       SUM(adRevenue) AS totalRevenue,
       AVG(pageRank) AS pageRank
FROM rankings R
JOIN (SELECT sourceIP,
             destinationURL,
             adRevenue
      FROM uservisits uv) NUV ON (R.pageURL = NUV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
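To see how Redshift plans to run the join above without executing it, the statement can be prefixed with `EXPLAIN`. The sketch below uses a flattened form of the same query: the subquery only selects three columns without filtering, so joining uservisits directly should give the same result. The alias `UV` is introduced here for illustration.

```sql
-- Prints the query plan (join strategy, data redistribution steps, estimated cost)
-- instead of running the query.
EXPLAIN
SELECT sourceIP,
       SUM(adRevenue) AS totalRevenue,
       AVG(pageRank) AS pageRank
FROM rankings R
JOIN uservisits UV ON (R.pageURL = UV.destinationURL)
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 100;
```

Comparing the plan on the tiny and larger data sets is one way to relate the query back to the parallel execution described earlier.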

Shut it down!
Click


Shut it down!
No snapshot

Thanks … happy querying!
See also
• http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html
