www.matillion.com
© 2017 Matillion. All rights reserved.
Presented by:
Copyright © 2017 Matillion. All rights reserved. Trademarks, registered trademarks, and
service marks are property of their respective owners.
Matillion Webinar
Tuesday, August 29, 2017
Getting Started with Amazon Redshift
Harpreet Singh & Nick Tierney
• Data integration tool built specifically for
AWS and Amazon Redshift
• Push-down ELT architecture
• Intuitive browser-based UX
• Powerful feature set
• Retail-like acquisition through the AWS
Marketplace
Matillion ETL for Amazon Redshift
ELT
Architecture
KEY BENEFITS
• Simplified infrastructure
• Fast performance, scalable
• Increases development productivity
Intuitive UX
KEY BENEFITS
• Low skills on-ramp
• Powerful
• Reduces cost of data integration development
Built for AWS
KEY BENEFITS
• Seamless integration with AWS services
• Designed for the Cloud
• No compromise
Retail-like
purchasing
KEY BENEFITS
• Five minutes to stand up
• Cheap, utility-based pricing
• Available exclusively on AWS Marketplace
Benefits of Matillion ETL for Amazon Redshift
Agenda
● An overview of petabyte scale data warehouses, the architecture, and use cases.
● An introduction to Amazon Redshift parallel processing, columnar, and scaled out
architecture.
● Learn how to configure your data warehouse cluster, optimize your schema, and
quickly load your data.
● An overview of all the latest features of Amazon Redshift.
● How Matillion ETL works with Redshift and can help you load massive amounts of
data in minutes.
Overview
Amazon Redshift is a massively parallel, relational data warehouse based on industry-
standard PostgreSQL, so most existing SQL client applications work with only minimal
changes.
● Petabyte scale
● Fully managed
● Zero admin
● SSD and HDD platforms
● As low as $1,000/TB/year
Amazon Redshift cost comparison
DW1 (HDD): price per hour for a DW1.XL single node, and effective annual price per TB
DW2 (SSD): price per hour for a DW2.L single node, and effective annual price per TB
Amazon Redshift is priced by the amount of data you store and by the number of nodes, and the number of nodes is expandable. Depending on the
volume of data you need to manage, your data team can set up anything from a single node -- with 160 GB (0.16 TB) of solid-state disk space -- to
a 128-node cluster whose nodes each hold 16 TB on hard disk drives.
There are a couple of other caveats to keep in mind. Before we get to those, it is important to understand that Amazon separates its node
definitions into two families: Dense Compute and Dense Storage. Each sub-type within these classes stores a maximum amount of
compressed data.
Dense Compute: Recommended for less than 500GB of data
The smallest in the Dense Compute class is the dc1.large, with 2 virtual cores and 0.16 TB of SSD storage per node. You can grow a dc1.large
cluster from one node to 32 nodes, which expands the SSD capacity to 5.12 TB.
Meanwhile, the dc1.8xlarge runs 32 virtual cores per node and scales from a cluster of 2 to 128 nodes, allowing a maximum of 326 TB of SSD storage space.
Dense Storage: Recommended for cost effective scalability for over 500GB of data
For data management using hard disk drives and a larger number of virtual cores, Redshift has two options. The ds2.xlarge can start as a
single node with up to 2 TB of capacity, and can grow to a maximum cluster of 32 nodes.
Need a larger cluster instead? You can switch to the ds2.8xlarge option, with 36 virtual cores per node, starting at 2 nodes and expandable to a
128-node cluster in which each node holds 16 TB of magnetic HDD space.
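The node families above can be captured in a small sizing helper. This is an illustrative sketch, not an AWS API: NODE_SPECS and cluster_capacity_tb are made-up names, and the capacities are the per-node figures quoted in this deck.

```python
# Sketch only: per-node storage and cluster limits for the Redshift node
# types discussed above (figures as quoted in this deck, not an AWS API).
NODE_SPECS = {
    # node type: (TB of compressed storage per node, min nodes, max nodes)
    "dc1.large":   (0.16, 1, 32),
    "dc1.8xlarge": (2.56, 2, 128),
    "ds2.xlarge":  (2.0,  1, 32),
    "ds2.8xlarge": (16.0, 2, 128),
}

def cluster_capacity_tb(node_type: str, node_count: int) -> float:
    """Total compressed storage for a cluster of node_count nodes."""
    per_node_tb, min_nodes, max_nodes = NODE_SPECS[node_type]
    if not min_nodes <= node_count <= max_nodes:
        raise ValueError(f"{node_type} clusters need {min_nodes}-{max_nodes} nodes")
    return per_node_tb * node_count

# A full 32-node dc1.large cluster gives the 5.12 TB ceiling quoted above.
print(round(cluster_capacity_tb("dc1.large", 32), 2))  # 5.12
```

A helper like this is handy for checking whether a planned cluster stays inside a node family's limits before you launch anything.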
Redshift has no upfront costs (with one exception, described below). However, the price per hour depends heavily on your region. For
example, say your region is US West (Northern California) and you're running a small startup with a single node: Redshift's current pricing
structure has you pay $0.33 per hour for that 0.16 TB SSD node. On the other hand, running a single node in the US East (Northern Virginia) region costs
$0.25 per hour.
These rates are Redshift's "On-Demand" pricing. If you want "Reserved Instance" pricing instead, a longer-term commitment of either one or
three years is available.
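The "effective annual price per TB" figures discussed in this deck follow from simple arithmetic: hourly rate, times hours per year, divided by node capacity. A quick check using an illustrative helper (the function name is ours, not AWS's):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def effective_price_per_tb_year(hourly_rate_usd: float, tb_per_node: float) -> float:
    """Annualize a per-node hourly rate into an effective $/TB/year."""
    return hourly_rate_usd * HOURS_PER_YEAR / tb_per_node

# A reserved dc1.large at an effective $0.10/hour over 0.16 TB of storage:
print(round(effective_price_per_tb_year(0.10, 0.16)))  # 5475, i.e. roughly $5,500/TB/year
```

The same arithmetic reproduces the other per-TB figures in this deck from their hourly rates.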
A few things to keep in mind regarding infrastructure and pricing for Redshift:
• You choose either Dense Compute or Dense Storage nodes; they cannot be mixed and matched (at least not currently).
• If you decide that Reserved Instance pricing is the way to go, a one-year commitment has no mandatory upfront cost, but you can choose to pay
partially or fully upfront. A three-year contract likewise offers a choice between full and partial upfront payment.
Redshift Architecture
Client applications
Amazon Redshift integrates with various data loading and ETL (extract, transform, and load) tools, and with business intelligence (BI) reporting, data
mining, and analytics tools.
Clusters
The core infrastructure component of an Amazon Redshift data warehouse is a cluster.
A cluster is composed of one or more compute nodes. If a cluster is provisioned with two or more compute nodes, an additional leader node
coordinates the compute nodes and handles external communication. Your client application interacts directly only with the leader node. The
compute nodes are transparent to external applications.
Leader node
The leader node manages communications with client programs and all communication with compute nodes. It parses and develops execution
plans to carry out database operations, in particular, the series of steps necessary to obtain results for complex queries. Based on the execution
plan, the leader node compiles code, distributes the compiled code to the compute nodes, and assigns a portion of the data to each compute
node.
Compute nodes
The leader node compiles code for individual elements of the execution plan and assigns the code to individual compute nodes. The compute
nodes execute the compiled code and send intermediate results back to the leader node for final aggregation.
Node slices
A compute node is partitioned into slices. Each slice is allocated a portion of the node's memory and disk space, where it processes a portion of
the workload assigned to the node. The leader node manages distributing data to the slices and apportions the workload for any queries or other
database operations to the slices. The slices then work in parallel to complete the operation.
The number of slices per node is determined by the node size of the cluster.
When you create a table, you can optionally specify one column as the distribution key. When the table is loaded with data, the rows are
distributed to the node slices according to the distribution key that is defined for a table. Choosing a good distribution key enables Amazon
Redshift to use parallel processing to load data and execute queries efficiently.
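To make the distribution-key idea concrete: conceptually, Redshift hashes the distribution key value and uses the hash to pick a slice, so rows with equal keys always land together. A toy illustration follows; Redshift's actual hash function is internal, so this only shows the idea, not the real algorithm:

```python
import hashlib

def slice_for_row(dist_key_value: str, total_slices: int) -> int:
    """Toy model of KEY distribution: hash the distribution-key value and
    map it onto one of the cluster's slices. (Illustrative only; Redshift's
    real hash function is internal and not exposed.)"""
    digest = int(hashlib.md5(dist_key_value.encode()).hexdigest(), 16)
    return digest % total_slices

# Two dc1.large nodes with 2 slices each -> 4 slices in the cluster.
# The same key always maps to the same slice, which is what lets
# joins on the distribution key run without redistributing data.
assert slice_for_row("customer_42", 4) == slice_for_row("customer_42", 4)
```

This is also why a skewed distribution key (one very common value) is harmful: all of its rows hash to one slice, and that slice becomes the bottleneck.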
Databases
A cluster contains one or more databases. User data is stored on the compute nodes. Your SQL client communicates with the leader node, which
in turn coordinates query execution with the compute nodes.
Use Cases
Traditional Enterprise DW
● Reduce costs by extending the
DW rather than adding hardware
● Migrate completely from
existing DW systems
● Respond faster to the business
Companies with Big Data
● Improve performance by
order of magnitude
● Make more data available for
analysis
● Access business data via
standard reporting tools
SaaS Companies
● Add analytics functionality to
applications
● Scale DW capacity as demand
grows
● Reduce HW & SW costs by an
order of magnitude
Redshift features
Petabyte Scale
With a few clicks in the console or a simple API call, you can
easily change the number or type of nodes in your data
warehouse and scale up all the way to a petabyte or more of
compressed user data. Dense Storage (DS) nodes allow you
to create very large data warehouses using hard disk drives
(HDDs) at a very low price point. Dense Compute (DC) nodes
allow you to create very high-performance data warehouses
using fast CPUs, large amounts of RAM, and solid-state disks
(SSDs). While resizing, Amazon Redshift allows you to
continue to query your data warehouse in read-only mode
until the new cluster is fully provisioned and ready for use.
Optimized for Data Warehousing
Amazon Redshift uses a variety of innovations to obtain very
high query performance on datasets ranging in size from a
hundred gigabytes to an exabyte or more. For petabyte-scale
local data, it uses columnar storage, data compression, and zone
maps to reduce the amount of I/O needed to perform queries.
Amazon Redshift has a massively parallel processing (MPP) data
warehouse architecture, parallelizing and distributing SQL
operations to take advantage of all available resources. The
underlying hardware is designed for high performance data
processing, using local attached storage to maximize throughput
between the CPUs and drives, and a 10GigE mesh network to
maximize throughput between nodes. For exabyte-scale data in
Amazon S3, Amazon Redshift generates an optimal query plan
that minimizes the amount of data scanned and delegates the
query execution to a pool of Redshift Spectrum instances that
scales automatically, so queries run quickly regardless of data
size.
Query your Amazon S3 “data lake”
Redshift Spectrum enables you to run queries against
exabytes of unstructured data in Amazon S3, with no loading
or ETL required. When you issue a query, it goes to the
Amazon Redshift SQL endpoint, which generates and
optimizes a query plan. Amazon Redshift determines what
data is local and what is in Amazon S3, generates a plan to
minimize the amount of Amazon S3 data that needs to be
read, requests Amazon Redshift Spectrum workers out of a
shared resource pool to read and process data from Amazon
S3, and pulls results back into your Amazon Redshift cluster
for any remaining processing.
No Up-Front Costs
You pay only for the resources you provision. You can choose On-
Demand pricing with no up-front costs or long-term commitments,
or obtain significantly discounted rates with Reserved Instance
pricing. On-Demand pricing starts at just $0.25/hour per 160GB
DC1.Large node or $0.85/hour per 2TB DS2.XLarge node. With
Partial Upfront Reserved Instances, you can lower your effective
price to $0.10/hour per DC1.Large node ($5,500/TB/year) or
$0.228/hour per DS2.XLarge node ($999/TB/year). Redshift
Spectrum queries are priced at $5/TB scanned from S3. For more
information, see the Amazon Redshift Pricing page.
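The $5/TB Spectrum rate above is easy to turn into a per-query estimate. A minimal sketch, ignoring billing details such as per-query minimums (the function name is illustrative):

```python
SPECTRUM_USD_PER_TB_SCANNED = 5.0

def spectrum_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost of one Redshift Spectrum query from the bytes it
    scanned in S3, at the $5/TB rate quoted above (1 TB = 10**12 bytes)."""
    return bytes_scanned / 10**12 * SPECTRUM_USD_PER_TB_SCANNED

# Scanning 2 TB of S3 data costs about $10 at this rate.
print(spectrum_query_cost(2 * 10**12))  # 10.0
```

This is also why columnar formats and partitioning matter for Spectrum: the less data a query has to scan, the less it costs.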
Fault Tolerant
Amazon Redshift has multiple features that enhance the reliability
of your data warehouse cluster. All data written to a node in your
cluster is automatically replicated to other nodes within the cluster
and all data is continuously backed up to Amazon S3. Amazon
Redshift continuously monitors the health of the cluster and
automatically re-replicates data from failed drives and replaces
nodes as necessary.
Redshift features
Automated Backups
Amazon Redshift automatically and continuously backs up new data
to Amazon S3. It stores your snapshots for a user-defined period
from 1 up to 35 days. You can take your own snapshots at any time,
and they are retained until you explicitly delete them. Amazon
Redshift can also asynchronously replicate your snapshots to S3 in
another region for disaster recovery. Once you delete a cluster,
your system snapshots are removed, but your user snapshots are
available until you explicitly delete them.
Fast Restores
You can use any system or user snapshot to restore your cluster
using the AWS Management Console or the Amazon Redshift APIs.
Your cluster is available as soon as the system metadata has been
restored and you can start running queries while user data is
spooled down in the background.
Encryption
With just a couple of parameter settings, you can set up Amazon
Redshift to use SSL to secure data in transit and hardware-
accelerated AES-256 encryption for data at rest. If you choose to
enable encryption of data at rest, all data written to disk will be
encrypted as well as any backups. By default, Amazon Redshift
takes care of key management but you can choose to manage your
keys using your own hardware security modules (HSMs), AWS
CloudHSM, or AWS Key Management Service.
Network Isolation
Amazon Redshift enables you to configure firewall rules to control
network access to your data warehouse cluster. You can run
Amazon Redshift inside Amazon VPC to isolate your data
warehouse cluster in your own virtual network and connect it to
your existing IT infrastructure using industry-standard encrypted
IPsec VPN.
Audit and Compliance
Amazon Redshift integrates with AWS CloudTrail to enable you to
audit all Redshift API calls. Amazon Redshift also logs all SQL
operations, including connection attempts, queries and changes to
your database. You can access these logs using SQL queries against
system tables or choose to have them downloaded to a secure
location on Amazon S3. Amazon Redshift is compliant with SOC1,
SOC2, SOC3 and PCI DSS Level 1 requirements. For more details,
please visit AWS Cloud Compliance.
Setting up a Redshift Cluster
● Create an IAM role
● Launch a cluster
● Authorize cluster access
● Connect to the cluster
● Load data (with Matillion)
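For the "connect to the cluster" step, any PostgreSQL-compatible SQL client can talk to the leader-node endpoint that the console shows after launch. A sketch of building a libpq-style connection string; the hostname and credentials below are placeholders, not real values:

```python
def redshift_dsn(host: str, port: int, database: str, user: str) -> str:
    """Build a libpq-style connection string for a SQL client. Redshift
    listens on port 5439 by default; all values are supplied by the caller."""
    return f"host={host} port={port} dbname={database} user={user} sslmode=require"

# Placeholder endpoint -- substitute the hostname the console shows for your cluster.
print(redshift_dsn("snowplow.example.us-east-1.redshift.amazonaws.com",
                   5439, "pbz", "admin"))
```

The same string works with psql or psycopg2; JDBC clients use an equivalent `jdbc:redshift://host:port/dbname` URL instead.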
Launch a Redshift Cluster
Go into the Amazon Web Services console and select "Redshift" from the list of services.
Enter suitable values for the cluster identifier, database name (e.g. 'snowplow'), port, username, and password, then click the "Continue"
button.
We now need to configure the cluster size. Select the values most appropriate to your situation. We generally recommend starting with a single-node cluster,
e.g. a dw1.xlarge or dw2.large node, and then adding nodes as your data volumes grow.
You now have the opportunity to encrypt the database and set the availability zone if you wish. Select your preferences and click "Continue".
Amazon summarises your cluster information. Click "Launch Cluster" to fire up your Redshift instance. This takes a few minutes to complete.
Alternatively, you could use the AWS CLI to launch a new cluster. The outcome of the above steps can be achieved with the following command:
$ aws redshift create-cluster \
    --node-type dc1.large \
    --cluster-type single-node \
    --cluster-identifier snowplow \
    --db-name pbz \
    --master-username admin \
    --master-user-password TopSecret1
Matillion as an ETL Tool
[Architecture diagram: external data and on-premises databases connect over a VPN into a VPC, where the Matillion ETL server -- designed through the browser -- transfers data to Amazon Redshift, RDS, and DynamoDB.]
Matillion Data Sources
ELT - Extract, Load, then Transform
MATILLION EXTRACT, LOAD &
TRANSFORM APPROACH
TRADITIONAL EXTRACT, TRANSFORM &
LOAD APPROACH
Thank You
Harpreet Singh & Nick Tierney
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Matillion
 
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Matillion
 
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift SpectrumWebinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Matillion
 
Webinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift SpectrumWebinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift Spectrum
Matillion
 

More from Matillion (16)

Lets Talk Google BigQuery
Lets Talk Google BigQueryLets Talk Google BigQuery
Lets Talk Google BigQuery
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
ELT is Better. Here's Why.
ELT is Better. Here's Why. ELT is Better. Here's Why.
ELT is Better. Here's Why.
 
Pick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data WarehousePick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data Warehouse
 
Dive Into Data Lakes
Dive Into Data LakesDive Into Data Lakes
Dive Into Data Lakes
 
Using ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 MinutesUsing ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 Minutes
 
Reach New Heights with Amazon Redshift
Reach New Heights with Amazon RedshiftReach New Heights with Amazon Redshift
Reach New Heights with Amazon Redshift
 
Get Savvy with Snowflake
Get Savvy with SnowflakeGet Savvy with Snowflake
Get Savvy with Snowflake
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best Practices
 
ELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it mattersELT vs. ETL - How they’re different and why it matters
ELT vs. ETL - How they’re different and why it matters
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
 
Kickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon RedshiftKickstart your data strategy for 2018: Getting started with Amazon Redshift
Kickstart your data strategy for 2018: Getting started with Amazon Redshift
 
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
Simplifying Your Journey to the Cloud: The Benefits of a Cloud-Based Data War...
 
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
 
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift SpectrumWebinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
 
Webinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift SpectrumWebinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift Spectrum
 

Recently uploaded

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 

Recently uploaded (20)

What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 

Getting Started With Amazon Redshift

  • 1. 1 www.matillion.com © 2017 Matillion. All rights reserved. Matillion Webinar, Tuesday, August 29, 2017: Getting Started with Amazon Redshift. Presented by: Harpreet Singh & Nick Tierney. Matillion trademarks, registered trademarks or service marks are property of their respective owners.
  • 2. 2 www.matillion.com © 2017 Matillion. All rights reserved. Matillion ETL for Amazon Redshift • Data integration tool built specifically for AWS and Amazon Redshift • Push-down ELT architecture • Intuitive browser-based UX • Powerful feature set • Retail-like acquisition through AWS Marketplace
  • 3. 3 www.matillion.com © 2017 Matillion. All rights reserved. Benefits of Matillion ETL for Amazon Redshift. ELT Architecture (simplified infrastructure; fast, scalable performance; increases development productivity). Intuitive UX (low skills on-ramp; powerful; reduces the cost of data integration development). Built for AWS (tight integration with AWS services; designed for the cloud; no compromise). Retail-like purchasing (5 minutes to stand up; inexpensive, utility-based pricing; available exclusively on AWS Marketplace).
  • 4. 4 www.matillion.com © 2017 Matillion. All rights reserved. Agenda ● An overview of petabyte-scale data warehouses, their architecture, and use cases. ● An introduction to Amazon Redshift's parallel processing, columnar storage, and scale-out architecture. ● Learn how to configure your data warehouse cluster, optimize your schema, and quickly load your data. ● An overview of the latest features of Amazon Redshift. ● How Matillion ETL works with Redshift and can help you load massive amounts of data in minutes.
  • 5. 5 www.matillion.com © 2017 Matillion. All rights reserved. Overview Amazon Redshift is a massively parallel relational data warehouse based on industry-standard PostgreSQL, so most existing SQL client applications will work with only minimal changes. ● Petabyte scale ● Fully managed, zero admin ● SSD & HDD platforms ● As low as $1,000/TB/year
  • 6. 6 www.matillion.com © 2017 Matillion. All rights reserved. Amazon Redshift cost comparison (table): price per hour for a single node and the effective annual price per TB, shown for the DW1.XL (HDD) and DW2.L (SSD) node types.
  • 7. 7 www.matillion.com © 2017 Matillion. All rights reserved. Amazon Redshift is priced by the amount of data you store and by the number of nodes, and the number of nodes is expandable. Depending on the volume of data you need to manage, your data team can set up anything from a single node with 160 GB (0.16 TB) of solid-state disk space to a 128-node cluster of hard-disk-drive nodes holding 16 TB each. There are a couple of other caveats to keep in mind. But before we present those, it's important to understand that Amazon separates its node definitions into two families: Dense Compute and Dense Storage. Each of the sub-types within these classifications stores a maximum amount of compressed data. Dense Compute: recommended for less than 500 GB of data. The smallest in the Dense Compute class is the dc1.large, with 2 virtual cores and 0.16 TB of SSD storage capacity. You can grow dc1.large from one node to a cluster of 32 nodes, which expands the SSD capacity to 5.12 TB. Meanwhile, the dc1.8xlarge runs 32 virtual cores and is scalable from a cluster of 2 to 128 nodes, which allows a maximum of 326 TB of SSD storage space. Dense Storage: recommended for cost-effective scalability beyond 500 GB of data. For data management using hard disk drives and a larger number of virtual cores, Redshift has two options. The ds2.xlarge can start as a single node with up to 2 TB of capacity, and can be grown to a maximum cluster of 32 nodes. Need a larger cluster instead? You can switch to the ds2.8xlarge option with 36 virtual cores, starting at 2 nodes and expandable to a 128-node cluster, with 16 TB of magnetic HDD space per node. Redshift has no upfront costs (with one exception, described below). However, the price per hour depends largely on your region.
For example, let’s say your region is US West (Northern California) and you're running a small startup with a single node. Redshift’s current pricing structure states that you’ll pay $0.33 per hour for that 0.16 TB SSD node. On the other hand, running a single node in the US East (Northern Virginia) region costs you $0.25 per hour. These rates are Redshift’s “On Demand” pricing. If you want “Reserved Instance” pricing, a longer-term commitment of either one or three years is available. A few things to keep in mind regarding infrastructure and pricing for Redshift: • You choose either Dense Compute or Dense Storage nodes; they cannot be mixed and matched (at least not currently). • If you decide that Reserved Instance pricing is the way to go, there are no upfront costs for the one-year term commitment, but you can choose to pay upfront in part or in full. With a three-year contract, you also have a choice between full and partial upfront payments.
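The regional rates above are easy to sanity-check. A quick sketch of the arithmetic, using the 2017 on-demand rates quoted in this deck (current prices differ):

```python
# Annual on-demand cost of one dc1.large node at the 2017 rates quoted
# above (US West $0.33/hr, US East $0.25/hr); current prices differ.
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(hourly_rate: float) -> float:
    """Hourly on-demand rate -> cost of running one node for a year."""
    return hourly_rate * HOURS_PER_YEAR

us_west = annual_cost(0.33)  # approximately $2,890.80 per year
us_east = annual_cost(0.25)  # $2,190.00 per year
# Effective annual price per TB for the 0.16 TB dc1.large node:
print(round(us_east / 0.16, 2))  # → 13687.5
```

Storing small volumes on Dense Compute SSD nodes is therefore far pricier per TB than the headline "$1,000/TB/year", which applies to fully loaded Dense Storage reserved nodes.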
  • 8. 8 www.matillion.com © 2017 Matillion. All rights reserved. Redshift Architecture. Client applications: Amazon Redshift integrates with various data loading and ETL (extract, transform, and load) tools, as well as business intelligence (BI) reporting, data mining, and analytics tools. Clusters: The core infrastructure component of an Amazon Redshift data warehouse is a cluster. A cluster is composed of one or more compute nodes. If a cluster is provisioned with two or more compute nodes, an additional leader node coordinates the compute nodes and handles external communication. Your client application interacts directly only with the leader node; the compute nodes are transparent to external applications. Leader node: The leader node manages communications with client programs and all communication with compute nodes. It parses and develops execution plans to carry out database operations, in particular the series of steps necessary to obtain results for complex queries. Based on the execution plan, the leader node compiles code, distributes the compiled code to the compute nodes, and assigns a portion of the data to each compute node. Compute nodes: The leader node compiles code for individual elements of the execution plan and assigns the code to individual compute nodes. The compute nodes execute the compiled code and send intermediate results back to the leader node for final aggregation. Node slices: A compute node is partitioned into slices. Each slice is allocated a portion of the node's memory and disk space, where it processes a portion of the workload assigned to the node. The leader node manages distributing data to the slices and apportions the workload for any queries or other database operations to the slices. The slices then work in parallel to complete the operation. The number of slices per node is determined by the node size of the cluster. When you create a table, you can optionally specify one column as the distribution key.
When the table is loaded with data, the rows are distributed to the node slices according to the distribution key that is defined for the table. Choosing a good distribution key enables Amazon Redshift to use parallel processing to load data and execute queries efficiently. Databases: A cluster contains one or more databases. User data is stored on the compute nodes. Your SQL client communicates with the leader node, which in turn coordinates query execution with the compute nodes.
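The distribution-key placement described above can be sketched in a few lines. This is illustrative only: Redshift's real hash function is internal, and the slice count, key values, and md5 stand-in here are all made up.

```python
# Illustrative sketch of distribution-key placement. Redshift's real
# hash is internal; NUM_SLICES, the key values, and the md5 stand-in
# are assumptions for demonstration.
import hashlib

NUM_SLICES = 4  # e.g. a 2-node cluster with 2 slices per node

def slice_for(dist_key_value: str) -> int:
    """Map a distribution-key value to a node slice."""
    digest = hashlib.md5(dist_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SLICES

rows = ["customer-1", "customer-2", "customer-1", "customer-3"]
placement = [slice_for(r) for r in rows]
# Equal key values always hash to the same slice, so a join or GROUP BY
# on the distribution key needs no data movement between nodes.
assert placement[0] == placement[2]
```

This is why the slide stresses choosing a "good" distribution key: a low-cardinality or skewed key sends most rows to a few slices, losing the parallelism.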
  • 9. 9 www.matillion.com © 2017 Matillion. All rights reserved. Use Cases. Traditional enterprise DW: ● Reduce costs by extending the DW rather than adding hardware ● Migrate completely from existing DW systems ● Respond faster to the business. Companies with big data: ● Improve performance by an order of magnitude ● Make more data available for analysis ● Access business data via standard reporting tools. SaaS companies: ● Add analytics functionality to applications ● Scale DW capacity as demand grows ● Reduce HW & SW costs by an order of magnitude.
  • 10. 10 www.matillion.com © 2017 Matillion. All rights reserved. Petabyte Scale: With a few clicks in the console or a simple API call, you can easily change the number or type of nodes in your data warehouse and scale up all the way to a petabyte or more of compressed user data. Dense Storage (DS) nodes allow you to create very large data warehouses using hard disk drives (HDDs) at a very low price point. Dense Compute (DC) nodes allow you to create very high performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs). While resizing, Amazon Redshift allows you to continue to query your data warehouse in read-only mode until the new cluster is fully provisioned and ready for use. Redshift features. Optimized for Data Warehousing: Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from a hundred gigabytes to an exabyte or more. For petabyte-scale local data, it uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. Amazon Redshift has a massively parallel processing (MPP) data warehouse architecture, parallelizing and distributing SQL operations to take advantage of all available resources. The underlying hardware is designed for high-performance data processing, using locally attached storage to maximize throughput between the CPUs and drives, and a 10 GigE mesh network to maximize throughput between nodes. For exabyte-scale data in Amazon S3, Amazon Redshift generates an optimal query plan that minimizes the amount of data scanned and delegates query execution to a pool of Redshift Spectrum instances that scales automatically, so queries run quickly regardless of data size. Query your Amazon S3 “data lake”: Redshift Spectrum enables you to run queries against exabytes of unstructured data in Amazon S3, with no loading or ETL required.
When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates and optimizes a query plan. Amazon Redshift determines what data is local and what is in Amazon S3, generates a plan to minimize the amount of Amazon S3 data that needs to be read, requests Amazon Redshift Spectrum workers out of a shared resource pool to read and process data from Amazon S3, and pulls results back into your Amazon Redshift cluster for any remaining processing. No Up-Front Costs: You pay only for the resources you provision. You can choose On-Demand pricing with no up-front costs or long-term commitments, or obtain significantly discounted rates with Reserved Instance pricing. On-Demand pricing starts at just $0.25/hour per 160 GB DC1.Large node or $0.85/hour per 2 TB DS2.XLarge node. With Partial Upfront Reserved Instances, you can lower your effective price to $0.10/hour per DC1.Large node ($5,500/TB/year) or $0.228/hour per DS2.XLarge node ($999/TB/year). Redshift Spectrum queries are priced at $5/TB scanned from S3. For more information, see the Amazon Redshift Pricing page. Fault Tolerant: Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. All data written to a node in your cluster is automatically replicated to other nodes within the cluster, and all data is continuously backed up to Amazon S3. Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.
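The zone maps mentioned on this slide are per-block min/max values that let the engine skip blocks a predicate cannot match. A minimal sketch of the idea (the block boundaries and dates are invented; real blocks are 1 MB and maintained automatically):

```python
# Minimal sketch of zone-map pruning over a date column sorted by date.
# The three "blocks" and their min/max ranges are made-up examples;
# Redshift maintains these statistics itself per 1 MB block.
blocks = [
    {"id": 0, "min": "2017-01-01", "max": "2017-03-31"},
    {"id": 1, "min": "2017-04-01", "max": "2017-06-30"},
    {"id": 2, "min": "2017-07-01", "max": "2017-09-30"},
]

def blocks_to_scan(lo: str, hi: str) -> list:
    """Ids of blocks whose [min, max] range overlaps the predicate [lo, hi]."""
    return [b["id"] for b in blocks if b["max"] >= lo and b["min"] <= hi]

# A query filtering on May 2017 touches one block instead of three,
# which is the I/O reduction the slide describes.
print(blocks_to_scan("2017-05-01", "2017-05-31"))  # → [1]
```

Combined with columnar storage (only the referenced columns are read at all), this is why sort-key choice matters: zone maps prune best when the filtered column is sorted.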
  • 11. 11 www.matillion.com © 2017 Matillion. All rights reserved. Redshift features. Automated Backups: Amazon Redshift automatically and continuously backs up new data to Amazon S3. It stores your snapshots for a user-defined period from 1 to 35 days. You can take your own snapshots at any time, and they are retained until you explicitly delete them. Amazon Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery. Once you delete a cluster, your system snapshots are removed, but your user snapshots remain available until you explicitly delete them. Fast Restores: You can use any system or user snapshot to restore your cluster using the AWS Management Console or the Amazon Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background. Encryption: With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk is encrypted, as are any backups. By default, Amazon Redshift takes care of key management, but you can choose to manage your keys using your own hardware security modules (HSMs), AWS CloudHSM, or AWS Key Management Service. Network Isolation: Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can run Amazon Redshift inside Amazon VPC to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using industry-standard encrypted IPsec VPN. Audit and Compliance: Amazon Redshift integrates with AWS CloudTrail to enable you to audit all Redshift API calls. Amazon Redshift also logs all SQL operations, including connection attempts, queries, and changes to your database.
You can access these logs using SQL queries against system tables or choose to have them downloaded to a secure location on Amazon S3. Amazon Redshift is compliant with SOC1, SOC2, SOC3 and PCI DSS Level 1 requirements. For more details, please visit AWS Cloud Compliance.
  • 12. 12 www.matillion.com © 2017 Matillion. All rights reserved. Setting up a Redshift Cluster ● Create an IAM role ● Launch a cluster ● Authorize cluster access ● Connect to the cluster ● Load data (Matillion)
  • 13. 13 www.matillion.com © 2017 Matillion. All rights reserved. Launch a Redshift Cluster. Go into the Amazon Web Services console and select "Redshift" from the list of services.
  • 14. 14 www.matillion.com © 2017 Matillion. All rights reserved. Enter suitable values for the cluster identifier, database name (e.g. 'snowplow'), port, username and password. Click the "Continue" button.
  • 15. 15 www.matillion.com © 2017 Matillion. All rights reserved. We now need to configure the cluster size. Select the values that are most appropriate to your situation. We generally recommend starting with a single-node cluster of a dw1.xlarge or dw2.large node type, and then adding nodes as your data volumes grow. You now have the opportunity to encrypt the database and set the availability zone if you wish. Select your preferences and click "Continue".
  • 16. 16 www.matillion.com © 2017 Matillion. All rights reserved. Amazon summarises your cluster information. Click "Launch Cluster" to fire your Redshift instance up. This will take a few minutes to complete. Alternatively, you could use AWS CLI to launch a new cluster. The outcome of the above steps could be achieved with the following command. $ aws redshift create-cluster --node-type dc1.large --cluster-type single-node --cluster-identifier snowplow --db-name pbz --master-username admin --master-user-password TopSecret1
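Once the cluster launched by the command above reaches the "available" state, `aws redshift describe-clusters` (or the equivalent API call) returns its connection endpoint. A hypothetical helper that builds a JDBC URL from a describe-clusters-style response; the sample response is hand-written to mimic that shape, and the address in it is made up:

```python
# Hypothetical helper: read the endpoint out of a describe-clusters
# style response and build a JDBC URL. The sample response mimics the
# AWS response shape; the address is invented.
def endpoint_url(response: dict, database: str) -> str:
    """Build a JDBC URL from a describe-clusters style response."""
    endpoint = response["Clusters"][0]["Endpoint"]  # present once available
    return f"jdbc:redshift://{endpoint['Address']}:{endpoint['Port']}/{database}"

sample = {
    "Clusters": [{
        "ClusterStatus": "available",
        "Endpoint": {
            "Address": "snowplow.abc123.us-east-1.redshift.amazonaws.com",
            "Port": 5439,
        },
    }]
}
print(endpoint_url(sample, "pbz"))
# → jdbc:redshift://snowplow.abc123.us-east-1.redshift.amazonaws.com:5439/pbz
```

This is the URL you would paste into a SQL client, or into Matillion's connection settings, in the next step.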
  • 17. 17 www.matillion.com © 2017 Matillion. All rights reserved. Matillion as an ETL tool (architecture diagram): a browser-based design UI drives the Matillion ETL server running inside your VPC, which transfers data into Redshift from external data sources, on-premises databases (over a VPN), RDS, and DynamoDB.
  • 18. 18 www.matillion.com © 2017 Matillion. All rights reserved. Matillion Data Sources
  • 19. 19 www.matillion.com © 2017 Matillion. All rights reserved. ELT - Extract, Load, then Transform (diagram): the Matillion extract, load & transform approach compared with the traditional extract, transform & load approach.
  • 21. 21 www.matillion.com © 2017 Matillion. All rights reserved. Thank You. Presented by: Harpreet Singh & Nick Tierney. Matillion trademarks, registered trademarks or service marks are property of their respective owners.

Editor's Notes

  1. So what’s Matillion? Benefits of an ETL tool (ease, speed of dev, skills) albeit re-booted for 2017 …but, using an ELT architecture … powerful features (mention data load, inc Teradata, GBQ, sources) … delivered in a retail like commercial model
  2. Benefits for customers – why they buy it Bring these to life