www.matillion.com
© 2017 Matillion. All rights reserved.
Spectrum Webinar Series
Part 1: Getting Started with Amazon Redshift Spectrum
September 20th, 2017
Presented by: David Langton and James Johnson
Copyright © 2017. All rights reserved. Matillion, trademarks, registered trademarks or service marks are property of their respective owners. 9/21/2017
Introduction
• What is Amazon Redshift Spectrum?
• Query Architecture
• External Schemas
• External Tables
• Queries and Plans
What is Amazon Redshift Spectrum?
• Run Redshift SQL directly against exabytes of S3 data
• Store table data in Redshift or S3; query both from Redshift
• Scales independently of the Redshift cluster
• Priced per query, based on the amount of S3 data scanned
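As a rough illustration of "store table data in Redshift or S3, query from Redshift", the sketch below joins a local Redshift table to an external (S3-backed) table in one statement. The external table webinar.webinar_airports is the one defined in the External Tables slide of this deck; the local table public.flights and its origin column are hypothetical, made up for this example.

```sql
-- Sketch only: public.flights and its "origin" column are hypothetical.
-- webinar.webinar_airports is an external table whose data lives in S3.
SELECT a.state,
       COUNT(*) AS departures
FROM public.flights AS f              -- ordinary table stored in Redshift
JOIN webinar.webinar_airports AS a    -- external table read through Spectrum
  ON f.origin = a.iata
GROUP BY a.state
ORDER BY departures DESC
LIMIT 10;
```

The statement is plain Redshift SQL; the only visible difference is that one of the tables lives in an external schema.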
Data Warehouse Scalability
• On premises
• Amazon Redshift on AWS
• Amazon Redshift on AWS with Spectrum
Architecture of Amazon Redshift Spectrum
[Diagram: JDBC/ODBC client application → leader node (with data catalog) → compute nodes 1 … n → Spectrum layer → S3. Labels: "Redshift Scaling" for the cluster, "Independent Scaling" for the Spectrum layer.]
External Schemas
• Provide seamless access to S3 data from Redshift
• Queries against an external schema still use ordinary Redshift SQL
• An external schema is a pointer to a data catalog
- Athena (legacy)
- AWS Glue (new)
• Requires an IAM role
- Spectrum is an external service
- It does not run in your VPC
- The role must be attached to your cluster
[Diagram: Redshift schema → data catalog]
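A minimal sketch of the statement behind these bullets, assuming a data catalog database called webinar_db and an IAM role already attached to the cluster. The database name, account ID, and role name are placeholders, not values from the webinar.

```sql
-- Sketch only: the database name, account ID, and role name are placeholders.
CREATE EXTERNAL SCHEMA webinar
FROM DATA CATALOG
DATABASE 'webinar_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
```

The schema stores no data itself. It records which catalog database to consult and which role Spectrum should assume when it reads S3 on the cluster's behalf.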
Demo 1 - Set up an external schema
External Tables
• Data Formats
- Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV
• Table Definition Syntax
- STRUCT, ARRAY, and MAP types are unsupported
- May be partitioned on one or more keys
• Sort / Distribution Options
- Spectrum does not use the same concepts: you cannot set sort or distribution options for Spectrum tables
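-- No ROW FORMAT clause is given, so Redshift applies its default field and
-- line delimiters; for comma-separated files add
-- ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.
CREATE EXTERNAL TABLE "webinar"."webinar_airports"
(
    "iata"    varchar(255),
    "airport" varchar(255),
    "city"    varchar(255),
    "state"   varchar(255),
    "country" varchar(255),
    "lat"     decimal(12, 8),
    "long"    decimal(12, 8)
)
STORED AS TEXTFILE
LOCATION 's3://<bucket>/<path>';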
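The "may be partitioned on one or more keys" point could look something like the following sketch. The table webinar.webinar_flights, its columns, and the S3 prefixes are hypothetical; only the overall statement shape is meant to carry over.

```sql
-- Sketch only: table name, columns, and S3 paths are placeholders.
CREATE EXTERNAL TABLE webinar.webinar_flights (
    flight_num  varchar(16),
    origin      varchar(8),
    dest        varchar(8),
    dep_delay   integer
)
PARTITIONED BY (flight_date date)     -- partition key, not stored in the files
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://<bucket>/<path>/flights/';

-- Each partition is registered explicitly and points at its own S3 prefix.
ALTER TABLE webinar.webinar_flights
ADD PARTITION (flight_date = '2017-09-20')
LOCATION 's3://<bucket>/<path>/flights/flight_date=2017-09-20/';
```

Queries that filter on flight_date can then skip the S3 prefixes of partitions that do not match. No SORTKEY or DISTKEY clauses appear, in line with the last bullet above.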
Demo 2 - Set up external tables
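One way to check the result of this step, sketched under the assumption that the external schema is named webinar as in the DDL above, is to look the table up in the SVV_EXTERNAL_TABLES system view and then read a few rows.

```sql
-- List the external tables registered under the "webinar" external schema.
SELECT schemaname, tablename, location
FROM svv_external_tables
WHERE schemaname = 'webinar';

-- Quick sanity check that Redshift can read the files behind the table.
SELECT iata, airport, city
FROM webinar.webinar_airports
LIMIT 10;
```

If the second query returns rows of NULLs, the delimiter or column definitions probably do not match the files in S3.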
Querying Redshift Tables and Spectrum Tables
• The query plan shows when Redshift pushes down all or part of a query to Spectrum
- Spectrum-specific steps: S3 Seq Scan, S3 HashAggregate, S3 Query Scan, Seq Scan PartitionInfo, Partition Loop
• This plan is for a query on a single Redshift table:
XN Seq Scan on webinar_airports_redshift (cost=0.00..42.20 rows=2 width=78)
  Filter: ((city)::text = 'New York'::text)
• This plan is for the same query when the table is instead in Spectrum:
XN S3 Query Scan webinar_airports (cost=0.00..225000000.00 rows=10000000000 width=2028)
  -> S3 Seq Scan video.webinar_airports location:"s3://mtln-spectrum-data/webinar/demo1" format:TEXT (cost=0.00..125000000.00 rows=10000000000 width=2028)
       Filter: ((city)::text = 'New York'::text)
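Plans like the ones above come from EXPLAIN. The sketch below assumes the external schema is named webinar, as in the DDL earlier in the deck (the plan shown on the slide refers to it as video). The exact plan text and cost figures depend on the cluster and the data.

```sql
-- A filter on an external table; the Spectrum plan on the slide has this shape.
EXPLAIN
SELECT *
FROM webinar.webinar_airports
WHERE city = 'New York';

-- An aggregate on the same table; look for S3-prefixed steps in the plan
-- to see which parts of the work are handed to Spectrum.
EXPLAIN
SELECT state, COUNT(*) AS airports
FROM webinar.webinar_airports
GROUP BY state;
```

EXPLAIN only plans the statement and does not run it, so it is a cheap way to see how much of a query is pushed down before paying for a scan.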
Summary
Coming soon: the next 2 webinars in our series
• October 4, 11AM EST: Using Amazon Redshift Spectrum from Matillion ETL
• October 18, 11AM EST: Accessing your Data Lake assets from Amazon Redshift Spectrum
Thank You
Presented by: David Langton and James Johnson

The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 

Webinar | Getting Started With Amazon Redshift Spectrum

  • 1. Spectrum Webinar Series, Part 1: Getting Started with Amazon Redshift Spectrum. September 20th, 2017. Presented by: David Langton and James Johnson. www.matillion.com © 2017 Matillion. All rights reserved. Matillion, trademarks, registered trademarks or service marks are property of their respective owners. 9/21/2017
  • 2. Introduction
      • What is Amazon Redshift Spectrum?
      • Query Architecture
      • External Schemas
      • External Tables
      • Queries and Plans
  • 3. What is Amazon Redshift Spectrum?
      • Run Redshift SQL directly against exabytes of S3 data
      • Store table data in Redshift or S3. Query from Redshift.
      • Scales independently from Redshift
      • Priced per-query
  • 4. Data Warehouse Scalability
      • On Premise
      • Amazon Redshift on AWS
      • Amazon Redshift on AWS with Spectrum
  • 5. Architecture of Amazon Redshift Spectrum (diagram labels: Catalog; JDBC/ODBC Client Application; Leader Node; Node 1 to Node n; Spectrum; S3; Independent Scaling; Redshift Scaling)
  • 6. External Schemas (diagram labels: Redshift Schema; Data Catalog)
      • Provide access to S3 data seamlessly from Redshift
      • Queries against an external schema still use ordinary Redshift SQL
      • An external schema is a pointer to a data catalog
          - Athena (legacy)
          - AWS Glue (new)
      • Requires an IAM role
          - Spectrum is an external service
          - It does not run in your VPC
          - The role must be attached to your cluster
  • 7. Demo 1 - Set up external schema
  • 8. External Tables
      • Data Formats
          - Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV
      • Table Definition Syntax
          - STRUCT, ARRAY, and MAP types are unsupported.
          - May be partitioned on one or more keys
      • Sort / Distribution Options
          - Spectrum does not use the same concepts
          - You cannot set sort and/or distribution options for Spectrum

      CREATE EXTERNAL TABLE "webinar"."webinar_airports" (
        "iata" varchar(255),
        "airport" varchar(255),
        "city" varchar(255),
        "state" varchar(255),
        "country" varchar(255),
        "lat" decimal(12, 8),
        "long" decimal(12, 8)
      )
      STORED AS TEXTFILE
      location 's3://<bucket>/<path>';
  • 9. Demo 2 - Set up external tables
  • 10. Querying Redshift Tables and Spectrum Tables
      • The query plan shows when Redshift pushes down all or part of a query to Spectrum
          - S3 Seq Scan, S3 HashAggregate, S3 Query Scan, Seq Scan PartitionInfo, Partition Loop
      • This plan is for a query on a single Redshift table:

          XN Seq Scan on webinar_airports_redshift (cost=0.00..42.20 rows=2 width=78)
            Filter: ((city)::text = 'New York'::text)

      • This is the plan when that table is instead in Spectrum:

          XN S3 Query Scan webinar_airports (cost=0.00..225000000.00 rows=10000000000 width=2028)
            -> S3 Seq Scan video.webinar_airports location:"s3://mtln-spectrum-data/webinar/demo1" format:TEXT (cost=0.00..125000000.00 rows=10000000000 width=2028)
                 Filter: ((city)::text = 'New York'::text)
  • 11. Summary
  • 12. Coming soon… the next 2 webinars in our series
      • October 4, 11AM EST: Using Amazon Redshift Spectrum from Matillion ETL
      • October 18, 11AM EST: Accessing your Data Lake assets from Amazon Redshift Spectrum
  • 14. Thank You. Presented by: David Langton and James Johnson
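
Slide 8 notes that an external table may be partitioned on one or more keys, and slide 10 lists the plan steps that indicate push-down to Spectrum. The following is a minimal sketch of how the two fit together, not the presenters' demo. The table name webinar_airports_by_date, the load_date partition column, the comma delimiter, and the partition value are all assumptions made for illustration; the bucket and path placeholders are left as they appear on the slide.

```sql
-- Hypothetical partitioned variant of the slide 8 table.
-- Table name, partition column, and delimiter are illustrative assumptions.
CREATE EXTERNAL TABLE webinar.webinar_airports_by_date (
  iata    varchar(255),
  airport varchar(255),
  city    varchar(255)
)
PARTITIONED BY (load_date date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://<bucket>/<path>/';

-- Each partition is registered explicitly and points at its own S3 prefix.
ALTER TABLE webinar.webinar_airports_by_date
ADD IF NOT EXISTS PARTITION (load_date = '2017-09-20')
LOCATION 's3://<bucket>/<path>/load_date=2017-09-20/';

-- EXPLAIN prints the plan without running the query; the S3 steps named on
-- slide 10 (for example S3 Seq Scan) mark the parts handled by Spectrum.
EXPLAIN
SELECT iata, airport
FROM webinar.webinar_airports_by_date
WHERE load_date = '2017-09-20'
  AND city = 'New York';
```

Filtering on the partition column is the point of the design: it lets Spectrum skip S3 prefixes that cannot match, which matters when the service is priced per query.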

Editor's Notes

  1. create external schema webinar from data catalog database 'webinar' iam_role 'arn:aws:iam::115603513764:role/SpectrumRole' create external database if not exists;
  2. http://docker.dc.matillion.com:9001/#Test/Spectrum/default/Webinar%20Create%20External%20Table
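
After running a statement like the one in note 1, the result can be checked from Redshift's system views. This is a sketch under the assumption that the schema was created with the name webinar as in that note; it is not part of the original demo.

```sql
-- Confirm the external schema exists and which catalog database it points at.
SELECT schemaname, databasename, esoptions
FROM svv_external_schemas
WHERE schemaname = 'webinar';

-- List the external tables visible through that schema and their S3 locations.
SELECT schemaname, tablename, location
FROM svv_external_tables
WHERE schemaname = 'webinar';
```

An empty result from the first query suggests the schema was not created; an empty result from the second only means no tables have been defined in the catalog database yet.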