All content is the property and proprietary interest of Matrix IT. The removal of any proprietary notices, including attribution information, is strictly prohibited.
Redshift Spectrum & AWS Athena Deep Dive
Oz Levi, CTO
MatrixBI
About Us
• Leading BI & Big Data solution provider in Israel
• 230 employees
• Over 100 customers
• Matrix IT subsidiary
• E2E & A2Z
• Named a Tier 1 Big Data SI (2017)
• Leading partnerships in Big Data, Data Science, and BI & Analytics
Our Solutions
• Data Warehousing & Big Data
• Advanced Analytics & Data Science
• Reporting & Self Analysis
• Advanced Visualization
• Dashboards & KPIs
• OEM Big Data
• Mobile BI
• ETL & Data Integration
Matrix BI Big Data References
• High-tech & Startups
• Government & Security
• Enterprise (Telco, Finance, Biotech & Pharma)
Customers shown include Clal Finances, Israel Police, and the Israeli Air Force.
First... Some Basics
Availability by Decoupling: Shared-Nothing Architecture
[Diagram: compute nodes C1, C2, C3 sharing a single SAN / NAS, versus a shared-nothing cluster in which each of C1, C2, C3 has its own compute and storage and the nodes communicate over the LAN; axis labels: data volume (small to big) and cost ($$$)]
• Shared storage: the shared file system resource may eventually become a bottleneck
• Shared nothing: removes the dependency between the scaling units
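To make the shared-nothing idea concrete, here is a minimal sketch, assuming a hypothetical three-node cluster and a toy table: every row is assigned to exactly one node by hashing a distribution key, each node aggregates only the data it owns, and a coordinator merges the small partial results. The hash function (`zlib.crc32`), node count, and column names are illustrative choices, not how any particular product distributes data.

```python
from collections import defaultdict
from zlib import crc32


def distribute(rows, key, num_nodes):
    """Assign each row to exactly one node by hashing its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        node_id = crc32(str(row[key]).encode()) % num_nodes
        nodes[node_id].append(row)
    return nodes


def local_sum(node_rows, column):
    """Runs on one node and touches only that node's own storage."""
    return sum(r[column] for r in node_rows)


def distributed_sum(rows, key, column, num_nodes=3):
    nodes = distribute(rows, key, num_nodes)
    partials = [local_sum(node_rows, column) for node_rows in nodes.values()]
    # The coordinator only merges small partial results; no node reads another's data.
    return sum(partials)


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
    print(distributed_sum(rows, key="id", column="amount"))
```

Because no node depends on a shared file system, adding a node only changes which rows it owns, which is the "no dependency between scaling units" property.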
Bigger Data Requires Elasticity
[Diagram: elastic pools of compute nodes on top of scalable object storage]
Elasticity is the ability of a system to adapt to change automatically, adding or removing processing power without depending on storage.
Column Storage
Example table (columns A, B, C):
A1 B1 C1
A2 B2 C2
A3 B3 C3
A4 B4 C4
A5 B5 C5
Row layout (the values of each row are stored together):
A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4 A5 B5 C5
Column layout (the values of each column are stored and encoded together):
A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5
Drawbacks of the row layout:
• Hard to compress: different data types require different algorithms
• Heavy I/O: reading any column means reading the data of every column
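The two orderings listed for the A/B/C example can be reproduced with a short sketch. It is only an illustration of the byte order on disk, not of any real file format: in the column layout, all values of column B form one contiguous slice that can be read (and encoded) on its own.

```python
table = {
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
}


def row_layout(cols):
    """Values of one row are stored next to each other."""
    names = list(cols)
    num_rows = len(cols[names[0]])
    return [cols[name][i] for i in range(num_rows) for name in names]


def column_layout(cols):
    """Values of one column are stored (and can be encoded) together."""
    return [value for name in cols for value in cols[name]]


print(" ".join(row_layout(table)))
print(" ".join(column_layout(table)))
# Reading only column B from the column layout is one contiguous slice.
print(column_layout(table)[5:10])
```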
Column Storage: Reduce Additional I/O by Partitioning
Column types: A is INT, B is CHAR, C is DATE.
• Vertical partitioning (projection push down): read only the columns the query references
• Horizontal partitioning (predicate push down): read only the rows or blocks that can satisfy the query's filters
Vertical partitioning + horizontal partitioning = minimal I/O at read
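Projection and predicate push down can be combined in a small sketch. It assumes a toy file split into row groups that each store per-column min/max statistics (loosely modeled on Parquet row-group statistics; the group size, column names, and data are hypothetical). A group whose maximum cannot pass the filter is skipped entirely, and inside the remaining groups only the two referenced columns are touched.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """One horizontal slice of the table, stored column by column."""
    columns: dict  # column name -> list of values
    stats: dict    # column name -> (min, max), recorded when the file is written


def make_row_groups(rows, group_size):
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(v), max(v)) for name, v in columns.items()}
        groups.append(RowGroup(columns, stats))
    return groups


def scan_sum(groups, project, filter_col, low):
    """SUM(project) WHERE filter_col > low; also returns how many values were read."""
    total, values_read = 0, 0
    for group in groups:
        _, col_max = group.stats[filter_col]
        if col_max <= low:
            continue  # predicate push down: no row in this group can match
        # Projection push down: only the two referenced columns are read.
        filt = group.columns[filter_col]
        proj = group.columns[project]
        values_read += len(filt) + len(proj)
        total += sum(p for f, p in zip(filt, proj) if f > low)
    return total, values_read


rows = [{"day": d, "sales": d * 2, "note": "n/a"} for d in range(1, 101)]
groups = make_row_groups(rows, group_size=10)
print(scan_sum(groups, project="sales", filter_col="day", low=90))
```

A full row scan would read all 300 values (100 rows by 3 columns); here only one row group and two of its columns are read.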
Column Storage
"I have a table with every test score for every US student for the last twenty years. How can I provide results in minimal query response times?"
SELECT AVG(score) FROM example
WHERE class = 'Junior'
AND gender = 'F'
AND grade > 90;
Row store (each row also carries many unrelated columns):
M  1245  [other columns]  Junior  [other columns]  86
F  1453  [other columns]  Soph    [other columns]  74
F  4454  [other columns]  Junior  [other columns]  94
M  5654  [other columns]  Senior  [other columns]  67
• On a heap, an RDBMS will scan all rows!
• With an index, it reads only the relevant rows
Column store (each column is stored separately):
gender: M, F, F, M
class: Junior, Soph, Junior, Senior
score: 86, 74, 94, 67
• Reads only the relevant blocks of the relevant columns
Columnar file formats: open and widespread
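The row-store versus column-store comparison can be sketched with the four sample rows. The slide's query also filters on grade, which the sample rows do not show, so this sketch keeps only the class and gender filters; the 12 filler columns per row are a hypothetical stand-in for the "other columns". The sketch counts how many cells each layout has to read to answer the same question.

```python
FILLER = 12  # hypothetical number of unrelated columns per row

rows = [
    {"gender": "M", "class": "Junior", "score": 86},
    {"gender": "F", "class": "Soph", "score": 74},
    {"gender": "F", "class": "Junior", "score": 94},
    {"gender": "M", "class": "Senior", "score": 67},
]
for i, row in enumerate(rows):
    for j in range(FILLER):
        row[f"other_{j}"] = f"filler-{i}-{j}"


def avg_row_store(rows):
    """Heap scan: every cell of every row is read."""
    cells_read, matches = 0, []
    for row in rows:
        cells_read += len(row)
        if row["class"] == "Junior" and row["gender"] == "F":
            matches.append(row["score"])
    return sum(matches) / len(matches), cells_read


def to_columns(rows):
    return {name: [r[name] for r in rows] for name in rows[0]}


def avg_column_store(columns):
    """Only the three columns the query references are read."""
    needed = ("gender", "class", "score")
    cells_read = sum(len(columns[name]) for name in needed)
    matches = [
        s
        for g, c, s in zip(columns["gender"], columns["class"], columns["score"])
        if c == "Junior" and g == "F"
    ]
    return sum(matches) / len(matches), cells_read


print(avg_row_store(rows))
print(avg_column_store(to_columns(rows)))
```

Both return the same average; the column store reads 12 cells instead of 60, and the gap grows with the number of unrelated columns.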
The Three Sub-systems of a Data Lake
1. Data Acquisition: collection (real time, incremental, batch, one-time dump)
2. Data Management: store, process & integrate
3. Data Access: deliver & use
All three sit on shared data lake platform services, and data moves through three stages: raw data, standardized data, and usage-specific data.
The three-subsystem approach allows:
1. Open, scalable architecture
2. Separation of duties
3. Fine-grained security & governance
What about all of this raw data??
Amazon Analytics End-to-End Architecture
[Diagram components: other sources, Kinesis, Database Migration Service, S3, Glue Data Catalog, Athena, EMR, Redshift, Redshift Spectrum, Amazon ML / MXNet, RDS, QuickSight, IAM]
AWS Athena
What is Athena?
• Fully managed, interactive query service
• Runs standard SQL queries against data stored in S3
• Fully serverless and automatically scalable
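Because Athena is serverless, a query is just an API call. The sketch below assumes the boto3 Athena client methods `start_query_execution`, `get_query_execution`, and `get_query_results`; the database name and S3 output location are placeholders you would replace. The client is passed in as a parameter so the polling logic can be exercised without AWS credentials.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit SQL to Athena, poll until it finishes, and return the first page of rows.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena is asynchronous: poll the execution status until it is terminal.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "no reason given")
        raise RuntimeError(f"query {query_id} ended as {state}: {reason}")

    # Each row holds a list of cells like {"VarCharValue": "..."};
    # for SELECT queries the first row contains the column headers.
    page = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]
```

Typical use would be `run_athena_query(boto3.client("athena"), "SELECT count(*) FROM taxi_rides_parquet", "db_name", "s3://your-query-results-bucket/")`, where the bucket is hypothetical. Only the first page of results is returned; larger result sets need the `NextToken` from `get_query_results`.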
Athena Under the Hood
Athena is an interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL.
[Diagram: Presto CLI and JDBC / ODBC clients → Presto Coordinator (backed by the Hive Metastore) → multiple Presto Workers]
Presto is an SQL-on-Hadoop solution: a low-latency, distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.
The Hive Metastore (the metadata store behind HCatalog) is a metadata and table management system designed for Hadoop and used for table abstraction (schema on read). Hive DDL supports partitions, complex data types (arrays etc.), and many data formats.
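"Schema on read" means the raw file is stored untouched and column names and types are applied only when a query reads it. The sketch assumes a hypothetical two-line CSV object and two hypothetical table definitions over the same bytes; it imitates the idea, not the Hive Metastore itself.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw CSV object, stored exactly as it arrived (no schema attached).
RAW = "1,2017-08-01 10:15:00,3.5\n2,2017-08-01 11:00:00,12.0\n"


def parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Two table definitions over the same bytes; neither rewrites the data.
TYPED = [("vendorid", str), ("pickup_datetime", parse_ts), ("trip_distance", float)]
ALL_STRINGS = [("vendorid", str), ("pickup_datetime", str), ("trip_distance", str)]


def read_with_schema(raw_text, schema):
    """Apply column names and types at read time, one CSV line at a time."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: convert(value) for (name, convert), value in zip(schema, fields)}


typed_rows = list(read_with_schema(RAW, TYPED))
string_rows = list(read_with_schema(RAW, ALL_STRINGS))
print(typed_rows[1]["trip_distance"] + 1)     # numeric column
print(string_rows[1]["trip_distance"] + "!")  # same bytes read as text
```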
Presto Main Features
High Performance
• E.g. Netflix runs 3,500+ Presto queries per day on a 25+ PB dataset in S3, with 350 active platform users
Extensibility
• Pluggable backends: Hive, Cassandra, JMX, Kafka, MySQL, PostgreSQL, and more
• JDBC and ODBC for commercial BI tools and dashboards
• Client protocol: HTTP + JSON, supporting various languages (Python, Ruby, PHP, Node.js, Java (JDBC), C#, ...)
ANSI SQL
• Complex queries, joins, aggregations, and various functions (e.g. window functions)
Things to Remember
• $5 per TB of data scanned (rounded up to the nearest MB, with a 10 MB minimum per query)
• Supports multiple formats:
  • JSON
  • CSV & TSV
  • ORC
  • Parquet & Avro
I want to pay less $$$:
Data Partitioning + Data Format = Less $$$ per Query
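The pricing rule can be turned into a small estimator. This is a sketch, assuming binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes) and the $5 per TB rate stated in this deck; check current AWS pricing before relying on the numbers, and note that S3 request charges are not included.

```python
import math

PRICE_PER_TB = 5.00   # USD per TB scanned, as stated in this deck
MB = 1024 ** 2        # assumption: binary megabytes
TB = 1024 ** 4        # assumption: binary terabytes
MIN_BILLED_MB = 10    # 10 MB minimum per query


def athena_query_cost(bytes_scanned):
    """Estimate one query's charge: round up to whole MB, then apply the 10 MB minimum."""
    billed_mb = max(math.ceil(bytes_scanned / MB), MIN_BILLED_MB)
    return billed_mb * MB / TB * PRICE_PER_TB


print(round(athena_query_cost(int(207.54 * 1024 ** 3)), 4))  # a 207.54 GB CSV scan
print(athena_query_cost(int(5.2 * MB)))                      # billed at the 10 MB minimum
```

Under these assumptions a 207.54 GB scan costs about $1.01, while any query scanning 10 MB or less costs the same tiny minimum, which is why partitioning and columnar formats translate directly into lower bills.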
Creating a Table
CREATE EXTERNAL TABLE db_name.taxi_rides_parquet (
  vendorid STRING,
  pickup_datetime TIMESTAMP,
  dropoff_datetime TIMESTAMP,
  ratecode INT,
  passenger_count INT,
  trip_distance DOUBLE,
  fare_amount DOUBLE,
  total_amount DOUBLE,
  payment_type INT
)
PARTITIONED BY (YEAR INT, MONTH INT, TYPE STRING)
STORED AS PARQUET
LOCATION 's3://serverless-analytics/canonical/NY-Pub/'
TBLPROPERTIES ('has_encrypted_data'='true');
1. Tables are external
2. Define partitions
3. Location can only reference a folder
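A partitioned table does not see its partitions until they are registered in the catalog, either with `MSCK REPAIR TABLE` (for Hive-style folders) or with `ALTER TABLE ... ADD PARTITION`. The helper below is a hypothetical sketch that builds the second kind of statement for one folder; the `type` value `yellow` and the folder layout under the table location are assumptions, and string values are not escaped.

```python
def add_partition_ddl(table, location_root, partition):
    """Build an ALTER TABLE ... ADD PARTITION statement for one Hive-style folder."""
    spec_parts, path_parts = [], []
    for name, value in partition.items():
        # Strings are quoted in the partition spec; numbers are not.
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        spec_parts.append(f"{name} = {literal}")
        path_parts.append(f"{name}={value}")
    location = location_root.rstrip("/") + "/" + "/".join(path_parts) + "/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({', '.join(spec_parts)}) "
        f"LOCATION '{location}'"
    )


print(add_partition_ddl(
    "db_name.taxi_rides_parquet",
    "s3://serverless-analytics/canonical/NY-Pub/",
    {"year": 2017, "month": 8, "type": "yellow"},
))
```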
S3 Partitioning
By partitioning your data, you can:
• Separate data files by any column
• Read only the files the query needs
• Reduce the amount of data scanned
• Reduce query completion time
• Reduce query cost
Hive-compatible partition naming (best): column_name=column_value
% aws s3 ls s3://matrixbi/hive-prt/tables/visits/
PRE year=2017/month=08
PRE year=2017/month=09
PRE year=2017/month=10
PRE year=2017/month=11
PRE year=2017/month=12
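Conceptually, partition pruning with Hive-style names is simple: parse each `column=value` path segment and keep only the prefixes whose values satisfy the query's filter. The sketch uses the prefixes from the listing; a real engine does this from the data catalog rather than by listing S3, so treat it as an illustration of the idea only.

```python
PREFIXES = [
    "year=2017/month=08",
    "year=2017/month=09",
    "year=2017/month=10",
    "year=2017/month=11",
    "year=2017/month=12",
]


def parse_partition(prefix):
    """'year=2017/month=08' -> {'year': '2017', 'month': '08'}"""
    return dict(part.split("=", 1) for part in prefix.strip("/").split("/") if part)


def prune(prefixes, predicate):
    """Keep only the prefixes whose partition values satisfy the predicate."""
    return [p for p in prefixes if predicate(parse_partition(p))]


# WHERE year = 2017 AND month >= 11: only two folders need to be read.
print(prune(PREFIXES, lambda p: int(p["year"]) == 2017 and int(p["month"]) >= 11))
```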
Advanced Compression by Encoding
Commonly used encoding types: Delta, Prefix, Dictionary, Run Length
• Best suited to: sorted datasets, small sets of values, repetitive values
• Typical columns: timestamps, sequences, metrics, IP addresses, codes (MAC etc.), product IDs
Compare with brute-force compression (LZO, Snappy, GZIP).
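The four encodings can each be shown in a few lines. This is a sketch of the ideas only, with hypothetical sample values; real columnar formats such as Parquet and ORC combine these encodings with bit packing and their own on-disk layouts.

```python
from itertools import groupby


def delta_encode(values):
    """Sorted data: keep the first value, then only differences to the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    values = []
    for d in deltas:
        values.append(values[-1] + d if values else d)
    return values


def prefix_encode(sorted_strings):
    """Front coding: (length shared with the previous string, remaining suffix)."""
    out, prev = [], ""
    for s in sorted_strings:
        common = 0
        while common < min(len(prev), len(s)) and prev[common] == s[common]:
            common += 1
        out.append((common, s[common:]))
        prev = s
    return out


def dictionary_encode(values):
    """Small sets of values: store each distinct value once, then small integer codes."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def run_length_encode(values):
    """Repetitive data: (value, run length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


timestamps = [1501580100, 1501580160, 1501580220, 1501580280]
print(delta_encode(timestamps))
print(prefix_encode(["10.0.0.1", "10.0.0.2", "10.0.1.7"]))
print(dictionary_encode(["IL", "US", "IL", "IL", "US"]))
print(run_length_encode(["OK", "OK", "OK", "ERR", "OK", "OK"]))
```

The encoded forms are small, repetitive integers and short strings, which a general-purpose compressor then shrinks much further than it could shrink the raw values.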
File Format
SELECT count(*) AS count FROM taxi_rides_csv
Run time: 20.06 seconds, Data scanned: 207.54 GB (result: 1,310,911,060)
SELECT count(*) AS count FROM taxi_rides_parquet
Run time: 5.76 seconds, Data scanned: 0 KB (result: 2,870,781,820)
SELECT * FROM taxi_rides_csv LIMIT 1000
Run time: 3.13 seconds, Data scanned: 328.82 MB
SELECT * FROM taxi_rides_parquet LIMIT 1000
Run time: 1.13 seconds, Data scanned: 5.2 MB
Based on: Amazon Athena Deep Dive – June 2017
For count(*), the Parquet columns are not accessed! The count is computed using metadata stored in the Parquet file footers.
* S3 GET request charges are not included!
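Why the Parquet `count(*)` scans 0 KB can be illustrated with a toy model: a columnar writer records the row count in the file footer, so counting only reads metadata, while a text format such as CSV has no footer and must be read end to end. The `ToyColumnarFile` class is hypothetical; with a real Parquet file, pyarrow exposes the same information as `pyarrow.parquet.ParquetFile(path).metadata.num_rows`.

```python
from dataclasses import dataclass


@dataclass
class ToyColumnarFile:
    """Stand-in for a Parquet file: column chunks plus a footer with metadata."""
    columns: dict
    footer: dict


def write_toy_file(rows):
    columns = {name: [r[name] for r in rows] for name in rows[0]}
    # The writer records the row count once, when the file is produced.
    return ToyColumnarFile(columns=columns, footer={"num_rows": len(rows)})


def count_star_from_footers(files):
    """COUNT(*) answered from metadata: returns (count, column values read)."""
    return sum(f.footer["num_rows"] for f in files), 0


def count_star_from_csv(text):
    """A CSV file has no footer, so every byte must be read: returns (count, bytes read)."""
    return len(text.splitlines()), len(text.encode("utf-8"))


files = [write_toy_file([{"id": i, "fare": 2.5 * i} for i in range(n)]) for n in (3, 5)]
print(count_star_from_footers(files))
print(count_star_from_csv("1,2.5\n2,5.0\n"))
```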
Redshift Spectrum
What is Redshift Spectrum?
• Not an integration between Redshift and the Athena query engine
• Runs Redshift SQL queries against data stored in S3
• Fully serverless and automatically scalable
• Accessible from multiple Redshift clusters, and allows joining S3 data with data stored in the Redshift cluster
Redshift Architecture
• MPP architecture
• Leader-based query execution
• Compute nodes
• Columnar
• Parallel
A Redshift cluster cannot be shut down, only deleted! (This was true when the deck was written in 2017; Redshift has since added pause and resume.)
The Lifecycle of a Spectrum Query
1. A query is submitted to the leader node of your Amazon Redshift cluster. The leader node optimizes, compiles, and pushes the query execution to the compute nodes in the cluster.
2. The compute nodes obtain the information describing the external tables from the data catalog, dynamically pruning non-relevant partitions based on the filters and joins in the query.
3. The compute nodes also examine the data available locally and push down predicates to efficiently scan only the relevant objects in Amazon S3.
4. The Redshift compute nodes then generate multiple requests, depending on the number of objects that need to be processed, and submit them concurrently to Redshift Spectrum.
5. Spectrum worker nodes scan, filter, and aggregate the data stored on S3, streaming the required data back to the Redshift cluster for processing.
6. The final join and merge operations are performed locally in your cluster, and the results are returned to the client.
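The six steps can be mirrored in a toy simulation. Everything here is hypothetical: the catalog maps a partition value to a list of "S3 objects" (lists of rows), threads stand in for the Spectrum worker fleet, and the query is an average fare per city for selected years. Workers return (count, sum) pairs rather than averages, because partial averages cannot be merged correctly while counts and sums can.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog: partition value (year) -> list of S3 objects, each a list of rows.
CATALOG = {
    2016: [[{"city": "NY", "fare": 10.0}, {"city": "LA", "fare": 20.0}]],
    2017: [
        [{"city": "NY", "fare": 12.0}, {"city": "NY", "fare": 14.0}],
        [{"city": "LA", "fare": 30.0}, {"city": "NY", "fare": 16.0}],
    ],
}


def prune_partitions(catalog, years):
    """Step 2: use catalog metadata to keep only the objects in matching partitions."""
    return [obj for year, objects in catalog.items() if year in years for obj in objects]


def spectrum_worker(obj, city):
    """Step 5: scan, filter, and pre-aggregate one object; return (count, sum)."""
    fares = [r["fare"] for r in obj if r["city"] == city]
    return len(fares), sum(fares)


def run_query(catalog, years, city, max_workers=4):
    objects = prune_partitions(catalog, years)
    # Step 4: one request per object, submitted concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda obj: spectrum_worker(obj, city), objects))
    # Step 6: the final merge happens in the cluster.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    average = total / count if count else None
    return count, average, len(objects)


print(run_query(CATALOG, years={2017}, city="NY"))  # (rows matched, average fare, objects scanned)
```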
Multi-Cluster
• Shared datasets on S3 can be accessed through Spectrum from multiple Redshift clusters!
Spectrum Requires an External Schema
CREATE EXTERNAL SCHEMA spectrum_schema FROM DATA CATALOG
DATABASE 'spectrum_db'
REGION 'us-east-2'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

CREATE VIEW redshift_schema.joined_data AS
SELECT SE.col1, RE.col2
FROM spectrum_schema.historical_events SE
INNER JOIN redshift_schema.events RE
  ON SE.joincol1 = RE.joincol2
WITH NO SCHEMA BINDING;
1. Glue or Athena Data Catalog
2. IAM role: Spectrum runs outside of the VPC, so the role must be attached to the cluster
A view that references an external table must be a late-binding view, hence WITH NO SCHEMA BINDING.
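IAM role ARNs leave the region field empty, so the 12-digit account ID follows a double colon (`arn:aws:iam::`). A small check like this hypothetical helper can catch a malformed ARN before it reaches `CREATE EXTERNAL SCHEMA`; it assumes the standard `aws` partition only (not `aws-cn` or `aws-us-gov`) and a simplified character set for role paths and names.

```python
import re

# arn:aws:iam::<12-digit account id>:role/<optional path/><role name>
IAM_ROLE_ARN = re.compile(r"^arn:aws:iam::(\d{12}):role/([\w+=,.@/-]+)$")


def parse_role_arn(arn):
    """Return (account_id, role_name_with_path) or raise ValueError for a malformed ARN."""
    match = IAM_ROLE_ARN.match(arn)
    if not match:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)


print(parse_role_arn("arn:aws:iam::123456789012:role/MySpectrumRole"))
```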
So what now?
Use Redshift Spectrum if:
• You are an existing Redshift customer and want to implement data tiering, maintaining your Redshift investment while allowing a single schema for historical data
Use AWS Athena if:
• You are starting out on your data lake journey
• You have offline analytical workloads at large scale
Key Takeaways
• Partition your data for the best performance and cost
• Use small files: you can create multiple tables pointing to the same files
• Parquet can be sorted on write for better performance
• You can combine Amazon Athena and Redshift Spectrum using the AWS Glue Data Catalog for best performance
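Why sorting on write helps can be seen from per-row-group min/max statistics: when values arrive in random order, nearly every row group spans almost the whole value range, so a point filter cannot skip any of them; after sorting, the ranges are narrow and only one group qualifies. The sketch uses hypothetical IDs and a group size of 100; the seeded shuffle just simulates unsorted arrival order.

```python
import random


def row_group_ranges(values, group_size):
    """(min, max) per row group: the statistics a columnar writer stores."""
    return [
        (min(values[i:i + group_size]), max(values[i:i + group_size]))
        for i in range(0, len(values), group_size)
    ]


def groups_to_read(ranges, target):
    """Row groups whose [min, max] range could contain `target`; the rest are skipped."""
    return sum(1 for low, high in ranges if low <= target <= high)


ids = list(range(1000))
random.Random(7).shuffle(ids)  # simulated arrival order

unsorted_ranges = row_group_ranges(ids, 100)
sorted_ranges = row_group_ranges(sorted(ids), 100)

print("unsorted:", groups_to_read(unsorted_ranges, 500), "of 10 row groups read")
print("sorted:  ", groups_to_read(sorted_ranges, 500), "of 10 row groups read")
```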
Next: take your first steps in building a serverless data lake
A Few Notes
You will need an AWS account. If you don't have one, let us know!
For a 1:1 session with an architect, send an email to marketing@cloudzone.io
Thank You!
CV to jobs@matrixbi.co.il
Ozle@MatrixBI.co.il
