SlideShare a Scribd company logo
1 of 40
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Roy Ben-Alta, Principal BDM
MAY-2-2017
shift
Fast, simple, Exabyte-scale data warehousing for less than $1,000/TB/Year
Data warehousing in the era of Big Data: Deep
Dive into Amazon Redshift Spectrum
Overview
• History of Data Warehouse, Hadoop
• Amazon Redshift In Modern Data Architecture
• Amazon Redshift Spectrum Overview
• Getting Started
• Q&A
History Of Data Warehouse, Hadoop
Evolution of Data Architectures
1985: Data Warehouse Appliances Benefits
• Consolidated multiple decision support
environments (i.e. databases) into a single
architecture
• Best performance available at time of
conception, hence the expensive licenses
• Worked well with structured, columnar data
• Could build customized data marts on top
Shared Storage Tier
(NAS Appliance)
Compute
Node
Compute
Node
Compute
Node
Compute
Node
• Proprietary software license paid per node
per year
• Gold-plated hardware available only from
the vendor with per node per year cost
Constraints
• Proprietary software license paid per node per
year
• Gold-plated hardware available only from the
vendor with per node per year cost
• Could not handle unstructured data sets
• Heavy ETL & data cleansing
Data analyzed for benefit
Available data
Legacy Architecture Models = No Growth
COST
VALUE
Investment value of analytics
2010 2015 2020 2025
Datavolume
Very Expensive
Lock-In
Proprietary
Inflexible licensing
Evolution of Data Architectures
2006: Hadoop Clusters
CPU
Memory
HDFS Storage
Hadoop Master Node
CPU
Memory
HDFS Storage
CPU
Memory
HDFS Storage
Improvements
• Open source based software license!!!
• Commodity white box servers!!!!
• Could handle structured & unstructured data sets
• Many different applications within the framework
(MapReduce, Spark, Hive, Pig, HBase, Presto, etc.)
Constraints
• HDFS 3X replication to protect against node failure
gets expensive at scale
• 500 TB data set = 1.5 PB cluster
• Local storage means you must scale and pay for
CPU & memory resources when adding data
capacity
• General purpose, monolithic cluster with many
different apps on same hardware
• Still a data silo
Evolution of Data Architectures
2009: Decoupled EMR Architecture
CPU
Memory
Hadoop Master Node
CPU
Memory
CPU
Memory
Improvements
• Decoupled storage & compute
• Scale CPU and memory resources independently
and up & down
• Only pay for the 500 TB data set (not 3X)
• Multi-physical facility replication via S3
• Multiple clusters can run in parallel against shared
data in S3
• Each job gets its own optimized cluster. i.e. Spark
on memory intensive, Hive on CPU intensive,
HBase on I/O intensive, etc.
Constraints
• Still have a cluster to provision and manage
• Must expose EMR cluster to SQL users via Hive,
Presto, etc.
S3 as HDFS
2012
February 2017
> 100 Significant Patches
> 140 Significant Features
• Automated installation, patching, backups
• No servers to manage and maintain
• MPP Columnar relational database
• $1,000 / TB / Year
• Accessible to any ODBC or JDBC BI Tool
Evolution of Data Architectures
2012: Hello Amazon Redshift
Evolution of Data Architectures
2016: Clusterless Improvements
• No cluster/infrastructure to manage
• Business users and analysts can write SQL
without having to provision a cluster or touch
infrastructure
• Pay by the query
• Zero Administration
• Process data where it lives
Constraints
• Limited to SQL, Hive and Spark jobs today.
More frameworks to come!
SQL Interface in web
browser
Athena for SQL
S3 Data Lake
Glue for ETL
S3 Data Lake
Spark & Hive Interface
in web browser
AWS Big Data Portfolio
Collect Store Analyze
Amazon Kinesis
Firehose
AWS Direct
Connect
Amazon
SnowballAmazon Kinesis
Streams
Amazon S3 Amazon Glacier
Amazon
CloudSearch
Amazon RDS,
Amazon Aurora
Amazon
Dynamo DB
Amazon
Elasticsearch
Amazon EMR Amazon EC2
Amazon
Redshift
Amazon Machine
Learning
Amazon
QuickSight
AWS Data PipelineAWS Database Migration Service AWS Glue
Amazon
Athena
Amazon Kinesis
Analytics
Scaling up your analytics systems With AWS Traditional IT *
get a new BI server 20 minutes 3 months
upgrade your analytics server to the
newest Intel processors and add 16GB
memory
15 minutes 2 months
add 500TB of storage instant 2 months
grow a DWH cluster from 8GB to 1PB 1 hour 8 months
build a 1024-node Hadoop cluster 30 minutes unlikely
roll out multi-region production
environment
hours months
* actual provisioning times in a well-organized IT division
Speed Matters
Amazon Redshift In Modern Data Architecture
Columnar
MPP
OLAP
AWS IAMAmazon
VPC
Amazon SWF
Amazon S3 AWS KMS Amazon
Route 53
Amazon
CloudWatch
Amazon
EC2
PostgreSQL Amazon Redshift
Amazon Redshift is easy to use
Provisioning in
minutes
Automatic patching SQL - Data loading
Backups are built-in Security is built-in Compression is built-in
Amazon Redshift is available everywhere AWS is
Dublin
Frankfurt
London
Seoul
Sydney
Tokyo
Singapore
Beijing
Mumbai
Sao Paulo
US East - Virginia
US West - Oregon
US West – Northern California
GovCloud
Columbus Ohio
Montreal
Currently Available
Coming soon
Traditional Data Warehousing
Business
Reporting
Complex pipelines
and queries
Secure and
Compliant
Easy Migration – Point & Click using AWS Database Migration Service
Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant
Large Ecosystem – Variety of cloud and on-premises BI and ETL tools
Japanese Mobile
Phone Provider
Powering 100 marketplaces
in 50 countries
World’s Largest Children’s
Book Publisher
Bulk Loads
and Updates
Business Applications
Multi-Tenant BI
Applications
Back-end
services
Analytics as a
Service
Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can
focus on your business applications
Ease of Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several
data marts
Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline
Infosys Information
Platform (IIP)
Analytics-as-a-
Service
Product and Consumer
Analytics
Log Analysis
Log & Machine
IOT Data
Clickstream
Events Data
Time-Series
Data
Cheap – Analyze large volumes of data cost-effectively
Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads
Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real time analytics
Interactive data analysis and
recommendation engine
Ride analytics for pricing
and product development
Ad prediction and
on-demand analytics
Redshift is used for mission-critical workloads
Financial and
management reporting
Payments to suppliers
and billing workflows
Web/Mobile clickstream
and event analysis
Recommendation and
predictive analytics
Amazon Redshift has a large ecosystem
Data Integration Systems IntegratorsBusiness Intelligence
Amazon Redshift Spectrum Overview
Generate
Collect & Store
Analyze
Individual AWS customers
generate over a PB/day
It’s never been easier to generate vast amounts of data
Generate
Collect & Store
Analyze
Individual AWS customers
generating over PB/day
Amazon S3 lets you collect and store all this data
Store exabytes of
data in S3
Generate
Collect & Store
Analyze
Individual AWS customers
generating over PB/day
Highly
Constrained
But how do you analyze it?
Store exabytes of
data in S3
The tyranny of “OR”
Amazon EMR
Directly access data in S3
Scale out to thousands of nodes
Open data formats
Popular big data frameworks
Anything you can dream up and code
Amazon Redshift
Optimized for data warehousing
Super-fast local disk performance
Sophisticated query optimization
Join-optimized data formats
Query using standard SQL
But I don’t want to choose.
I shouldn’t have to choose
I want “all of the above”
I want
sophisticated query optimization and scale-out processing
super fast performance and support for open formats
the throughput of local disk and the scale of S3
I want all this
From one data processing engine
With my data accessible from all data processing engines
Now and in the future
We’re told “you have to choose”
Pick small clusters for joins or large ones for scans
Shuffles are expensive
Open formats can’t collocate data for joins
They have to deal with variable cluster sizes
Query optimization requires statistics
You can’t determine this for external data
Enter Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Fast @ exabyte scale Elastic & highly available On-demand, pay-per-query
High concurrency: Multiple
clusters access same data
No ETL: Query data in-
place using open file
formats
Full Amazon Redshift
SQL support
S3
SQL
Amazon Redshift Spectrum is fast
• Leverages Amazon Redshift’s advanced cost-based optimizer
• Pushes down projections, filters, aggregations and join reduction
• Dynamic partition pruning to minimize data processed
• Automatic parallelization of query execution against S3 data
• Efficient join processing within the Amazon Redshift cluster
Amazon Redshift Spectrum is cost-effective
• You pay for your Amazon Redshift cluster plus $5 per TB scanned from S3
• Each query can leverage 1000s of Amazon Redshift Spectrum nodes
• You can reduce the TB scanned and improve query performance by:
Partitioning data
Using a columnar file format
Compressing data
Amazon Redshift Spectrum is secure
End-to-end
data encryption
Alerts &
notifications
Virtual private cloud
Audit logging
Certifications &
compliance
Encrypt S3 data using SSE and AWS
KMS
Encrypt all Amazon Redshift data
using KMS, AWS CloudHSM or your
on-premises HSMs
Enforce SSL with perfect forward
encryption using ECDHE
Amazon Redshift leader node in
your VPC. Compute nodes in
private VPC. Spectrum nodes in
private VPC, store no state.
Communicate event-specific
notifications via email, text
message, or call with Amazon SNS
All API calls are logged using AWS
CloudTrail
All SQL statements are logged
within Amazon Redshift
PCI/DSSFedRAMP
SOC1/2/3 HIPAA/BAA
Amazon Redshift Spectrum uses standard SQL
• Redshift Spectrum seamlessly integrates with your existing SQL & BI apps
• Support for same JDBC and ODBC
• Support for complex joins, nested queries & window functions
• Support for data partitioned in S3 by any key
Date, Time and any other custom keys
e.g., Year, Month, Day, Hour
Is Amazon Redshift Spectrum useful if I don’t have an exabyte?
Your data will get bigger
• On average, data warehousing volumes grow 10x every 5 years
• The average Amazon Redshift customer doubles data each year
Amazon Redshift Spectrum makes data analysis simpler
• Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake
Amazon Redshift Spectrum improves availability and concurrency
• Run multiple Amazon Redshift clusters against common data
• Isolate jobs with tight SLAs from ad hoc analysis
“Redshift Spectrum will let us expand the universe
of the data we analyze to 100s of petabytes over
time. This is truly a game changer, and we can think
of no other system in the world that can get us
there.”
“Multiple teams can now query the same
Amazon S3 data sets using both Amazon
Redshift and Amazon EMR.”
Customers love Amazon Redshift Spectrum
“Redshift Spectrum will help us scale yet further
while also lowering our costs.”
“Redshift Spectrum’s fast performance across
massive data sets is unprecedented.”
“Redshift Spectrum enables us to directly operate on
our data in its native format in Amazon S3 with no
preprocessing or transformation.”
“Our data science team using Amazon EMR can now
collaborate with our marketing and product teams
using Redshift Spectrum to analyze the same Amazon
S3 data sets.”
Migrations can be accelerated with DMS & SCT
“AWS Database Migration Service is the
most impressive migration service we’ve
seen.”
Migrate – Over 1,000 unique migrations to Redshift using DMS
Amazon
Redshift
*
*
Amazon Redshift Spectrum
shift
Fast, simple, exabyte-scale data warehousing for less than $1,000/TB/Year
Available now
 Queue hopping
 10X VACUUM performance improvement
 Node fault tolerance
 Enhanced VPC routing
 IAM support for LOAD/UNLOAD
 Auto compression for CTAS
 TimestampTZ datatype
 Query Monitoring rules
Coming soon
 Automatic and incremental background
VACUUM
 Short query bias
 Power start
 IAM Authentication for DB users
 Auto compression for new tables
 Enhanced JSON & AVRO ingestion performance
Getting Started with Redshift Spectrum
Quick Start
Amazon Redshift Pricing
Q&A

More Related Content

What's hot

Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisAmazon Web Services
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesm vaishnavi
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Amazon Web Services
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaAmazon Web Services
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduceAmazon Web Services
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudAmazon Web Services
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsAmazon Web Services
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Amazon Web Services
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRAmazon Web Services
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...Amazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleAmazon Web Services
 
Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Amazon Web Services
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSAmazon Web Services
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...Amazon Web Services
 

What's hot (20)

Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 
Log Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & KibanaLog Analytics with Amazon Elasticsearch Service & Kibana
Log Analytics with Amazon Elasticsearch Service & Kibana
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce
 
ENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the CloudENT306 Migrating large Scale Data Sets to the Cloud
ENT306 Migrating large Scale Data Sets to the Cloud
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3
 
Best Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWSBest Practices for Building a Data Lake on AWS
Best Practices for Building a Data Lake on AWS
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 

Similar to Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...Amazon Web Services
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Amazon Web Services
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database ServicesAmazon Web Services
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Amazon Web Services
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWSAmazon Web Services
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 

Similar to Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift (20)

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhish...
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
 
Building your First Big Data Application on AWS
Building your First Big Data Application on AWSBuilding your First Big Data Application on AWS
Building your First Big Data Application on AWS
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfakankshagupta7348026
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 

Recently uploaded (20)

Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 

Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Roy Ben-Alta, Principal BDM MAY-2-2017 shift Fast, simple, Exabyte-scale data warehousing for less than $1,000/TB/Year Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift Spectrum
  • 2. Overview • History of Data Warehouse, Hadoop • Amazon Redshift In Modern Data Architecture • Amazon Redshift Spectrum Overview • Getting Started • Q&A
  • 3. History Of Data Warehouse, Hadoop
  • 4. Evolution of Data Architectures 1985: Data Warehouse Appliances Benefits • Consolidated multiple decision support environments (i.e. databases) into a single architecture • Best performance available at time of conception, hence the expensive licenses • Worked well with structured, columnar data • Could build customized data marts on top Shared Storage Tier (NAS Appliance) Compute Node Compute Node Compute Node Compute Node • Proprietary software license paid per node per year • Gold-plated hardware available only from the vendor with per node per year cost Constraints • Proprietary software license paid per node per year • Gold-plated hardware available only from the vendor with per node per year cost • Could not handle unstructured data sets • Heavy ETL & data cleansing
  • 5. Data analyzed for benefit Available data Legacy Architecture Models = No Growth COST VALUE Investment value of analytics 2010 2015 2020 2025 Datavolume Very Expensive Lock-In Proprietary Inflexible licensing
  • 6. Evolution of Data Architectures 2006: Hadoop Clusters CPU Memory HDFS Storage Hadoop Master Node CPU Memory HDFS Storage CPU Memory HDFS Storage Improvements • Open source based software license!!! • Commodity white box servers!!!! • Could handle structured & unstructured data sets • Many different applications within the framework (MapReduce, Spark, Hive, Pig, HBase, Presto, etc.) Constraints • HDFS 3X replication to protect against node failure gets expensive at scale • 500 TB data set = 1.5 PB cluster • Local storage means you must scale and pay for CPU & memory resources when adding data capacity • General purpose, monolithic cluster with many different apps on same hardware • Still a data silo
  • 7. Evolution of Data Architectures 2009: Decoupled EMR Architecture CPU Memory Hadoop Master Node CPU Memory CPU Memory Improvements • Decoupled storage & compute • Scale CPU and memory resources independently and up & down • Only pay for the 500 TB data set (not 3X) • Multi-physical facility replication via S3 • Multiple clusters can run in parallel against shared data in S3 • Each job gets its own optimized cluster. i.e. Spark on memory intensive, Hive on CPU intensive, HBase on I/O intensive, etc. Constraints • Still have a cluster to provision and manage • Must expose EMR cluster to SQL users via Hive, Presto, etc. S3 as HDFS
  • 8. 2012 February 2017 > 100 Significant Patches > 140 Significant Features • Automated installation, patching, backups • No servers to manage and maintain • MPP Columnar relational database • $1,000 / TB / Year • Accessible to any ODBC or JDBC BI Tool Evolution of Data Architectures 2012: Hello Amazon Redshift
  • 9. Evolution of Data Architectures 2016: Clusterless Improvements • No cluster/infrastructure to manage • Business users and analysts can write SQL without having to provision a cluster or touch infrastructure • Pay by the query • Zero Administration • Process data where it lives Constraints • Limited to SQL, Hive and Spark jobs today. More frameworks to come! SQL Interface in web browser Athena for SQL S3 Data Lake Glue for ETL S3 Data Lake Spark & Hive Interface in web browser
  • 10. AWS Big Data Portfolio Collect Store Analyze Amazon Kinesis Firehose AWS Direct Connect Amazon SnowballAmazon Kinesis Streams Amazon S3 Amazon Glacier Amazon CloudSearch Amazon RDS, Amazon Aurora Amazon Dynamo DB Amazon Elasticsearch Amazon EMR Amazon EC2 Amazon Redshift Amazon Machine Learning Amazon QuickSight AWS Data PipelineAWS Database Migration Service AWS Glue Amazon Athena Amazon Kinesis Analytics
  • 11. Scaling up your analytics systems With AWS Traditional IT * get a new BI server 20 minutes 3 months upgrade your analytics server to the newest Intel processors and add 16GB memory 15 minutes 2 months add 500TB of storage instant 2 months grow a DWH cluster from 8GB to 1PB 1 hour 8 months build a 1024-node Hadoop cluster 30 minutes unlikely roll out multi-region production environment hours months * actual provisioning times in a well-organized IT division Speed Matters
  • 12. Amazon Redshift In Modern Data Architecture
  • 13. Columnar MPP OLAP AWS IAMAmazon VPC Amazon SWF Amazon S3 AWS KMS Amazon Route 53 Amazon CloudWatch Amazon EC2 PostgreSQL Amazon Redshift
  • 14. Amazon Redshift is easy to use Provisioning in minutes Automatic patching SQL - Data loading Backups are built-in Security is built-in Compression is built-in
  • 15. Amazon Redshift is available everywhere AWS is Dublin Frankfurt London Seoul Sydney Tokyo Singapore Beijing Mumbai Sao Paulo US East - Virginia US West - Oregon US West – Northern California GovCloud Columbus Ohio Montreal Currently Available Coming soon
  • 16. Traditional Data Warehousing Business Reporting Complex pipelines and queries Secure and Compliant Easy Migration – Point & Click using AWS Database Migration Service Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant Large Ecosystem – Variety of cloud and on-premises BI and ETL tools Japanese Mobile Phone Provider Powering 100 marketplaces in 50 countries World’s Largest Children’s Book Publisher Bulk Loads and Updates
  • 17. Business Applications Multi-Tenant BI Applications Back-end services Analytics as a Service Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can focus on your business applications Ease of Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several data marts Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline Infosys Information Platform (IIP) Analytics-as-a- Service Product and Consumer Analytics
  • 18. Log Analysis Log & Machine IOT Data Clickstream Events Data Time-Series Data Cheap – Analyze large volumes of data cost-effectively Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real time analytics Interactive data analysis and recommendation engine Ride analytics for pricing and product development Ad prediction and on-demand analytics
  • 19. Redshift is used for mission-critical workloads Financial and management reporting Payments to suppliers and billing workflows Web/Mobile clickstream and event analysis Recommendation and predictive analytics
  • 20. Amazon Redshift has a large ecosystem Data Integration Systems IntegratorsBusiness Intelligence
  • 22. Generate Collect & Store Analyze Individual AWS customers generate over a PB/day It’s never been easier to generate vast amounts of data
  • 23. Generate Collect & Store Analyze Individual AWS customers generating over PB/day Amazon S3 lets you collect and store all this data Store exabytes of data in S3
  • 24. Generate Collect & Store Analyze Individual AWS customers generating over PB/day Highly Constrained But how do you analyze it? Store exabytes of data in S3
  • 25. The tyranny of “OR” Amazon EMR Directly access data in S3 Scale out to thousands of nodes Open data formats Popular big data frameworks Anything you can dream up and code Amazon Redshift Optimized for data warehousing Super-fast local disk performance Sophisticated query optimization Join-optimized data formats Query using standard SQL
  • 26. But I don’t want to choose. I shouldn’t have to choose I want “all of the above”
  • 27. I want sophisticated query optimization and scale-out processing super fast performance and support for open formats the throughput of local disk and the scale of S3
  • 28. I want all this From one data processing engine With my data accessible from all data processing engines Now and in the future
  • 29. We’re told “you have to choose” Pick small clusters for joins or large ones for scans Shuffles are expensive Open formats can’t collocate data for joins They have to deal with variable cluster sizes Query optimization requires statistics You can’t determine this for external data
  • 30. Enter Amazon Redshift Spectrum Run SQL queries directly against data in S3 using thousands of nodes Fast @ exabyte scale Elastic & highly available On-demand, pay-per-query High concurrency: Multiple clusters access same data No ETL: Query data in- place using open file formats Full Amazon Redshift SQL support S3 SQL
  • 31. Amazon Redshift Spectrum is fast • Leverages Amazon Redshift’s advanced cost-based optimizer • Pushes down projections, filters, aggregations and join reduction • Dynamic partition pruning to minimize data processed • Automatic parallelization of query execution against S3 data • Efficient join processing within the Amazon Redshift cluster
  • 32. Amazon Redshift Spectrum is cost-effective • You pay for your Amazon Redshift cluster plus $5 per TB scanned from S3 • Each query can leverage 1000s of Amazon Redshift Spectrum nodes • You can reduce the TB scanned and improve query performance by: Partitioning data Using a columnar file format Compressing data
  • 33. Amazon Redshift Spectrum is secure End-to-end data encryption Alerts & notifications Virtual private cloud Audit logging Certifications & compliance Encrypt S3 data using SSE and AWS KMS Encrypt all Amazon Redshift data using KMS, AWS CloudHSM or your on-premises HSMs Enforce SSL with perfect forward encryption using ECDHE Amazon Redshift leader node in your VPC. Compute nodes in private VPC. Spectrum nodes in private VPC, store no state. Communicate event-specific notifications via email, text message, or call with Amazon SNS All API calls are logged using AWS CloudTrail All SQL statements are logged within Amazon Redshift PCI/DSSFedRAMP SOC1/2/3 HIPAA/BAA
  • 34. Amazon Redshift Spectrum uses standard SQL • Redshift Spectrum seamlessly integrates with your existing SQL & BI apps • Support for same JDBC and ODBC • Support for complex joins, nested queries & window functions • Support for data partitioned in S3 by any key Date, Time and any other custom keys e.g., Year, Month, Day, Hour
  • 35. Is Amazon Redshift Spectrum useful if I don’t have an exabyte? Your data will get bigger • On average, data warehousing volumes grow 10x every 5 years • The average Amazon Redshift customer doubles data each year Amazon Redshift Spectrum makes data analysis simpler • Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake Amazon Redshift Spectrum improves availability and concurrency • Run multiple Amazon Redshift clusters against common data • Isolate jobs with tight SLAs from ad hoc analysis
  • 36. “Redshift Spectrum will let us expand the universe of the data we analyze to 100s of petabytes over time. This is truly a game changer, and we can think of no other system in the world that can get us there.” “Multiple teams can now query the same Amazon S3 data sets using both Amazon Redshift and Amazon EMR.” Customers love Amazon Redshift Spectrum “Redshift Spectrum will help us scale yet further while also lowering our costs.” “Redshift Spectrum’s fast performance across massive data sets is unprecedented.” “Redshift Spectrum enables us to directly operate on our data in its native format in Amazon S3 with no preprocessing or transformation.” “Our data science team using Amazon EMR can now collaborate with our marketing and product teams using Redshift Spectrum to analyze the same Amazon S3 data sets.”
  • 37. Migrations can be accelerated with DMS & SCT “AWS Database Migration Service is the most impressive migration service we’ve seen.” Migrate – Over 1,000 unique migrations to Redshift using DMS Amazon Redshift * *
  • 38. Amazon Redshift Spectrum shift Fast, simple, exabyte-scale data warehousing for less than $1,000/TB/Year Available now  Queue hopping  10X VACUUM performance improvement  Node fault tolerance  Enhanced VPC routing  IAM support for LOAD/UNLOAD  Auto compression for CTAS  TimestampTZ datatype  Query Monitoring rules Coming soon  Automatic and incremental background VACUUM  Short query bias  Power start  IAM Authentication for DB users  Auto compression for new tables  Enhanced JSON & AVRO ingestion performance
  • 39. Getting Started with Redshift Spectrum Quick Start Amazon Redshift Pricing
  • 40. Q&A