PUBLIC SECTOR SUMMIT
SINGAPORE
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Implementing a Data Warehouse on
AWS in a Hybrid Environment
Dennis Magsajo
Solutions Architect
Worldwide Public Sector
AWS
Key Takeaways
Why do enterprises want hybrid cloud?
Data warehouse on AWS
Data warehouse design considerations
Customer story: Fannie Mae
What do customers want in hybrid?
Run workloads on the cloud
Tight integration
Run workloads on-premises
Without buying new hardware
Hybrid cloud use cases
Integration of different data silos
Integrated identity and access
Integrated resources and deployment management
Integrated devices and edge systems
Cloud bursting
Data center extension
THE DATA TSUNAMI
The volume of data generated by cloud-based applications and services, together with data migrated to cloud platforms from on-premises systems, is growing exponentially.
There is more data than people think.
Data grows every five years.
Data platforms need to live for years.
Data platforms need to scale.
[Slide graphic: the growth multiple, platform lifetime, and scale figures are not preserved in this transcript.]
ANALYTICS PIPELINE: AWS
COLLECT
• AWS Database Migration Service
• AWS Snowball
• AWS Snowmobile
• Amazon Kinesis Data Firehose
• Amazon Kinesis Data Streams
• AWS Direct Connect
• AWS DataSync
STORE
• Amazon S3/Amazon S3 Glacier
• Amazon RDS
• Amazon Aurora: MySQL, PostgreSQL
• Amazon ElastiCache: Redis, Memcached
• Amazon Quantum Ledger Database
• Amazon DynamoDB: Key value
• Amazon Neptune: Graph
• Amazon Timestream: Time series
• Amazon RDS on VMware
• Amazon DocumentDB
• Amazon EFS
• Amazon FSx
ETL
• AWS Glue: ETL & Data Catalog
• AWS Lake Formation: Data lakes
ANALYZE
• Amazon Redshift: Data warehousing
• Amazon EMR: Hadoop + Spark
• Amazon Athena: Interactive analytics
• Amazon Kinesis Data Analytics: Real-time
• Amazon ES: Operational analytics
CONSUME/VISUALIZE
• Amazon QuickSight
• Amazon SageMaker
• Amazon Machine Learning (Amazon ML)
• AWS Marketplace
What does data warehouse modernization mean?
Easy to use: Don’t waste time on menial administrative tasks and maintenance
Extends to your data lake: Directly analyze data stored in your data lake in open formats
Any scale of data, workloads, and users: Dynamically scale up to guarantee performance even with unpredictable demands and data volumes
Faster time-to-insights: Consistently fast performance, even with thousands of concurrent queries and users
Data warehouse on AWS
DESIGNING A CLOUD DATA WAREHOUSE
Prepare your sources: Batch or real-time?
Ingest: How are you going to get data into your data warehouse?
ETL: Is the data already structured for the data warehouse, or do you need to transform it?
Data quality
What do you do about the quality of your data?
Partner solutions available
Auditing
Data governance
Master data management
Nightly jobs/ETL
Managing your data transformation
ENTERPRISE DATA WAREHOUSE WORKFLOW
Data sources: on-premises and in cloud; structured and unstructured data
COLLECT: AWS Direct Connect, AWS Snowball, AWS Database Migration Service, AWS DataSync
STORE: Amazon S3
ETL: Amazon EMR, AWS Glue
ANALYZE: Amazon Redshift, Amazon Redshift Spectrum
VISUALIZE: Amazon QuickSight
AMAZON REDSHIFT
Fast, simple, cost-effective data warehouse that can extend queries to your data lake
Fastest: Get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage, and MPP
Unlimited scale: Dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes
Extends your data lake: Analyze data in the Amazon S3 data lake in-place and in open formats, together with data loaded into Amazon Redshift’s high performance SSDs
1/10th the cost: Start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year
Analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools
Amazon Redshift
The four things that matter most
Speed, Scale, Security, Simplicity
AMAZON REDSHIFT SYSTEM ARCHITECTURE
Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution
Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore through Amazon S3; load from Amazon DynamoDB, Amazon EMR
Two hardware platforms
• Optimized for data processing
• DS2: HDD; scale from 2TB to 2PB
• DC1: SSD; scale from 160GB to 326TB
[Diagram: SQL clients / BI tools, JDBC/ODBC, leader node, compute nodes on 10 GigE (HPC), data catalog, ingestion/backup/restore]
A DEEPER LOOK AT COMPUTE NODE ARCHITECTURE
• A compute node is partitioned into either two or 16 slices; a slice can be thought of as a “virtual compute node”
• Each slice is allocated a portion of the compute node's memory and disk space, where it processes a portion of the workload assigned to the compute node by the leader node
• The leader node manages distributing data to the slices and apportions the workload for any queries or other database operations to the slices
• Slices are Amazon Redshift’s symmetric multiprocessing (SMP) mechanism; they work in parallel to complete operations
[Diagram: compute node]
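The slice model on this slide can be illustrated with a toy scatter-gather in Python. This is only a sketch of the idea and not how Amazon Redshift is implemented: the function names, the round-robin placement, and the thread pool standing in for slices are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_rows(rows, num_slices):
    """Leader-node step (hypothetical): place rows on slices round-robin."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def slice_partial_sum(slice_rows):
    """Slice step: each slice aggregates only its own portion of the data."""
    return sum(slice_rows)


def run_sum_query(rows, num_slices=2):
    """Fan the work out to all slices in parallel, then combine the partials."""
    slices = distribute_rows(rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    # The leader combines the per-slice results into the final answer.
    return sum(partials)


total = run_sum_query(list(range(1, 101)), num_slices=2)
print(total)
```

Each slice only ever touches its own rows, so adding slices shrinks the portion any one worker has to process; the final combine step is the only serial part.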
Security is built-in
Compliance certifications
Network isolation
End-to-end encryption
Integration with AWS Key Management Service (AWS KMS)
[Diagram: customer VPC, JDBC/ODBC, leader node, internal VPC, compute nodes on 10 GigE (HPC), Amazon S3]
AMAZON REDSHIFT ANTI-PATTERNS
Amazon Redshift is not ideally suited for the following usage patterns:
Small Datasets
Built for parallel processing
Datasets under 100GB don’t gain the benefits of Amazon Redshift
OLTP (Online Transaction Processing)
More appropriate for a traditional RDBMS or NoSQL database
Unstructured Data
Data must be structured by a defined schema
Amazon Redshift Spectrum
BLOB data: store large binary objects in Amazon S3 and reference them in Amazon Redshift
DATA DESIGN CONSIDERATIONS
Are you using optimal data types?
Parquet, Avro, ORC
Is your data distributed evenly?
Did you pick a good sort key?
Loading data efficiently
Use the COPY command
You need at least as many input files as you have slices
With multiple input files, all slices are working so you
maximize throughput
Scale linearly as you
add nodes
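As a rough sketch of the loading guidance above, the Python helpers below pick an input file count and build a COPY statement. The helper names, the table name `sales`, the bucket, and the IAM role ARN are hypothetical placeholders. Rounding up to a multiple of the slice count is an added assumption here (the slide only asks for at least as many files as slices) so that every slice gets the same number of files.

```python
import math


def plan_input_file_count(nodes, slices_per_node, min_files=1):
    """Smallest multiple of the cluster's slice count that is >= min_files."""
    total_slices = nodes * slices_per_node
    multiples = max(1, math.ceil(min_files / total_slices))
    return total_slices * multiples


def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Build a COPY statement that loads every file under one S3 prefix."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET;"
    )


# Two nodes with two slices each, and at least 10 files wanted.
file_count = plan_input_file_count(nodes=2, slices_per_node=2, min_files=10)

# Placeholder identifiers: replace with your own table, bucket, and role.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/part-",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(file_count)
print(statement)
```

Pointing COPY at a common key prefix lets a single statement pick up all of the split files, so the split count can change without changing the load statement.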
Distribution styles
Key: same key to same location
All: all data on every node
Even: round robin distribution
[Diagram: each style shown on two nodes with two slices each (Node 1: Slices 1 and 2; Node 2: Slices 3 and 4)]
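The three distribution styles can be sketched in a few lines of Python. This is an illustration of the placement rules only: the function names are hypothetical, and `crc32` is an assumed stand-in for whatever hash function the service actually uses.

```python
from zlib import crc32


def distribute_key(rows, key, num_slices):
    """KEY: rows with the same key value always land on the same slice."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        target = crc32(str(row[key]).encode()) % num_slices
        slices[target].append(row)
    return slices


def distribute_even(rows, num_slices):
    """EVEN: round-robin placement, regardless of row content."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


def distribute_all(rows, num_nodes):
    """ALL: every node keeps a full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]


orders = [
    {"customer_id": c, "amount": a}
    for c, a in [(1, 10), (2, 20), (1, 30), (3, 40), (2, 50)]
]
by_key = distribute_key(orders, "customer_id", 4)
by_even = distribute_even(orders, 4)
by_all = distribute_all(orders, 2)
```

KEY keeps rows that join on the same value together but can skew if a few key values dominate; EVEN balances row counts but ignores join locality; ALL trades storage for local joins, which tends to suit small, frequently joined tables.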
Recently released features
Dense compute nodes (DC2)
• 2x the performance of DC1 at the same price
• 3x more I/O with 30% better storage utilization than DC1
• Upgrade at no cost
• Hardware: NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell)
“Amazon Redshift’s new DC2 node is giving us a 100% performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue.”
Result-set caching
Sub-second repeat queries
• Amazon Redshift customers can now serve 35% more queries on average, using the same compute resources
• Tens of thousands of compute hours are freed up daily to serve the remaining queries and data ingestion
• Transparent: it just works
“With Amazon Redshift result caching, 20% of our queries now complete in less than one second,” said Greg Rokita, Executive Director for Technology, Edmunds
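To show why repeat queries can return in under a second, here is a toy result cache in Python. It is a conceptual sketch only: the `ResultCache` class, the whitespace-only normalization, and the manual `invalidate` call are assumptions for illustration and do not describe how Amazon Redshift decides what to cache or when to evict it.

```python
class ResultCache:
    """Toy result-set cache: repeat queries are answered without re-execution."""

    def __init__(self, execute):
        self._execute = execute  # function that actually runs a query
        self._results = {}       # normalized query text -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(sql):
        # Collapse whitespace only, so string literals keep their case.
        return " ".join(sql.split())

    def query(self, sql):
        key = self._normalize(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = self._execute(sql)
        self._results[key] = result
        return result

    def invalidate(self):
        """Drop cached results, for example after the underlying data changes."""
        self._results.clear()


calls = []


def slow_execute(sql):
    """Stand-in for real query execution; records each call it receives."""
    calls.append(sql)
    return len(calls)


cache = ResultCache(slow_execute)
first = cache.query("SELECT count(*) FROM sales")
second = cache.query("SELECT  count(*)   FROM sales")  # same query, extra spaces
```

The second call is served from the cache, so the compute that would have re-run it is free for other queries, which is the effect the slide describes.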
Short query acceleration
Express lane for short queries
• Machine learning predicts the runtime of queries
• Short queries are routed to an express queue
• Resources are dynamically dedicated to short queries
• Enable it today from your AWS Management Console
How it works:
[Diagram: analytics and BI/dashboard tools, Amazon Redshift, machine learning classifier]
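The routing idea can be sketched as follows. Everything here is a stand-in: the keyword-counting heuristic replaces the machine learning classifier, and the five-second threshold and queue names are assumed values chosen for the example, not Amazon Redshift defaults.

```python
from collections import deque

SHORT_QUERY_THRESHOLD_SEC = 5.0  # assumed cutoff for this sketch


def predict_runtime_sec(query):
    """Stand-in for the ML runtime predictor: a crude heuristic on query text."""
    text = query.lower()
    estimate = 1.0
    estimate += 4.0 * text.count(" join ")
    estimate += 2.0 * text.count(" group by ")
    return estimate


def route(query, express_queue, standard_queue):
    """Send predicted-short queries to the express queue, the rest to standard."""
    if predict_runtime_sec(query) <= SHORT_QUERY_THRESHOLD_SEC:
        express_queue.append(query)
        return "express"
    standard_queue.append(query)
    return "standard"


express, standard = deque(), deque()
lookup = "select * from orders where id = 42"
report = (
    "select c.name, sum(o.total) from orders o "
    "join customers c on o.customer_id = c.id group by c.name"
)
lookup_lane = route(lookup, express, standard)
report_lane = route(report, express, standard)
```

Because short queries get their own lane, a dashboard lookup does not wait behind a long-running report in the same queue.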
Amazon Redshift elastic resize
Scale up and down in minutes
• Adds additional nodes to Amazon Redshift cluster
• Distributes data across new configuration
• Minimal transition time
• Quickly scale for varying workload demands
[Diagram: JDBC/ODBC, leader node, compute nodes in the Amazon Redshift cluster, backup to Amazon Redshift managed Amazon S3]
Concurrency scaling for bursts of user activity
• Creates more clusters automatically on-demand
• Consistently fast performance even with thousands of concurrent queries
• No advance hydration required
• Free for >97% of customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for concurrency scaling
[Diagram: caching layer, backup, Amazon Redshift managed Amazon S3]
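The credit rule on this slide is simple arithmetic, worked through below in Python. The function names are hypothetical, and the sketch assumes credits accrue fractionally and ignores any cap or expiry rules that may apply in practice.

```python
def accrued_credit_hours(main_cluster_hours):
    """One hour of concurrency scaling credit per 24 hours of main cluster use."""
    return main_cluster_hours / 24.0


def billable_burst_hours(main_cluster_hours, burst_hours_used):
    """Burst hours left over after the accrued credits are applied."""
    credits = accrued_credit_hours(main_cluster_hours)
    return max(0.0, burst_hours_used - credits)


# A main cluster running for a 30-day month: 30 * 24 = 720 hours.
month_hours = 30 * 24
credits = accrued_credit_hours(month_hours)          # 30.0 credit hours
light_usage = billable_burst_hours(month_hours, 12)  # fully covered by credits
heavy_usage = billable_burst_hours(month_hours, 36)  # 6.0 hours beyond credits
```

A cluster that bursts for less than about an hour a day stays within its credits under this rule, which is consistent with the slide's claim that most customers pay nothing extra.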
Amazon Redshift Spectrum
Extend the data warehouse to exabytes of data in Amazon S3 data lake
• No data loading required
• Scale compute and storage separately
• Directly query data stored in Amazon S3
• Parquet, ORC, Avro, JSON, and CSV data formats
Coming soon!
• Unload to Parquet
• Spectrum Request Accelerator
[Diagram: the Amazon Redshift Spectrum query engine queries across Amazon Redshift data and the Amazon S3 data lake]
Amazon Redshift
New features
Speed, Scale, Simplicity, Security
• Unload to Parquet
• WLM concurrency setting
• AWS Lake Formation integration
• Auto data distribution
• Deferred maintenance
• Snapshot scheduler
• Amazon Redshift Spectrum Request Accelerator
• Elastic resize
• Concurrency scaling
• Improving short query acceleration
• Auto-vacuum
• Auto-analyze
Fannie Mae
About Fannie Mae
Homes Financed by Fannie Mae
Home ownership in United States
36 %
64 %
Fannie Mae: Data warehouse
Fannie Mae: Challenges
Loan Credit
• Digital transformation → 3 million queries/month, 4x data growth
• Successful user adoption → user base growth from 100 to 1,000 in three years
• Concurrency, scalability, and time to market
Amazon Redshift solutions
DW
Fannie Mae: On-premises to hybrid environment
On-premises: Data warehouse
Amazon Cloud: Amazon S3, Amazon Redshift, Amazon Athena, Amazon EMR
Fannie Mae: Concurrency scaling performance
[Chart: Amazon Redshift 16 nodes vs. Amazon Redshift eight nodes w/ concurrency scaling. Execution time (sec., axis 0 to 400) for Query 1 through Query 11; series: RS 16 nodes, RS 8 nodes Burst]
Similar or better performance is achieved with 50% of the compute resource.
Fannie Mae: Concurrency scaling performance
[Chart: Amazon Redshift vs. Amazon Redshift Concurrency Scaling, Query 3 (5-table joins). Execution time (sec., axis 0 to 400) at query concurrencies of 1, 30, 50, and 100; series: average of RS 8 nodes, average of RS 8 nodes Burst]
With the concurrency scaling feature, performance was flat (did not degrade) as concurrency increased.
Fannie Mae: Lessons learned
Thank you!
Dennis Magsajo
Solutions Architect
Worldwide Public Sector
AWS

More Related Content

What's hot

Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Amazon Web Services
 
Data modeling with Amazon DynamoDB - ADB301 - New York AWS Summit
Data modeling with Amazon DynamoDB - ADB301 - New York AWS SummitData modeling with Amazon DynamoDB - ADB301 - New York AWS Summit
Data modeling with Amazon DynamoDB - ADB301 - New York AWS SummitAmazon Web Services
 
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Amazon Web Services
 
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Amazon Web Services
 
Analyze customer sentiment using AI - AIM307 - New York AWS Summit
Analyze customer sentiment using AI - AIM307 - New York AWS SummitAnalyze customer sentiment using AI - AIM307 - New York AWS Summit
Analyze customer sentiment using AI - AIM307 - New York AWS SummitAmazon Web Services
 
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...Migration to AWS: The foundation for enterprise transformation - SVC210 - New...
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...Amazon Web Services
 
Introduzione a blockchain e registri digitali
Introduzione a blockchain e registri digitaliIntroduzione a blockchain e registri digitali
Introduzione a blockchain e registri digitaliAmazon Web Services
 
Working with Open Data in the Cloud
Working with Open Data in the CloudWorking with Open Data in the Cloud
Working with Open Data in the CloudAmazon Web Services
 
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...Amazon Web Services
 
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...Amazon Web Services
 
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習Amazon Web Services
 
Alexa + IoT - SVC203 - New York AWS Summit
Alexa + IoT - SVC203 - New York AWS SummitAlexa + IoT - SVC203 - New York AWS Summit
Alexa + IoT - SVC203 - New York AWS SummitAmazon Web Services
 
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Amazon Web Services
 
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0Amazon Web Services
 
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...Amazon Web Services
 
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Amazon Web Services
 
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...Amazon Web Services
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Amazon Web Services
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesAmazon Web Services
 
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019AWS Summits
 

What's hot (20)

Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
 
Data modeling with Amazon DynamoDB - ADB301 - New York AWS Summit
Data modeling with Amazon DynamoDB - ADB301 - New York AWS SummitData modeling with Amazon DynamoDB - ADB301 - New York AWS Summit
Data modeling with Amazon DynamoDB - ADB301 - New York AWS Summit
 
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
Train once, deploy anywhere on the cloud and at the edge with Amazon SageMake...
 
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...Continuous Integration and Continuous Delivery Best Practices for Building Mo...
Continuous Integration and Continuous Delivery Best Practices for Building Mo...
 
Analyze customer sentiment using AI - AIM307 - New York AWS Summit
Analyze customer sentiment using AI - AIM307 - New York AWS SummitAnalyze customer sentiment using AI - AIM307 - New York AWS Summit
Analyze customer sentiment using AI - AIM307 - New York AWS Summit
 
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...Migration to AWS: The foundation for enterprise transformation - SVC210 - New...
Migration to AWS: The foundation for enterprise transformation - SVC210 - New...
 
Introduzione a blockchain e registri digitali
Introduzione a blockchain e registri digitaliIntroduzione a blockchain e registri digitali
Introduzione a blockchain e registri digitali
 
Working with Open Data in the Cloud
Working with Open Data in the CloudWorking with Open Data in the Cloud
Working with Open Data in the Cloud
 
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...
AWS App Mesh: Manage services mesh discovery, recovery, and monitoring - MAD3...
 
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
[NEW LAUNCH!] Introducti[NEW LAUNCH!] Introduction to event-driven architectu...
 
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習
AWS及客戶在AI/ML的數位運行過程中得到的重要經驗與學習
 
Alexa + IoT - SVC203 - New York AWS Summit
Alexa + IoT - SVC203 - New York AWS SummitAlexa + IoT - SVC203 - New York AWS Summit
Alexa + IoT - SVC203 - New York AWS Summit
 
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
Developing-Effective-Mass-Migration-Strategy-out-of-a-Tool-based-Portfolio-As...
 
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0
Move users to AWS with Amazon WorkSpaces and Amazon AppStream 2-0
 
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...
Building home security solutions at scale, ft. Comcast - SVC206 - New York AW...
 
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
Favorire l'innovazione passando da applicazioni monolitiche ad architetture m...
 
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...
The People Pillar of Cloud Adoption: Developing Your Workforce & Building Dig...
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
 
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019
Frontend and Mobile with AWS Amplify | AWS Summit Tel Aviv 2019
 

Similar to Scale - Implementing a Data Warehouse on AWS

Implementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentImplementing a Data Warehouse on AWS in a Hybrid Environment
Implementing a Data Warehouse on AWS in a Hybrid EnvironmentAmazon Web Services
 
Best Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit SydneyBest Practices for Migrating Databases to the Cloud - AWS Summit Sydney
Best Practices for Migrating Databases to the Cloud - AWS Summit SydneyAmazon Web Services
 
Building Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSBuilding Data Lakes for Analytics on AWS
Building Data Lakes for Analytics on AWSAmazon Web Services
 
Design, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWSDesign, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWSAmazon Web Services
 
Built & Delivered in Six Months Using Serverless Technical Patterns and Micro...
Built & Delivered in Six Months Using Serverless Technical Patterns and Micro...Built & Delivered in Six Months Using Serverless Technical Patterns and Micro...
Built & Delivered in Six Months Using Serverless Technical Patterns and Micro...Amazon Web Services
 
GPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseGPSWKS401_Designing a Cloud Enterprise Data Warehouse
GPSWKS401_Designing a Cloud Enterprise Data WarehouseAmazon Web Services
 
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...All Databases Are Equal, But Some Databases Are More Equal than Others: How t...
All Databases Are Equal, But Some Databases Are More Equal than Others: How t...javier ramirez
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Migrating Business Critical Applications to AWS
Migrating Business Critical Applications to AWSMigrating Business Critical Applications to AWS
Migrating Business Critical Applications to AWSAmazon Web Services
 
Database su AWS scegliere lo strumento giusto per il giusto obiettivo
Database su AWS scegliere lo strumento giusto per il giusto obiettivoDatabase su AWS scegliere lo strumento giusto per il giusto obiettivo
Database su AWS scegliere lo strumento giusto per il giusto obiettivoAmazon Web Services
 
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019
Building a Modern Data Platform on AWS. Public Sector Summit Brussels 2019javier ramirez
 
Deriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesDeriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesAmazon Web Services
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Amazon Web Services
 
AWS re:Invent Comes to London 2019 - Database, Analytics, AI &ML
AWS re:Invent Comes to London 2019 - Database, Analytics, AI &MLAWS re:Invent Comes to London 2019 - Database, Analytics, AI &ML
AWS re:Invent Comes to London 2019 - Database, Analytics, AI &MLAmazon Web Services
 
Migrating Data to the Cloud: Explore Your Options From AWS
Migrating Data to the Cloud: Explore Your Options From AWSMigrating Data to the Cloud: Explore Your Options From AWS
Migrating Data to the Cloud: Explore Your Options From AWSAmazon Web Services
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journeyAmazon Web Services
 
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-PremisesAmazon Relational Database (RDS) on VMware: Running Amazon RDS On-Premises
Amazon Relational Database (RDS) on VMware: Running Amazon RDS On-PremisesAmazon Web Services
 
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019
Building Serverless Analytics Pipelines with AWS Glue - AWS Summit Sydney 2019Amazon Web Services
 


Scale - Implementing a Data Warehouse on AWS

  • 1. PUBLIC SECTOR SUMMIT SINGAPORE
  • 2. Implementing a Data Warehouse on AWS in a Hybrid Environment. Dennis Magsajo, Solutions Architect, Worldwide Public Sector, AWS. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 3. Key Takeaways: Why do enterprises want hybrid cloud? Data warehouse on AWS. Data warehouse design considerations. Customer story: Fannie Mae.
  • 5. What do customers want in hybrid? Run workloads on the cloud; run workloads on-premises; tight integration; without buying new hardware.
  • 6. Hybrid cloud use cases: integration of different data silos; integrated identity and access; integrated resources and deployment management; integrated devices and edge systems; cloud bursting; data center extension.
  • 7. THE DATA TSUNAMI. Data generated by cloud-based applications and services, together with data migrated to cloud platforms from on-premises systems, is growing exponentially. Callouts (the numeric figures did not survive extraction): data grows every five years; there is more data than people think; data platforms need to live for years; data platforms need to scale.
  • 8. ANALYTICS PIPELINE: AWS. Stages: Collect, Store, ETL, Analyze, Consume/Visualize. Services shown: AWS Glue (ETL and Data Catalog); AWS Lake Formation (data lakes); Amazon Redshift (data warehousing); Amazon EMR (Hadoop + Spark); Amazon Athena (interactive analytics); Amazon Kinesis Data Analytics (real-time); Amazon ES (operational analytics); AWS Database Migration Service; AWS Snowball; AWS Snowmobile; Amazon Kinesis Data Firehose; Amazon Kinesis Data Streams; AWS Direct Connect; AWS DataSync; Amazon S3/Amazon S3 Glacier; Amazon RDS; Amazon Aurora (MySQL, PostgreSQL); Amazon ElastiCache (Redis, Memcached); Amazon Quantum Ledger Database; Amazon DynamoDB (key value); Amazon Neptune (graph); Amazon Timestream (time series); Amazon RDS on VMware; Amazon DocumentDB; Amazon EFS; Amazon FSx; Amazon QuickSight; Amazon SageMaker; Amazon Machine Learning (Amazon ML); AWS Marketplace.
  • 9. What does data warehouse modernization mean? Easy to use: don’t waste time on menial administrative tasks and maintenance. Extends to your data lake: directly analyze data stored in your data lake in open formats. Any scale of data, workloads, and users: dynamically scale up to guarantee performance even with unpredictable demands and data volumes. Faster time-to-insights: consistently fast performance, even with thousands of concurrent queries and users.
  • 10. Data warehouse on AWS
  • 11. DESIGNING A CLOUD DATA WAREHOUSE. Prepare your sources: batch or real-time? Ingest: how are you going to get data into your data warehouse? ETL: is the data already structured for the data warehouse, or do you need to structure it? Data quality: what do you do about the quality of your data? Partner solutions are available. Also consider: auditing; data governance; master data management; nightly jobs/ETL; managing your data transformation.
  • 12. ENTERPRISE DATA WAREHOUSE WORKFLOW. Stages: Collect, Store, ETL, Analyze, Visualize. Diagram labels: data sources on-premises; data sources in cloud; unstructured data; structured data; AWS Direct Connect; AWS Snowball; AWS Database Migration Service; AWS DataSync; Amazon S3; Amazon Redshift; Amazon EMR; AWS Glue; Amazon Redshift Spectrum; Amazon QuickSight.
  • 13. AMAZON REDSHIFT: fast, simple, cost-effective data warehouse that can extend queries to your data lake. Fastest: get faster time-to-insight for all types of analytics workloads; powered by machine learning, columnar storage, and MPP. Unlimited scale: dynamically scale up to guarantee performance even with unpredictable analytical demands and data volumes. Extends your data lake: analyze data in the Amazon S3 data lake in-place and in open formats, together with data loaded into Amazon Redshift’s high performance SSDs; analyze data in open formats such as Parquet, ORC, and JSON, using SQL tools. 1/10th the cost: start at $0.25 per hour, save costs with automated administration tasks and eliminate business impact due to downtime; as low as $1,000 per terabyte per year.
  • 14. Amazon Redshift: the four things that matter most. Speed, scale, security, simplicity.
  • 15. AMAZON REDSHIFT SYSTEM ARCHITECTURE. Leader node: SQL endpoint; stores metadata; coordinates query execution. Compute nodes: local, columnar storage; execute queries in parallel; load, backup, restore through Amazon S3; load from Amazon DynamoDB, Amazon EMR. Two hardware platforms, optimized for data processing: DS2 (HDD), scale from 2 TB to 2 PB; DC1 (SSD), scale from 160 GB to 356 TB. Diagram labels: SQL clients / BI tools; JDBC/ODBC; leader node; compute nodes; 10 GigE (HPC); ingestion/backup/restore; Data Catalog.
  • 16. A DEEPER LOOK AT COMPUTE NODE ARCHITECTURE. A compute node is partitioned into either two or 16 slices; a slice can be thought of as a “virtual compute node”. Each slice is allocated a portion of the compute node's memory and disk space, where it processes a portion of the workload assigned to the compute node by the leader node. The leader node manages distributing data to the slices and apportions the workload for any queries or other database operations to the slices. Slices are Amazon Redshift’s symmetric multiprocessing (SMP) mechanism: they work in parallel to complete operations.
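The slice model on slide 16 can be illustrated with a small Python sketch. It is not Redshift code: `total_slices` and `apportion` are hypothetical helper names, and the round-robin split is only an assumed stand-in for however the leader node actually divides work.

```python
def total_slices(node_count: int, slices_per_node: int) -> int:
    """Return the number of slices in a cluster of identical compute nodes."""
    # The slide describes compute nodes with either 2 or 16 slices.
    if slices_per_node not in (2, 16):
        raise ValueError("expected 2 or 16 slices per node")
    return node_count * slices_per_node


def apportion(work_items: list, slice_count: int) -> list:
    """Split work round-robin so each slice gets a share to process in parallel.

    Assumption: round-robin is used here purely for illustration.
    """
    return [work_items[i::slice_count] for i in range(slice_count)]


# Hypothetical cluster: 4 compute nodes with 2 slices each.
slices = total_slices(node_count=4, slices_per_node=2)
shares = apportion(list(range(10)), slices)
```

With four two-slice nodes the cluster exposes eight slices, so ten work items are spread across eight parallel workers rather than four.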
  • 17. Security is built-in: network isolation; end-to-end encryption; integration with AWS Key Management Service (AWS KMS); compliance certifications. Diagram labels: customer VPC; internal VPC; JDBC/ODBC; leader node; compute nodes; 10 GigE (HPC); Amazon S3.
  • 19. AMAZON REDSHIFT ANTI-PATTERNS. Amazon Redshift is not ideally suited for the following usage patterns. Small datasets: built for parallel processing; data sets of < 100 GB don’t gain benefits of Amazon Redshift. OLTP (Online Transaction Processing): more appropriate for a traditional RDBMS or NoSQL database. Unstructured data: data must be structured by a defined schema; Amazon Redshift Spectrum. BLOB data: store large binary objects in Amazon S3 and reference them in Amazon Redshift.
  • 20. DATA DESIGN CONSIDERATIONS. Are you using optimal data types? Parquet, Avro, ORC. Is your data distributed evenly? Did you pick a good sort key? Loading data efficiently: use the COPY command; you need at least as many input files as you have slices; with multiple input files, all slices are working so you maximize throughput; scale linearly as you add nodes. Distribution styles (diagram: two nodes with two slices each): Key, same key to same location; All, all data on every node; Even, round robin distribution.
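Slide 20's three distribution styles and its file-count rule for COPY can be simulated in plain Python. This is a sketch under stated assumptions: the CRC32 hash stands in for Redshift's internal (undocumented here) hash function, and the table name, bucket, and IAM role ARN are hypothetical placeholders.

```python
import zlib
from collections import defaultdict

NODES = 2                      # matches the slide's diagram: two nodes
SLICES_PER_NODE = 2            # two slices per node
SLICE_COUNT = NODES * SLICES_PER_NODE


def distribute_key(rows, key):
    """KEY style: rows with the same key value land on the same slice."""
    placed = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))  # stand-in hash
        placed[digest % SLICE_COUNT].append(row)
    return dict(placed)


def distribute_even(rows):
    """EVEN style: rows are dealt to slices round-robin."""
    placed = defaultdict(list)
    for i, row in enumerate(rows):
        placed[i % SLICE_COUNT].append(row)
    return dict(placed)


def distribute_all(rows):
    """ALL style: every node holds a full copy of the table."""
    return {node: list(rows) for node in range(NODES)}


def covers_all_slices(file_count, slice_count=SLICE_COUNT):
    """The slide's rule: at least as many input files as slices."""
    return file_count >= slice_count


def copy_statement(table, s3_prefix, iam_role):
    """Build a COPY statement that loads Parquet files from an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )


# Hypothetical sample data: 12 rows across 3 customers.
rows = [{"customer_id": i % 3, "amount": 10 * i} for i in range(12)]
by_key = distribute_key(rows, "customer_id")
by_even = distribute_even(rows)
by_all = distribute_all(rows)
```

Note that KEY distribution with only three distinct key values cannot fill four slices, which is one way the "is your data distributed evenly?" question on the slide shows up in practice.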
  • 21. Recently released features. Dense compute nodes (DC2): 2x performance as DC1 at the same price; 3x more I/O; upgrade at no cost; 30% better storage utilization than DC1; NVMe SSD, DDR4 memory, Intel E5-2686 v4 (Broadwell). “Amazon Redshift’s new DC2 node is giving us a 100% performance increase, allowing us to provide faster insights for our retailers, more cost effectively, to drive incremental revenue." Result-set caching: sub-second repeat queries. Amazon Redshift customers can now serve 35% more queries on average, using the same compute resources. Tens of thousands of compute hours are freed up daily to serve the remaining queries and data ingestion. Transparent: it just works. “With Amazon Redshift result caching, 20% of our queries now complete in less than one second,” said Greg Rokita, Executive Director for Technology, Edmunds.
  • 22. Short query acceleration: express lane for short queries. Machine learning predicts the runtime of queries. Short queries are routed to an express queue. Resources are dynamically dedicated to short queries. Enable it today from your AWS Management Console. How it works (diagram): analytics and BI/dashboard tools; Amazon Redshift; machine learning classifier.
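The routing idea on slide 22 can be shown with a toy Python sketch. It assumes the runtime prediction is already available as a number; the `Query` type, the `route` function, and the five-second threshold are all hypothetical and do not describe Redshift's actual classifier or settings.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    predicted_seconds: float  # stand-in for the ML classifier's prediction


def route(query: Query, threshold_seconds: float = 5.0) -> str:
    """Send queries predicted to be short to the express queue."""
    # The threshold is an illustrative assumption, not a Redshift default.
    if query.predicted_seconds <= threshold_seconds:
        return "express"
    return "standard"


dashboard = Query("SELECT COUNT(*) FROM orders WHERE day = CURRENT_DATE", 0.4)
report = Query("SELECT * FROM orders o JOIN events e ON o.id = e.order_id", 120.0)
queues = [route(dashboard), route(report)]
```

The point of the express lane is that the dashboard query does not wait behind the long-running report.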
  • 23. Amazon Redshift elastic resize: adds additional nodes to an Amazon Redshift cluster; distributes data across the new configuration; minimal transition time; quickly scale for varying workload demands; scale up and down in minutes. Diagram labels: JDBC/ODBC; Amazon Redshift cluster; leader node; compute nodes; backup; Amazon Redshift managed Amazon S3.
  • 24. Concurrency scaling for bursts of user activity: creates more clusters automatically on-demand; consistently fast performance even with thousands of concurrent queries; no advance hydration required. Free for >97% of customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for concurrency scaling. Diagram labels: caching layer; backup; Amazon Redshift managed Amazon S3.
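The credit rule on slide 24 (one hour of concurrency scaling credit per 24 hours of main cluster use) is simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes credits neither expire nor hit a cap, which the slide does not address.

```python
def concurrency_credit_hours(main_cluster_hours: float) -> float:
    """One credit hour accrues for every 24 hours the main cluster is in use."""
    return main_cluster_hours / 24.0


def uncovered_burst_hours(main_cluster_hours: float, burst_hours_used: float) -> float:
    """Burst hours left over after credits are applied (never negative).

    Assumption: credits do not expire and are not capped.
    """
    credits = concurrency_credit_hours(main_cluster_hours)
    return max(0.0, burst_hours_used - credits)


# A cluster running continuously for a 30-day month: 720 hours.
credits = concurrency_credit_hours(720)
light_use = uncovered_burst_hours(720, burst_hours_used=12)
heavy_use = uncovered_burst_hours(720, burst_hours_used=35)
```

Under these assumptions a cluster that runs all month earns 30 credit hours, so a workload that bursts for 12 hours stays entirely within its credits.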
  • 25. Amazon Redshift Spectrum: query across Amazon Redshift and Amazon S3. Extend the data warehouse to exabytes of data in the Amazon S3 data lake; no data loading required; scale compute and storage separately; directly query data stored in Amazon S3; Parquet, ORC, Avro, JSON, and CSV data formats. Coming soon: unload to Parquet; Spectrum Request Accelerator. Diagram labels: Amazon Redshift Spectrum query engine; Amazon Redshift data; Amazon S3 data lake.
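To make slide 25 concrete, the Python sketch below assembles the SQL that is typically used to expose S3 data through Redshift Spectrum and then join it with a local table. It only builds strings and does not connect to a cluster. The schema, database, table, column, bucket, and role names are all hypothetical placeholders.

```python
def external_schema_ddl(schema: str, catalog_db: str, iam_role: str) -> str:
    """DDL that maps a Data Catalog database into Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema: str, table: str, s3_location: str) -> str:
    """DDL for an external table over Parquet files that stay in S3."""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        "    event_id BIGINT,\n"
        "    customer_id INT,\n"
        "    event_time TIMESTAMP\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# One query spanning data loaded in Redshift (customers) and data in S3 (clickstream).
JOIN_QUERY = (
    "SELECT c.customer_id, COUNT(*) AS events\n"
    "FROM spectrum_demo.clickstream e\n"
    "JOIN customers c ON c.customer_id = e.customer_id\n"
    "GROUP BY c.customer_id;"
)

schema_sql = external_schema_ddl(
    "spectrum_demo", "demo_db", "arn:aws:iam::123456789012:role/DemoSpectrumRole"
)
table_sql = external_table_ddl(
    "spectrum_demo", "clickstream", "s3://example-bucket/clickstream/"
)
```

Because the external table only points at an S3 location, no COPY step is needed before querying, which is the "no data loading required" point on the slide.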
  • 26. Amazon Redshift new features. Categories: speed, scale, simplicity, security. Features shown: unload to Parquet; WLM concurrency setting; AWS Lake Formation integration; auto data distribution; deferred maintenance; snapshot scheduler; Spectrum Request Accelerator; elastic resize; concurrency scaling; improving short query acceleration; auto-vacuum; auto-analyze.
  • 27. Fannie Mae
  • 28. About Fannie Mae. Chart labels: home ownership in United States; homes financed by Fannie Mae. Values shown: 36% and 64%.
  • 29. Fannie Mae: Data warehouse
  • 30. Fannie Mae: Challenges. Digital transformation: 3 million queries/month, 4x data growth. Successful user adoption: user base growth from 100 to 1,000 in three years. Concurrency, scalability, and time to market. Diagram labels: Loan; Credit.
  • 31. Amazon Redshift solutions. Diagram label: DW.
  • 32. Fannie Mae: On-premises to hybrid environment. Diagram labels: on-premises; Amazon cloud; Amazon S3; Amazon Redshift; Amazon Athena; Amazon EMR; data warehouse.
  • 33. Fannie Mae: Concurrency scaling performance. Chart: execution time (sec., axis 0 to 400) for Query 1 through Query 11; series: RS 16 nodes, RS 8 nodes Burst. Amazon Redshift 16 nodes vs. Amazon Redshift eight nodes with concurrency scaling: similar or better performance is achieved with 50% of the compute resource.
  • 34. Fannie Mae: Concurrency scaling performance. With the concurrency scaling feature, performance was flat (did not degrade) as concurrency increased. Chart: Amazon Redshift vs. Amazon Redshift concurrency scaling, 5-table joins; execution time (sec., axis 0 to 400) against query concurrency (1, 3, 30, 50, 100); series: average of RS 8 nodes, average of RS 8 nodes Burst.
  • 35. Fannie Mae: Lessons learned
  • 36. Thank you! Dennis Magsajo, Solutions Architect, Worldwide Public Sector, AWS