© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
November 30, 2016
Migrating Your Data Warehouse to Amazon Redshift
DAT202
Pavan Pothukuchi, Sr. Manager PM, Amazon Redshift
Ali Khan, Director of BI and Analytics, Scholastic
Laxmikanth Malladi, Principal Architect, NorthBay Solutions
“It’s our biggest driver of growth in our biggest markets, and is a feature of the
company” …on Data Mining in Redshift
– Chris Lambert, Lyft CTO
“The doors were blown wide open to create custom dashboards for anyone to
instantly go in and see and assess what is going in our ad delivery landscape,
something we have never been able to do until now.”
– Bryan Blair, Vevo’s VP of Ad Operations
“Analytical queries are 10 times faster in Amazon Redshift than they
were with our previous data warehouse.”
– Yuki Moritani, NTT Docomo Innovation Manager
“We have several petabytes of data and use a massive Redshift
cluster. Our data science team can get to the data faster and then
analyze that data to find new ways to reduce costs, market
products, and enable new business.”
– Yuki Moritani, NTT Docomo Innovation Manager
“We saw a 2x performance improvement on a wide variety of
workloads. The more complex the queries, the higher the
performance improvement.”
- Naeem Ali, Director of Software Development, Data
Science at Cablevision (Optimum)
“Over the last few years, we’ve tried all kinds of databases in search of more
speed, including $15k of custom hardware. Of everything we’ve tried,
Amazon Redshift won out each time.”
– Periscope Data, Analyst’s Guide to Redshift
“We took Amazon Redshift for a test run the moment it was
released. It’s fast. It’s easy. Did I mention it’s ridiculously fast?
We’re using it to provide our analysts an alternative to Hadoop.”
– Justin Yan, Data Scientist at Yelp
“The move to Redshift also significantly improved dashboard query
performance… Redshift performed ~200% faster than the
traditional SQL Server we had been using in the past.”
- Dean Donovan, Product Development at DiamondStream
“…[Redshift] performance has blown away everyone here (we
generally see 50-100x speedup over Hive)”
- Jie Li, Data Infrastructure at Pinterest
“450,000 online queries 98 percent faster than previous traditional data
center, while reducing infrastructure costs by 80 percent.”
- John O’Donovan, CTO, Financial Times
“We needed to load six months' worth of data, about 10 TB of data, for a
campaign. That type of load would have taken about 20 days with our previous
solution. By using Amazon Redshift, it only took six hours to load the data.”
- Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe)
“We regularly process multibillion row datasets and we do that in a
matter of hours. We are heading to up to 10 times more data volumes in
the next couple of years, easily.”
- Bob Harris, CTO, Channel 4
“On our previous big data warehouse system, it took around 45
minutes to run a query against a year of data, but that number went
down to just 25 seconds using Amazon Redshift”
- Kishore Raja, Director of Strategic Programs and R&D, Boingo Wireless
“Most competing data warehousing solutions would have cost us up
to $1 million a year. By contrast, Amazon Redshift costs us just
$100,000 all-in, representing a total cost savings of around 90%”
- Joel Cumming, Head of Data, Kik Interactive
“Annual costs of Redshift are equivalent to just the annual
maintenance of some of the cheaper on-premises options for
data warehouses.”
- Kevin Diamond, CTO, HauteLook (Nordstrom)
“Our data volume keeps growing, and we can support that
growth because Amazon Redshift scales so well. We wouldn’t
have that capability using the supporting on-premises hardware in
our previous solution.”
- Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds
“With Amazon Redshift and Tableau, anyone in the company can set up
any queries they like - from how users are reacting to a feature, to growth by
demographic or geography, to the impact sales efforts had in different areas”
- Jon Hoffman, Head of Engineering, Foursquare
Today’s agenda
• Amazon Redshift Overview
• Use cases and benefits
• Migration options
• Scholastic’s use case
• Architecture details
• Technical overview
• Key project learnings
Relational data warehouse
Massively parallel; petabyte scale
Fully managed
HDD and SSD platforms
$1,000/TB/year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical
representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any
vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.
Forrester Wave™ Enterprise Data Warehouse Q4 ’15
Selected Amazon Redshift customers
Why migrate to Amazon Redshift?
From a transactional database: 100x faster; scales from GBs to PBs; analyze data without storage constraints
From an MPP database: 10x cheaper; easy to provision and operate; higher productivity
From Hadoop: 10x faster; no programming; standard interfaces and integration to leverage BI tools, machine learning, streaming
Migration from Oracle @ Boingo Wireless
2000+ Commercial Wi-Fi locations
1 million+ Hotspots
90M+ ad engagements
100+ countries
Legacy DW: Oracle 11g based DW
Before migration
Rapid data growth slowed
analytics
Mediocre IOPS, limited memory,
vertical scaling
Admin overhead
Expensive (license, h/w, support)
After migration
180x performance improvement
7x cost savings
[Chart: cost comparison. Exadata $400,000; SAP HANA $300,000; Redshift $55,000. 7X cheaper than Oracle Exadata.]
[Chart: latency in seconds, existing system vs. Redshift, for query performance (1 year of data) and data load performance (1 million records); bar labels 7,200, 2,700, 15, 15. 180X faster than Oracle database.]
Migration from Greenplum @ NTT Docomo
68 million customers
10s of TBs per day of data across
mobile network
6PB of total data (uncompressed)
Data science for marketing
operations, logistics etc.
Legacy DW: Greenplum on-premises
After migration:
125 node DS2.8XL cluster
4,500 vCPUs, 30TB RAM
6 PB uncompressed
10x faster analytic queries
50% reduction in time for new BI
app. deployment
Significantly less ops. overhead
Migration from SQL on Hadoop @ Yahoo
Analytics for website/mobile events
across multiple Yahoo properties
On an average day
2B events
25M devices
Before migration: Hive – Found it to be
slow, hard to use, share and repeat
After migration:
21 node DC1.8XL (SSD)
50TB compressed data
100x performance improvement
Real-time insights
Easier deployment and
maintenance
[Chart: query time in seconds (log scale, 1 to 10,000), Amazon Redshift vs. Impala, for Count Distinct Devices, Count All Events, Filter Clauses, and Joins.]
Business Value and Productivity
Business Productivity Benefits
Analyze more data
Faster time to market
Get better insights
Match capacity with demand
[Diagram: Engine X to Amazon Redshift; items to migrate: ETL scripts, SQL in reports, ad hoc queries]
How to Migrate?
Schema Conversion Database Migration
Map data types
Choose compression
encoding, sort keys,
distribution keys
Generate and apply DDL
Schema & Data
Transformation
Data Migration
Convert SQL Code
Bulk Load
Capture updates
Transformations
Assess Gaps
Stored Procedures
Functions
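The schema conversion step (map data types, choose compression encoding, sort keys, and distribution keys, then generate DDL) can be sketched in Python. This is only an illustration of the idea, not how AWS SCT works internally. The type map, the `orders` table, its columns, and the chosen encodings are all assumptions made up for the example.

```python
# Hypothetical mapping from source (Oracle-style) types to Redshift types.
TYPE_MAP = {
    "NUMBER": "NUMERIC(38,10)",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",
}


def redshift_type(source_type, length=None):
    """Translate one source type; unknown types pass through unchanged."""
    base = TYPE_MAP.get(source_type.upper(), source_type.upper())
    if base == "VARCHAR" and length:
        return f"VARCHAR({length})"
    return base


def build_ddl(table, columns, dist_key, sort_keys):
    """Render CREATE TABLE text.

    columns: list of (name, source_type, length, encoding); encoding may be None.
    """
    lines = []
    for name, source_type, length, encoding in columns:
        line = f"  {name} {redshift_type(source_type, length)}"
        if encoding:
            line += f" ENCODE {encoding}"
        lines.append(line)
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{body}\n)\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


# Illustrative table definition (names and encodings are assumptions).
ddl = build_ddl(
    "orders",
    [
        ("order_id", "NUMBER", None, "lzo"),
        ("customer_id", "VARCHAR2", 20, "lzo"),
        ("status", "VARCHAR2", 10, "bytedict"),
        ("order_date", "DATE", None, None),  # sort key column left unencoded
    ],
    dist_key="customer_id",
    sort_keys=["order_date"],
)
print(ddl)
```

In practice the distribution key is usually the column most often used in joins and the sort key the column most often used in range filters; the choices above are placeholders.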
Convert schema in a few clicks
Sources include Oracle, Teradata,
Greenplum and Netezza
Automatic schema optimization
Converts application SQL code
Detailed assessment report
AWS Schema
Conversion Tool
(AWS SCT)
AWS Schema Conversion Tool
Start your first migration in a few minutes
Sources include: Aurora, Oracle, SQL
Server, MySQL and PostgreSQL
Bulk load and continuous replication
Migrate a TB for $3
Fault tolerant
AWS Database Migration Service (AWS DMS)
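AWS DMS: Change data capture
[Diagram: a replication instance sits between source and target. Transactions (t1, t2) that update the source during the bulk load are captured, and the changes are applied to the target after the bulk load.]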
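The change data capture idea (bulk load a snapshot, capture transactions that commit meanwhile, apply them after the bulk load) can be simulated in a few lines of Python. This is a conceptual sketch only and does not use or represent the AWS DMS API. The dictionaries standing in for tables and the `(op, key, row)` change format are assumptions.

```python
def bulk_load(snapshot):
    """Copy a point-in-time snapshot of the source table (keyed by primary key)."""
    return {key: dict(row) for key, row in snapshot.items()}


def apply_changes(table, changes):
    """Apply captured changes in commit order; each change is (op, key, row)."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            table[key] = dict(row)
        elif op == "delete":
            table.pop(key, None)
    return table


# Source table at the moment the bulk load starts.
source = {1: {"name": "a"}, 2: {"name": "b"}}
snapshot = {key: dict(row) for key, row in source.items()}

# Transactions t1 and t2 commit on the source while the bulk load is running.
captured = [
    ("update", 1, {"name": "a2"}),  # t1
    ("insert", 3, {"name": "c"}),   # t2
]
apply_changes(source, captured)      # the source moves on

target = bulk_load(snapshot)         # target initially reflects the old snapshot
apply_changes(target, captured)      # cached changes applied after the bulk load

print(target == source)
```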
Data integration partners
Data Integration Systems Integrators
Amazon Redshift
Beyond Amazon Redshift…
Scholastic, Established 1920
Where were we?
Platform
13+ years old. IBM AS/400 DB2 and Microsoft SQL Server are the primary data
warehouse platforms. BI Platform is primarily Microsoft (SSRS, SSAS, Excel, SharePoint)
500+ direct users across every LOB and business function
20+ TB. 5,500+ DB2 workloads, 350+ SQL Server workloads, 15 SSAS cubes, 150+
SSRS reports
Challenges
Inflexible, multi-layered architecture – slow time to market
Inability to meet internal SLAs due to performance of daily ETL processes
Scalability limitations with SQL Server Analysis Services (SSAS) for reports
Limited ability to perform self-service Business Intelligence
Moving forward: Key decision factors
• Improved performance, scalability, availability,
logging, security
• Enablement of self service business intelligence
• Leverage the skill set of current team (Relational DB
& SQL)
• Integration with existing technology stack
• Alignment with the tech strategy (devops model,
Cloud First)
• Ability to support Big Data initiatives
• Team up with an experienced consulting partner
Why we chose AWS and Amazon Redshift
AWS was chosen for its agility, scalability, elasticity, and
security
Redshift
• Scalable, fast
• Managed service, cost-optimization models,
elastic
• SQL/relational matched skillset of team
S3 was chosen as location for ingestion process
NorthBay was chosen as the implementation partner for
their expertise in Big Data and Redshift migrations
How the project unfolded
Goals
• 3-month pilot to migrate a Functional area in key LOB
• Demonstrate immediate business value
• Use AWS Stack & Open Source for Data Movement from DB2
(No CDC/ETL tool)
Outcomes
• Core Framework for Migration
• ELT Architecture and Validation
• Visualization/Self-service capability through Tableau
Scholastic data cloud: Technical architecture
[Architecture diagram. Labels shown: source DBs (AS400 / DB2, SQL Server EDW, SSAS cubes, SSRS reports); EMR cluster running Sqoop script; output bucket; EC2 instance running COPY command; Redshift (Staging); Redshift (Enterprise Data Repository); Data Pipeline; SNS topics (pipeline status, pipeline failure); SNS email notification; Lambda (save pipeline stats); RDS MySQL instance; pipeline configurations; DynamoDB; Tableau (reporting tool).]
Core Framework
• Jobs and Job Groups are defined as metadata in DynamoDB
• Control-M scheduler, Custom Application and Data Pipeline for
Orchestration
• ELT Process with EMR/Sqoop for Extraction. Load and Transform
the data through Redshift SQL scripts
• Core Framework enables
• Restart capability from point of failure
• Capturing of operational statistics (# of rows updated, etc.)
• Audit capability (which feed caused the Fact to change, etc.)
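A minimal Python sketch of the restart-from-failure and operational-statistics ideas follows. It is not Scholastic's framework: the `run_job_group` function, the in-memory `state` dictionary (standing in for the job metadata the slide places in DynamoDB), and the job names are all hypothetical.

```python
import time


def run_job_group(jobs, state, stats):
    """Run jobs in order, skipping any already marked done in `state`.

    jobs:  list of (job_name, callable returning the number of rows affected)
    state: dict of job_name -> "done" / "failed" (stand-in for stored metadata)
    stats: list collecting per-job operational statistics
    """
    for name, func in jobs:
        if state.get(name) == "done":
            continue  # a restart resumes from the point of failure
        started = time.monotonic()
        try:
            rows = func()
        except Exception as exc:
            state[name] = "failed"
            stats.append({"job": name, "status": "failed", "error": str(exc)})
            return False
        state[name] = "done"
        stats.append({
            "job": name,
            "status": "done",
            "rows": rows,
            "seconds": time.monotonic() - started,
        })
    return True


# Demo: the "load" job fails on its first attempt only.
calls = {"extract": 0, "load": 0}


def extract():
    calls["extract"] += 1
    return 100


def load():
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("staging table locked")
    return 100


jobs = [("extract", extract), ("load", load)]
state, stats = {}, []
first = run_job_group(jobs, state, stats)   # stops at "load"
second = run_job_group(jobs, state, stats)  # resumes at "load"; "extract" is skipped
print(first, second, calls)
```

Keeping job status outside the process is what makes the restart possible; an audit trail could be built the same way by recording which feed triggered each job.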
Extract
• Pre-create EMR resources at the start of Batch
• Achieve parallelism in Sqoop with mappers and Fair Scheduling
• Sqoop query to add additional fields like Batch_id, Updated_date, etc.
• Data extracts are split and compressed for optimized loading into Redshift
[Diagram: extract flow across AS400 / DB2, EMR with Sqoop, S3, Metadata, KMS, and Data Pipeline; numbered steps 1 to 6, with control flow and data flow drawn separately.]
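The extract bullets can be illustrated with a Python helper that assembles a Sqoop import command: a free-form query adds the batch columns, several mappers run in parallel, and the output is compressed. This is a sketch under assumptions, not the project's actual script. The JDBC URL, credentials file, table, split column, and S3 path are hypothetical, and Fair Scheduling configuration is not covered.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, split_by,
                       target_dir, batch_id, num_mappers=8):
    """Return the argument list for a parallel, compressed Sqoop import."""
    # A free-form query must contain the $CONDITIONS token, which Sqoop
    # replaces with each mapper's range predicate on the --split-by column.
    query = (
        f"SELECT t.*, {int(batch_id)} AS BATCH_ID, "
        f"CURRENT_TIMESTAMP AS UPDATED_DATE "
        f"FROM {table} t WHERE $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--query", query,
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),  # one output file per mapper
        "--target-dir", target_dir,
        "--fields-terminated-by", "|",
        "--compress",
        "--compression-codec", "org.apache.hadoop.io.compress.GzipCodec",
    ]


# All connection details below are placeholders.
cmd = build_sqoop_import(
    jdbc_url="jdbc:as400://db2host.example.com/SALESLIB",
    username="etl_user",
    password_file="/user/hadoop/.db2_password",
    table="ORDERS",
    split_by="ORDER_ID",
    target_dir="s3://example-bucket/extract/orders/batch=42/",
    batch_id=42,
)
print(shlex.join(cmd))
```

Because each mapper writes its own gzip file, the extract arrives in S3 already split into several compressed parts, which is the shape a parallel Redshift load works best with.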
Load
• Truncate and Load through Data Pipeline for Staging tables
• Dynamic workload management (WLM) queues set up to allow maximum
resources during Loading/Transformation
• Check and terminate any locks on tables to allow truncation
• Capture metrics related to number of rows loaded, time taken, etc.
[Diagram: load flow across S3, KMS, Data Pipeline, EC2, and Staging; numbered steps 1 to 4, with control flow and data flow drawn separately.]
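A truncate-and-load step for one staging table could look roughly like the statement list built below. This is a sketch, not the project's code: the table name, S3 prefix, and IAM role ARN are placeholders, and `wlm_query_slot_count` is used here as a simple per-session way to claim more queue resources, which is not necessarily the dynamic WLM setup the slide refers to.

```python
# Query to find sessions holding table locks; a blocking session could then be
# ended with pg_terminate_backend(<pid>) before the TRUNCATE.
LOCK_CHECK_SQL = "SELECT table_id, lock_owner_pid FROM stv_locks;"


def build_staging_load(table, s3_prefix, iam_role, slots=3):
    """Return the SQL statements for a truncate-and-load of one staging table."""
    return [
        # Claim several slots in the current WLM queue for this session.
        f"SET wlm_query_slot_count TO {int(slots)};",
        # TRUNCATE commits immediately in Redshift, so it cannot be rolled back.
        f"TRUNCATE TABLE {table};",
        # Load every gzip part under the prefix in parallel.
        f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' GZIP DELIMITER '|';",
        # Rows loaded by the last COPY in this session, for the metrics store.
        "SELECT pg_last_copy_count();",
    ]


statements = build_staging_load(
    table="staging.orders",
    s3_prefix="s3://example-bucket/extract/orders/batch=42/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
for sql in statements:
    print(sql)
```

The statements would be run in order on a single database session, since both the slot count setting and `pg_last_copy_count()` are session scoped.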
Transform
• Custom Application for building Dimensions and Facts
• SQL Scripts are stored in S3 and executed by ELT process
• SQL scripts refactored from SQL Server and AS400 scripts
• Non-Functional Requirements are achieved through Custom App
[Diagram: transform flow across S3, Staging, Facts, Dimensions, Metadata, and App; numbered steps 1 to 7b, with control flow and data flow drawn separately.]
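The transform step, where stored SQL scripts are executed in order to build dimensions and facts, can be sketched as a small runner. This is an illustration only: an in-memory SQLite database stands in for Redshift so the example runs anywhere, the scripts are inline strings rather than S3 objects, and the table and script names are made up.

```python
import sqlite3


def split_statements(script):
    """Naive split on ';' (assumes no semicolons inside string literals)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def run_scripts(conn, scripts):
    """Run each named script in order, committing after each one.

    Returns {script_name: rows affected} as a simple operational statistic.
    """
    results = {}
    for name, script in scripts:
        cur = conn.cursor()
        affected = 0
        for stmt in split_statements(script):
            cur.execute(stmt)
            if cur.rowcount > 0:
                affected += cur.rowcount
        conn.commit()
        results[name] = affected
    return results


# Stand-in warehouse with one staging table and empty targets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_orders (order_id INTEGER, customer_id TEXT, amount REAL);
INSERT INTO stg_orders VALUES (1, 'C1', 10.0), (2, 'C1', 5.0), (3, 'C2', 7.5);
CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY);
CREATE TABLE fact_orders (order_id INTEGER, customer_id TEXT, amount REAL);
""")

# Dimensions are built before the facts that reference them.
scripts = [
    ("dim_customer.sql",
     "INSERT INTO dim_customer SELECT DISTINCT customer_id FROM stg_orders;"),
    ("fact_orders.sql",
     "INSERT INTO fact_orders SELECT order_id, customer_id, amount FROM stg_orders;"),
]
results = run_scripts(conn, scripts)
print(results)
```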
Schema Design
• Modified Star Schema
• Natural Keys instead of generating unique identifiers
• Commonly used columns from Dimensions are copied over to
Facts
• Surrogate keys are eliminated except for a few cases
• Compression
• Define appropriate Distribution and Sort Keys
• Define primary and foreign keys
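As a rough illustration of these schema design choices, the Python snippet below holds Redshift DDL for one dimension and one fact table. The tables and columns are invented for the example and are not Scholastic's schema. Natural keys (`customer_number`, `order_number`) serve as identifiers, the frequently used `region` column is copied into the fact, and the fact is distributed on the join column and sorted on the date. Redshift treats primary and foreign keys as informational: they are not enforced, but the query planner can use them.

```python
DIM_CUSTOMER_DDL = """
CREATE TABLE dim_customer (
  customer_number VARCHAR(20) NOT NULL,   -- natural key, no surrogate id
  customer_name   VARCHAR(100),
  region          VARCHAR(30),
  PRIMARY KEY (customer_number)
)
DISTSTYLE ALL;
"""

FACT_ORDER_DDL = """
CREATE TABLE fact_order (
  order_number    VARCHAR(20) NOT NULL,   -- natural key
  customer_number VARCHAR(20) NOT NULL,
  order_date      DATE NOT NULL,
  region          VARCHAR(30) ENCODE bytedict,  -- copied from dim_customer
  amount          NUMERIC(12,2),
  PRIMARY KEY (order_number),
  FOREIGN KEY (customer_number) REFERENCES dim_customer (customer_number)
)
DISTKEY (customer_number)
SORTKEY (order_date);
"""

# Dimension first, so the fact's foreign key has a table to reference.
SCHEMA_DDL = [DIM_CUSTOMER_DDL.strip(), FACT_ORDER_DDL.strip()]

for ddl in SCHEMA_DDL:
    print(ddl)
    print()
```

Copying `region` into the fact lets common reports filter by region without joining to the dimension, at the cost of some redundancy.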
Security
• AWS Key Management Service (KMS) is used for encrypting
access credentials to Source and Target databases
• Jenkins job to allow encrypting of credentials using KMS
directly by Database Administrators
• Amazon EMR, Jenkins resources are given KMS decrypt
permissions to allow connecting to Sources and Targets during
the ELT process
• Standard Security in Transit and at Rest throughout the process
• IAM federation through Enterprise Active Directory
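Encrypting database credentials with KMS and decrypting them at run time can be sketched as below. The `encrypt` and `decrypt` calls follow the shape of the boto3 KMS client, but the helper names, the key alias, and the `FakeKms` stand-in are assumptions added so the example runs without AWS access. `FakeKms` only reverses bytes and provides no security whatsoever.

```python
import base64


def encrypt_credential(kms_client, key_id, secret):
    """Encrypt a credential and return base64 text suitable for a config store."""
    resp = kms_client.encrypt(KeyId=key_id, Plaintext=secret.encode("utf-8"))
    return base64.b64encode(resp["CiphertextBlob"]).decode("ascii")


def decrypt_credential(kms_client, stored):
    """Decrypt a stored credential; the caller needs KMS decrypt permission."""
    resp = kms_client.decrypt(CiphertextBlob=base64.b64decode(stored))
    return resp["Plaintext"].decode("utf-8")


class FakeKms:
    """Stand-in so the sketch runs offline. NOT encryption: it reverses bytes."""

    def encrypt(self, KeyId, Plaintext):
        return {"CiphertextBlob": Plaintext[::-1]}

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": CiphertextBlob[::-1]}


# With real AWS access this would be: kms = boto3.client("kms")
kms = FakeKms()
stored = encrypt_credential(kms, "alias/example-etl-key", "db2-password")
recovered = decrypt_credential(kms, stored)
print(recovered == "db2-password")
```

In the setup the slide describes, administrators would run the encrypt side (for example from a Jenkins job) and the ELT resources would run only the decrypt side.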
Reporting
• Business users access Facts/Dimensions through Tableau
• Power users access Staging tables through Tableau
• Data Analysts access files in S3 using Hive/Presto
• Self-Service capability across business users
[Diagram: Business Analysts, Power Users, and Data Analysts accessing S3, Staging, and Facts/Dimensions; EMR with Presto/Hive.]
Workstream Effort
• Define Jobs and Job Groups specific to each
Workstream
• Create Redshift tables (Staging, Facts, Dimensions)
based on mapping from AS400 and best practices
learned
• Create new SQL scripts (based on the logic from
AS400/SQL Server code) for transformation
• Develop, Test and Deploy in 2-week Agile sprints
Key Lessons - Technical
• Isolate the core framework from project-specific code by using separate code repositories
• Consolidating a logging solution across Amazon S3, Amazon
Redshift, Amazon DynamoDB, etc. was a challenge
• Make appropriate schema changes when migrating to new
platform
• Custom Framework for gathering operational stats (e.g., # of
rows loaded)
• Start with Test Automation tools and Acceptance Test Driven
Development (ATDD) earlier in the project
Project timeline revisited
After the successful pilot:
• Executive Leadership accelerated timeline:
• Reduce project timeline by 50% (to 12 months) to
deliver value faster to LOBs
• Realize cost savings by eliminating the DB2 and
SQL Server platforms earlier
• Users wanted to be on the new platform!
• Scholastic & NorthBay partnered to create a
training curriculum to ensure a supply of skilled
staff would be available to our teams
Scaling up: 7 workstreams
• Developed a model for estimating effort and cost
(AWS costs & Labor per LOB migration)
• Running agile teams in parallel – employed Agile
coaches
• Enhanced the core framework to ensure it would
scale effectively when in use by multiple teams
simultaneously
• Building a Code repository for use by all teams
• Building CI / CD Frameworks
Where are we now?
• 4 of 7 LOBs migrated – framework enables complete migration of a
functional area within days/weeks as opposed to months. On track to
migrate and decommission entire legacy environment within next 6
months
• 10 weeks to migrate from an external vendor hosting data and providing
reports for one LOB
• Cost of Data Ingestion Framework is under $40/day (EC2, EMR, Data
Pipeline)
• First “Big Data” initiative in production, captures and processes an
average of 1.5 Million e-reading events daily (peak: 7 Million)
• Profile: LOB #1
• Loading ~5-6 Million rows/day (6-7GB/day)
• Processing over 1.5 billion rows within Redshift daily
• Complete ETL/ELT batch cycle performance improved by over 170%
Key lessons – project execution
• Essential to monitor and optimize AWS costs
• “Data Champion” / “Data Guide” partnership absolutely critical for
successful adoption of new platforms
• Importance of strong Agile coaches while scaling out Agile teams
• Criticality of choosing consulting partners (AWS & NorthBay)
who can ramp up and supply key resources fast and cycle off the
project when finished
• Creating new data platforms and migrating data into them is
easy, especially with AWS. Decommissioning existing data
platforms is hard!
Thank you!
Remember to complete
your evaluations!
Related Sessions
Hear from other customers discussing their Amazon Redshift use cases:
• BDM402—Best Practices for Data Warehousing with Amazon Redshift (King.com)
• BDA304—What’s New with Amazon Redshift
• SVR308—Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year
• GAM301—How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful
Player Insights
• BDA207—Fanatics: Deploying Scalable, Self-Service Business Intelligence on AWS
• BDM306— Netflix: Using Amazon S3 as the fabric of our big data ecosystem
• BDA203 — Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift
(GE Power and Water)
• BDM206 — Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT
Analytics Platform on AWS (Hello)
• STG307— Case Study: How Prezi Built and Scales a Cost-Effective, Multipetabyte Data Platform
and Storage Infrastructure on Amazon S3

More Related Content

What's hot

Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
Adam Doyle
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with Databricks
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
Kent Graziano
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
Amazon Web Services
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Amazon Web Services
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
DATAVERSITY
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
Vikas Manoria
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
Databricks
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
Lorenzo Nicora
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
Alex Meadows
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Khalid Salama
 

What's hot (20)

Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 

Viewers also liked

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
Amazon Web Services
 
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Amazon Web Services
 
(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive(STG402) Amazon EBS Deep Dive
(STG402) Amazon EBS Deep Dive
Amazon Web Services
 
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS
Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
AWS re:Invent 2016: Deep Dive on Amazon DynamoDB (DAT304)
Amazon Web Services
 
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech TalksHands-on Labs: Getting Started with AWS  - March 2017 AWS Online Tech Talks
Hands-on Labs: Getting Started with AWS - March 2017 AWS Online Tech Talks
Amazon Web Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
Amazon Web Services
 
Dynamo db pros and cons
Dynamo db  pros and consDynamo db  pros and cons
Dynamo db pros and consSaniya Khalsa
 
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Best Practices for Managing Security Operations in AWS - March 2017 AWS Onlin...
Amazon Web Services
 
Intro to AWS: Storage Services
Intro to AWS: Storage ServicesIntro to AWS: Storage Services
Intro to AWS: Storage Services
Amazon Web Services
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
Amazon Web Services
 
Understanding AWS Storage Options
Understanding AWS Storage OptionsUnderstanding AWS Storage Options
Understanding AWS Storage Options
Amazon Web Services
 
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
AWS re:Invent 2016: ElastiCache Deep Dive: Best Practices and Usage Patterns ...
Amazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Amazon Web Services
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
 
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at ScaleAmazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon EC2 Systems Manager for Hybrid Cloud Management at Scale
Amazon Web Services
 
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
Amazon Web Services
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Amazon Web Services
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
Amazon Web Services
 
Introduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code ServicesIntroduction to DevOps and the AWS Code Services
Introduction to DevOps and the AWS Code Services
Amazon Web Services
 

Viewers also liked (20)

(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
Announcing AWS Snowball Edge and AWS Snowmobile - December 2016 Monthly Webin...
 
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    November 30, 2016
    Migrating Your Data Warehouse to Amazon Redshift
    DAT202
    Pavan Pothukuchi, Sr. Manager PM, Amazon Redshift
    Ali Khan, Director of BI and Analytics, Scholastic
    Laxmikanth Malladi, Principal Architect, Northbay Solutions

    “It’s our biggest driver of growth in our biggest markets, and is a feature of the company” …on Data Mining in Redshift
    – Chris Lambert, Lyft CTO

    “The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.”
    – Bryan Blair, Vevo’s VP of Ad Operations

    “Analytical queries are 10 times faster in Amazon Redshift than they were with our previous data warehouse.”
    – Yuki Moritani, NTT Docomo Innovation Manager

    “We have several petabytes of data and use a massive Redshift cluster. Our data science team can get to the data faster and then analyze that data to find new ways to reduce costs, market products, and enable new business.”
    – Yuki Moritani, NTT Docomo Innovation Manager

    “We saw a 2x performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement..”
    - Naeem Ali, Director of Software Development, Data Science at Cablevision (Optimum)

    “Over the last few years, we’ve tried all kinds of databases in search of more speed, including $15k of custom hardware. Of everything we’ve tried, Amazon Redshift won out each time.”
    – Periscope Data, Analyst’s Guide to Redshift

    “We took Amazon Redshift for a test run the moment it was released. It’s fast. It’s easy. Did I mention it’s ridiculously fast? We’re using it to provide our analysts an alternative to Hadoop.”
    – Justin Yan, Data Scientist at Yelp

    “The move to Redshift also significantly improved dashboard query performance… Redshift performed ~200% faster than the traditional SQL Server we had been using in the past.”
    - Dean Donovan, Product Development at DiamondStream

    “…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive)”
    - Jie Li, Data Infrastructure at Pinterest

    “450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.”
    - John O’Donovan, CTO, Financial Times

    “We needed to load six months' worth of data, about 10 TB of data, for a campaign. That type of load would have taken about 20 days with our previous solution. By using Amazon Redshift, it only took six hours to load the data.”
    - Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe)

    “We regularly process multibillion row datasets and we do that in a matter of hours. We are heading to up to 10 times more data volumes in the next couple of years, easily.”
    - Bob Harris, CTO, Channel 4

    “On our previous big data warehouse system, it took around 45 minutes to run a query against a year of data, but that number went down to just 25 seconds using Amazon Redshift”
    - Kishore Raja, Director of Strategic Programs and R&D, Boingo Wireless

    “Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%”
    - Joel Cumming, Head of Data, Kik Interactive

    “Annual costs of Redshift are equivalent to just the annual maintenance of some of the cheaper on-premises options for data warehouses..”
    - Kevin Diamond, CTO, HauteLook (Nordstrom)

    “Our data volume keeps growing, and we can support that growth because Amazon Redshift scales so well.. We wouldn’t have that capability using the supporting on-premises hardware in our previous solution.”
    - Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds

    “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas”
    - Jon Hoffman, Head of Engineering, Foursquare
  • 2. Today’s agenda • Amazon Redshift Overview • Use cases and benefits • Migration options • Scholastic’s use case • Architecture details • Technical overview • Key project learnings
  • 3. Amazon Redshift: relational data warehouse; massively parallel, petabyte scale; fully managed; HDD and SSD platforms; $1,000/TB/year, starts at $0.25/hour. A lot faster, a lot simpler, a lot cheaper.
  • 4. The Forrester Wave™ is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave™ are trademarks of Forrester Research, Inc. The Forrester Wave™ is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change. Forrester Wave™ Enterprise Data Warehouse Q4 ’15
  • 6. Why migrate to Amazon Redshift? From a transactional database: 100x faster; scales from GBs to PBs; analyze data without storage constraints. From an MPP database: 10x cheaper; easy to provision and operate; higher productivity. From Hadoop: 10x faster; no programming; standard interfaces and integration to leverage BI tools, machine learning, streaming.
  • 7. Migration from Oracle @ Boingo Wireless. 2000+ commercial Wi-Fi locations; 1 million+ hotspots; 90M+ ad engagements; 100+ countries. Legacy DW: Oracle 11g based DW. Before migration: rapid data growth slowed analytics; mediocre IOPS, limited memory, vertical scaling; admin overhead; expensive (license, h/w, support). After migration: 180x performance improvement; 7x cost savings.
  • 8. Migration from Oracle @ Boingo Wireless. [Chart: cost comparison: Exadata $400,000, SAP HANA $300,000, Redshift $55,000.] [Chart: latency in seconds, existing system vs. Redshift, for query performance (1 year of data) and data load performance (1 million records); plotted values 7,200, 2,700, 15, 15.] 7x cheaper than Oracle Exadata; 180x faster than Oracle database.
  • 9. Migration from Greenplum @ NTT Docomo. 68 million customers; 10s of TBs per day of data across mobile network; 6 PB of total data (uncompressed); data science for marketing operations, logistics, etc. Legacy DW: Greenplum on-premises. After migration: 125-node DS2.8XL cluster; 4,500 vCPUs, 30 TB RAM; 6 PB uncompressed; 10x faster analytic queries; 50% reduction in time for new BI app deployment; significantly less ops overhead.
  • 10. Migration from SQL on Hadoop @ Yahoo. Analytics for website/mobile events across multiple Yahoo properties. On an average day: 2B events, 25M devices. Before migration: Hive, found to be slow and hard to use, share, and repeat. After migration: 21-node DC1.8XL (SSD) cluster; 50 TB compressed data; 100x performance improvement; real-time insights; easier deployment and maintenance.
  • 11. Migration from SQL on Hadoop @ Yahoo. [Chart: query time in seconds (log scale, 1 to 10,000), Amazon Redshift vs. Impala, for four query types: count distinct devices, count all events, filter clauses, joins.]
  • 12. Business Value and Productivity Business Productivity Benefits Analyze more data Faster time to market Get better insights Match capacity with demand
  • 13. How to Migrate? [Diagram: Engine X (ETL scripts, SQL in reports, ad hoc queries) to Amazon Redshift, in four numbered steps, under the headings Schema Conversion and Database Migration.] Schema conversion: map data types; choose compression encoding, sort keys, distribution keys; generate and apply DDL. Remaining diagram labels: Schema & Data Transformation; Data Migration; Convert SQL Code; Bulk Load; Capture updates; Transformations; Assess Gaps; Stored Procedures; Functions.
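The "map data types" and "assess gaps" steps on this slide can be pictured as a lookup from source types to Redshift types, with anything unmapped set aside for manual review. The sketch below is a minimal illustration only: the handful of Oracle-style mappings and the helper names `map_type` and `convert_columns` are assumptions made for this example, not the rule set AWS SCT applies.

```python
import re

# Assumed, hand-picked renames for parameterised Oracle-style types.
# Anything not covered here is reported as a gap rather than guessed at.
SIMPLE_RENAMES = {"VARCHAR2": "VARCHAR", "CHAR": "CHAR", "NUMBER": "DECIMAL"}


def map_type(source_type):
    """Return a Redshift type for a source type, or None if unmapped."""
    match = re.fullmatch(r"(\w+)\s*(\([\d,\s]+\))?", source_type.strip())
    if match is None:
        return None
    base = match.group(1).upper()
    args = (match.group(2) or "").replace(" ", "")
    if base == "DATE":
        # Oracle DATE carries a time component, so TIMESTAMP keeps it.
        return "TIMESTAMP"
    if base in SIMPLE_RENAMES and args:
        return SIMPLE_RENAMES[base] + args
    return None


def convert_columns(columns):
    """Split (name, source_type) pairs into converted columns and gaps."""
    converted, gaps = [], []
    for name, source_type in columns:
        target = map_type(source_type)
        if target is None:
            gaps.append((name, source_type))
        else:
            converted.append((name, target))
    return converted, gaps


# Hypothetical source table used only to exercise the sketch.
source_columns = [
    ("order_id", "NUMBER(10, 0)"),
    ("customer_name", "VARCHAR2(100)"),
    ("order_date", "DATE"),
    ("invoice_pdf", "BLOB"),
]
converted, gaps = convert_columns(source_columns)
```

Here `gaps` plays the role of a very small assessment report: it lists the columns a person would still have to decide about.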
  • 14. Convert schema in a few clicks Sources include Oracle, Teradata, Greenplum and Netezza Automatic schema optimization Converts application SQL code Detailed assessment report AWS Schema Conversion Tool (AWS SCT)
  • 16. Start your first migration in a few minutes Sources include: Aurora, Oracle, SQL Server, MySQL and PostgreSQL Bulk load and continuous replication Migrate a TB for $3 Fault tolerant (AWS DMS)
  • 17. AWS DMS: Change data capture. [Diagram: a replication instance sits between source and target; transactions (updates at t1, t2) on the source are applied to the target as changes after the bulk load.]
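The "change apply after bulk load" idea on this slide can be sketched with plain dictionaries: take a snapshot of the source, record the changes committed while the snapshot is being copied, then replay them on the target in commit order. This is a simplified in-memory model for illustration, not how AWS DMS is implemented; the function names and the change-record shape are assumptions.

```python
def bulk_load(snapshot):
    """Copy a point-in-time snapshot of the source into a new target."""
    return dict(snapshot)


def apply_changes(target, changes):
    """Replay captured changes on the target, in commit order."""
    for operation, key, value in changes:
        if operation == "delete":
            target.pop(key, None)
        else:  # "insert" and "update" both set the latest value
            target[key] = value
    return target


# Snapshot taken when the bulk load starts.
snapshot = {1: "alice", 2: "bob"}

# Changes committed on the source while the bulk load was still running.
captured = [
    ("update", 2, "bobby"),
    ("insert", 3, "carol"),
    ("delete", 1, None),
]

target = bulk_load(snapshot)
apply_changes(target, captured)
```

After the replay the target reflects the source as of the last captured change, even though the bulk copy itself saw only the older snapshot.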
  • 18. Data integration partners Data Integration Systems Integrators Amazon Redshift
  • 21. Where were we? Platform: 13+ years old. IBM AS/400 DB2 and Microsoft SQL Server are the primary data warehouse platforms. BI platform is primarily Microsoft (SSRS, SSAS, Excel, SharePoint). 500+ direct users across every LOB and business function. 20+ TB; 5,500+ DB2 workloads, 350+ SQL Server workloads, 15 SSAS cubes, 150+ SSRS reports. Challenges: inflexible, multi-layered architecture (slow time to market); inability to meet internal SLAs due to performance of daily ETL processes; scalability limitations with SQL Server Analysis Services (SSAS) for reports; limited ability to perform self-service business intelligence.
  • 22. Moving forward: Key decision factors • Improved performance, scalability, availability, logging, security • Enablement of self-service business intelligence • Leverage the skill set of current team (Relational DB & SQL) • Integration with existing technology stack • Alignment with the tech strategy (DevOps model, Cloud First) • Ability to support Big Data initiatives • Team up with an experienced consulting partner
  • 23. Why we chose AWS and Amazon Redshift. AWS was chosen for its agility, scalability, elasticity, and security. Redshift: • Scalable, fast • Managed service, cost-optimization models, elastic • SQL/relational matched skill set of team. S3 was chosen as the location for the ingestion process. NorthBay was chosen as the implementation partner for their expertise in Big Data and Redshift migrations.
  • 24. How the project unfolded Goals • 3-month pilot to migrate a Functional area in key LOB • Demonstrate immediate business value • Use AWS Stack & Open Source for Data Movement from DB2 (No CDC/ETL tool) Outcomes • Core Framework for Migration • ELT Architecture and Validation • Visualization/Self-service capability through Tableau
  • 25. Scholastic data cloud: Technical architecture. [Diagram labels: EMR cluster running Sqoop script; output bucket; EC2 instance running COPY command; Redshift (Staging); Data Pipeline; SNS topics (pipeline status, pipeline failure); SNS email notification; Lambda (save pipeline stats); RDS MySQL instance (pipeline configurations); DynamoDB; Redshift (Enterprise Data Repository); AS400 / DB2 (Staging); SQL Server EDW; Tableau (reporting tool); source DBs; SSAS cubes; SSRS reports.]
  • 26. Core Framework • Jobs and Job Groups are defined as metadata in DynamoDB • Control-M scheduler, Custom Application and Data Pipeline for orchestration • ELT process with EMR/Sqoop for extraction; load and transform the data through Redshift SQL scripts • Core Framework enables: restart capability from point of failure; capturing of operational statistics (# of rows updated, etc.); audit capability (which feed caused the Fact to change, etc.)
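Restart from the point of failure, together with per-job operational statistics, can be sketched as a runner that records each job's status and skips jobs that already succeeded. In this minimal sketch plain dictionaries stand in for the DynamoDB metadata described on the slide; the function `run_job_group`, the status strings, and the two demo jobs are hypothetical.

```python
def run_job_group(jobs, state, stats):
    """Run jobs in order, skipping any that already succeeded.

    jobs  -- list of (name, callable) pairs; each callable returns a row count
    state -- dict of job name -> status, standing in for persisted metadata
    stats -- dict of job name -> operational statistics
    Returns True when every job in the group has succeeded.
    """
    for name, job in jobs:
        if state.get(name) == "SUCCEEDED":
            continue  # completed in an earlier attempt, do not re-run
        try:
            rows = job()
        except Exception as exc:
            state[name] = "FAILED"
            stats[name] = {"error": str(exc)}
            return False  # stop here; the next run resumes at this job
        state[name] = "SUCCEEDED"
        stats[name] = {"rows": rows}
    return True


# Demo jobs: the load fails on its first attempt only.
calls = {"extract": 0, "load": 0}


def extract():
    calls["extract"] += 1
    return 100


def load():
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("table locked")
    return 100


jobs = [("extract", extract), ("load", load)]
state, stats = {}, {}
first_attempt = run_job_group(jobs, state, stats)
second_attempt = run_job_group(jobs, state, stats)
```

The second attempt resumes at `load` and leaves `extract` alone, which is the behaviour the slide calls restart capability from point of failure.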
  • 27. Extract • Pre-create EMR resources at the start of Batch • Achieve parallelism in Sqoop with mappers and Fair Scheduling • Sqoop query to add additional fields like Batch_id, Updated_date, etc. • Data extracts are split and compressed for optimized loading into Redshift. [Diagram: AS400 / DB2, EMR with Sqoop, S3, Metadata, KMS, Data Pipeline; numbered steps 1 to 6, showing control flow and data flow.]
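Two of the points above, adding batch fields to every extracted row and splitting the extract into compressed parts, can be sketched in plain Python. Splitting matters because the Redshift COPY command can load multiple files in parallel. This is a stand-in for what the Sqoop job does on EMR, not the project's code: the function name, the pipe delimiter, and the part-file naming are assumptions.

```python
import csv
import gzip
import io


def split_and_compress(rows, batch_id, updated_date, num_parts):
    """Append batch fields to each row and write round-robin gzip parts.

    Returns a dict of file name -> gzip-compressed bytes, standing in for
    the objects that would be written to S3.
    """
    buffers = [io.StringIO() for _ in range(num_parts)]
    writers = [csv.writer(buf, delimiter="|", lineterminator="\n") for buf in buffers]
    for index, row in enumerate(rows):
        writers[index % num_parts].writerow(list(row) + [batch_id, updated_date])
    return {
        f"part-{number:05d}.gz": gzip.compress(buf.getvalue().encode("utf-8"))
        for number, buf in enumerate(buffers)
    }


# Hypothetical extract of three rows, split into two parts.
rows = [(1, "a"), (2, "b"), (3, "c")]
parts = split_and_compress(rows, batch_id=42, updated_date="2016-11-30", num_parts=2)
first_part = gzip.decompress(parts["part-00000.gz"]).decode("utf-8")
```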
  • 28. Load • Truncate and Load through Data Pipeline for Staging tables • Dynamic Workload Management (WLM) queues set up to allow maximum resources during Loading/Transformation • Check and terminate any locks on tables to allow truncation • Capture metrics related to number of rows loaded, time taken, etc. [Diagram: S3, KMS, Data Pipeline, EC2, Staging; numbered steps 1 to 4, showing control flow and data flow.]
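The truncate-and-load pattern for a staging table boils down to a short sequence of Redshift statements. The sketch below only builds the SQL text; the table name, S3 prefix, and role ARN are placeholders, and authorizing COPY with `IAM_ROLE` is an assumption, since the slides do not say how the project authorized its loads. `PG_LAST_COPY_COUNT()` is the Redshift function that reports the rows loaded by the last COPY in the session, which is one way to capture the row-count metric.

```python
def build_staging_load(table, s3_prefix, iam_role_arn):
    """Return the SQL statements for one truncate-and-load cycle."""
    return [
        # Empty the staging table before reloading it.
        f"TRUNCATE TABLE {table};",
        # Load every gzip-compressed, pipe-delimited file under the prefix.
        f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' "
        "DELIMITER '|' GZIP;",
        # Row count of the COPY above, for operational statistics.
        "SELECT PG_LAST_COPY_COUNT();",
    ]


# Placeholder names for illustration only.
statements = build_staging_load(
    table="staging.orders",
    s3_prefix="s3://example-bucket/extracts/orders/part-",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-copy",
)
```

The statements would be executed in order on one database session, so that the final query refers to the COPY that just ran.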
  • 29. Transform • Custom Application for building Dimensions and Facts • SQL scripts are stored in S3 and executed by the ELT process • SQL scripts refactored from SQL Server and AS400 scripts • Non-functional requirements are achieved through the Custom App. [Diagram: S3, Staging, Facts, Dimensions, Metadata, App; numbered steps 1 to 7b, showing control flow and data flow.]
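Executing stored SQL scripts to build dimensions and facts from staging can be sketched as a loop that runs each script in name order and records how many rows it changed. To keep the sketch runnable without a cluster, an in-memory SQLite database stands in for Redshift and a dictionary stands in for the scripts kept in S3; the table names, script names, and `run_scripts` helper are all hypothetical.

```python
import sqlite3

# Stand-in for SQL scripts stored in S3, keyed by object name.
SCRIPTS = {
    "01_dim_customer.sql": (
        "INSERT INTO dim_customer "
        "SELECT DISTINCT customer_id, customer_name FROM stg_orders"
    ),
    "02_fact_orders.sql": (
        "INSERT INTO fact_orders "
        "SELECT order_id, customer_id, amount FROM stg_orders"
    ),
}


def run_scripts(conn, scripts):
    """Run single-statement scripts in name order; return rows per script."""
    rows_changed = {}
    for name in sorted(scripts):
        with conn:  # commits on success, rolls back if the script fails
            cursor = conn.execute(scripts[name])
            rows_changed[name] = cursor.rowcount
    return rows_changed


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_orders (order_id INTEGER, customer_id INTEGER,
                             customer_name TEXT, amount REAL);
    CREATE TABLE dim_customer (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO stg_orders VALUES (1, 10, 'Ann', 5.0), (2, 10, 'Ann', 7.5),
                                  (3, 11, 'Raj', 2.0);
    """
)
rows_changed = run_scripts(conn, SCRIPTS)
```

Numbering the scripts makes the dimension load run before the fact load, and the recorded row counts are the kind of operational statistic the framework slide mentions.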
  • 30. Schema Design • Modified Star Schema: natural keys instead of generated unique identifiers; commonly used columns from Dimensions are copied over to Facts; surrogate keys are eliminated except in a few cases • Compression • Define appropriate Distribution and Sort Keys • Define primary keys and foreign keys
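The schema choices listed here (a natural key, a dimension column copied onto the fact, column compression, a distribution key and a sort key) all surface in the table DDL. The sketch below generates such a statement; the table, its columns, and the specific key and encoding choices are invented for illustration and are not Scholastic's schema. Note that Redshift treats primary and foreign keys as informational and does not enforce them.

```python
def build_fact_ddl(table, columns, dist_key, sort_keys, primary_key=None):
    """Build a Redshift CREATE TABLE statement.

    columns -- list of (name, type, encoding); encoding may be None to
               leave the choice of compression to Redshift
    """
    lines = []
    for name, col_type, encoding in columns:
        line = f"  {name} {col_type}"
        if encoding:
            line += f" ENCODE {encoding}"
        lines.append(line)
    if primary_key:
        # Informational only: Redshift does not enforce this constraint.
        lines.append(f"  PRIMARY KEY ({primary_key})")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{body}\n)\n"
        f"DISTSTYLE KEY\nDISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )


# Hypothetical fact table keyed on a natural customer identifier.
ddl = build_fact_ddl(
    table="fact_orders",
    columns=[
        ("order_id", "BIGINT", None),
        ("order_date", "DATE", "raw"),           # sort key column left uncompressed
        ("customer_id", "VARCHAR(20)", "lzo"),    # natural key, no surrogate
        ("customer_name", "VARCHAR(100)", "lzo"), # copied from the dimension
        ("amount", "DECIMAL(12,2)", None),
    ],
    dist_key="customer_id",
    sort_keys=["order_date"],
    primary_key="order_id",
)
```

Distributing on the column used in joins and sorting on the column used in date filters is a common starting point, though the right keys depend on the actual query patterns.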
  • 31. Security • AWS Key Management Service (KMS) is used for encrypting access credentials to Source and Target databases • Jenkins job to allow encrypting of credentials using KMS directly by Database Administrators • Amazon EMR and Jenkins resources are given KMS decrypt permissions to allow connecting to Sources and Targets during the ELT process • Standard Security in Transit and at Rest throughout the process • IAM federation through Enterprise Active Directory
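The credential flow above, where an administrator encrypts a database password once and a job decrypts it at run time, can be sketched around the KMS `encrypt` and `decrypt` calls. The two helper functions take a client object so they would work with a boto3 KMS client; to keep the example runnable offline, a clearly fake client is used here, and it performs no real encryption. The helper names, the key alias, and the base64 storage format are assumptions.

```python
import base64


def encrypt_credential(kms_client, key_id, secret):
    """Encrypt a secret with KMS and return it as base64 text for storage."""
    response = kms_client.encrypt(KeyId=key_id, Plaintext=secret.encode("utf-8"))
    return base64.b64encode(response["CiphertextBlob"]).decode("ascii")


def decrypt_credential(kms_client, stored):
    """Decrypt a stored base64 ciphertext back into the original secret."""
    response = kms_client.decrypt(CiphertextBlob=base64.b64decode(stored))
    return response["Plaintext"].decode("utf-8")


class FakeKms:
    """Offline stand-in that mimics the response shape of a KMS client.

    It only reverses the bytes, so it provides NO security; it exists
    purely so the sketch can run without AWS access.
    """

    def encrypt(self, KeyId, Plaintext):
        return {"CiphertextBlob": Plaintext[::-1], "KeyId": KeyId}

    def decrypt(self, CiphertextBlob):
        return {"Plaintext": CiphertextBlob[::-1]}


client = FakeKms()
stored = encrypt_credential(client, "alias/example-elt-key", "db-password")
recovered = decrypt_credential(client, stored)
```

With real AWS access, a boto3 KMS client would be passed in place of `FakeKms`, and only principals granted decrypt permission on the key could recover the secret.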
  • 32. Reporting • Business users access Facts/Dimensions through Tableau • Power users access Staging tables through Tableau • Data analysts access files in S3 using Hive/Presto • Self-service capability across business users. [Diagram: S3, Staging, Facts/Dimensions, EMR with Presto/Hive; business analysts, power users, data analysts.]
  • 33. Workstream Effort • Define Jobs and Job Groups specific to each Workstream • Create Redshift tables (Staging, Facts, Dimensions) based on mapping from AS400 and best practices learned • Create new SQL scripts (based on the logic from AS400/SQL Server code) for transformation • Develop, Test and Deploy in 2-week Agile sprints
  • 34. Key Lessons - Technical • Isolate core framework with project-specific code repositories • Consolidating the logging solution across Amazon S3, Amazon Redshift, Amazon DynamoDB, etc. was a challenge • Make appropriate schema changes when migrating to a new platform • Custom Framework for gathering operational stats (e.g., # of rows loaded) • Start with Test Automation tools and Acceptance Test Driven Development (ATDD) earlier in the project
  • 35. Project timeline revisited After the successful pilot: • Executive Leadership accelerated timeline: • Reduce project timeline by 50% (to 12 months) to deliver value faster to LOBs • Realize cost savings by eliminating the DB2 and SQL Server platforms earlier • Users wanted to be on the new platform! • Scholastic & NorthBay partnered to create a training curriculum to ensure a supply of skilled staff would be available to our teams
  • 36. Scaling up: 7 workstreams • Developed a model for estimating effort and cost (AWS costs & Labor per LOB migration) • Running agile teams in parallel – employed Agile coaches • Enhanced the core framework to ensure it would scale effectively when in use by multiple teams simultaneously • Building a Code repository for use by all teams • Building CI / CD Frameworks
  • 37. Where are we now? • 4 of 7 LOBs migrated; the framework enables complete migration of a functional area within days/weeks as opposed to months. On track to migrate and decommission the entire legacy environment within the next 6 months • 10 weeks to migrate from an external vendor hosting data and providing reports for one LOB • Cost of Data Ingestion Framework is under $40/day (EC2, EMR, Data Pipeline) • First “Big Data” initiative in production captures and processes an average of 1.5 million e-reading events daily (peak: 7 million) • Profile: LOB #1 • Loading ~5-6 million rows/day (6-7 GB/day) • Processing over 1.5 billion rows within Redshift daily • Complete ETL/ELT batch cycle performance improved by over 170%
  • 38. Key lessons – project execution • Essential to monitor and optimize AWS costs • “Data Champion” / “Data Guide” partnership absolutely critical for successful adoption of new platforms • Importance of strong Agile coaches while scaling out Agile teams • Criticality of choosing consulting partners (AWS & NorthBay) who can ramp up and supply key resources fast and cycle off the project when finished • Creating new data platforms and migrating data into them is easy, especially with AWS. Decommissioning existing data platforms is hard!
  • 41. Related Sessions Hear from other customers discussing their Amazon Redshift use cases: • BDM402—Best Practices for Data Warehousing with Amazon Redshift (King.com) • BDA304—What’s New with Amazon Redshift • SVR308—Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year • GAM301—How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful Player Insights • BDA207—Fanatics: Deploying Scalable, Self-Service Business Intelligence on AWS • BDM306— Netflix: Using Amazon S3 as the fabric of our big data ecosystem • BDA203 — Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift (GE Power and Water) • BDM206 — Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT Analytics Platform on AWS (Hello) • STG307— Case Study: How Prezi Built and Scales a Cost-Effective, Multipetabyte Data Platform and Storage Infrastructure on Amazon S3