SlideShare a Scribd company logo
1 of 33
1
Best Practices for Supercharging Cloud
Analytics on Amazon Redshift
Tina Adams, Amazon Redshift
Brandon Davis, Cervello
Maneesh Joshi, SnapLogic
May 2014
2
Featured Speakers
3
Agenda
• Amazon Redshift Feature and Market Update
• SnapLogic Case Studies with Amazon Redshift
• Demo: SnapLogic Free Trial for Amazon Redshift and
RDS
• Cervello: Implementation Best Practices
4
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift
5
Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via
Amazon S3; load from
Amazon DynamoDB or SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2TB to 1.6PB
– DW2: SSD; scale from 160GB to 256TB
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
6
Amazon Redshift is priced to let you analyze all
your data
• Number of nodes x cost per hr
• No charge for leader node
• No upfront costs
• Pay as you go
DW1 (HDD)
Price Per Hour for
DW1.XL Single Node
Effective Annual
Price per TB
On-Demand $ 0.850 $ 3,723
1 Year Reservation $ 0.500 $ 2,190
3 Year Reservation $ 0.228 $ 999
DW2 (SSD)
Price Per Hour for
DW2.L Single Node
Effective Annual
Price per TB
On-Demand $ 0.250 $ 13,688
1 Year Reservation $ 0.161 $ 8,794
3 Year Reservation $ 0.100 $ 5,498
7
Amazon Redshift Feature Delivery
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
Unload Encrypted Files
DUB (4/25)
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
4 byte UTF-8 (7/18)
Statement Timeout (7/22)
SHA1 Builtin (7/15)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress
(8/9)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy,
Distributed Tables, Audit
Logging/CloudTrail, Concurrency, Resize
Perf., Approximate Count Distinct, SNS
Alerts (11/13)
SOC1/2/3 (5/8)
Sharing snapshots (7/18)
Resource Level IAM (8/9)
PCI (8/22)
Distributed Tables, Single Node Cursor
Support, Maximum Connections to 500
(12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and
diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch
size support for single node clusters,
new system tables with commit stats,
row_number(), strotol() and query
termination (2/13)
Resize progress indicator & Cluster
Version (3/21)
Regex_Substr, COPY from JSON (3/25)
8
Improved Concurrency
15
50
9
COPY from JSON
{
"jsonpaths":
[
"$['id']",
"$['name']",
"$['location'][0]",
"$['location'][1]",
"$['seats']"
]
}
COPY venue FROM 's3://mybucket/venue.json'
credentials 'aws_access_key_id=<access-key-id>; aws_secret_access_key=<secret-
access-key>'
JSON AS 's3://mybucket/venue_jsonpaths.json';
10
COPY from Amazon Elastic MapReduce
COPY sales
From ‘emr:// j-1H7OUO3B52HI5/myoutput/part*'
credentials ‘aws_access_key_id=<access-key id>;
aws_secret_access_key=<secret-access-key>';
Amazon EMR Amazon Redshift
11
REGEX_SUBSTR()
select email, regexp_substr(email,'@[^.]*')
from users limit 5;
email | regexp_substr
--------------------------------------------+----------------
Suspendisse.tristique@nonnisiAenean.edu | @nonnisiAenean
sed@lacusUtnec.ca | @lacusUtnec
elementum@semperpretiumneque.ca | @semperpretiumneque
Integer.mollis.Integer@tristiquealiquet.org | @tristiquealiquet
Donec.fringilla@sodalesat.org | @sodalesat
12
Resize Progress
• Progress indicator in
console
• New API call
13
ECDHE cipher suites for perfect forward
security over SSL
ECDHE-RSA & ECDHE-ECDCSA cipher suites supported
14
Amazon Redshift integrates with multiple data
sources
Amazon S3 Amazon EMR
Amazon Redshift
DynamoDB
Amazon RDS
Corporate Datacenter
15
Agenda
• Amazon Redshift Feature and Market Update
• SnapLogic Case Studies with Amazon Redshift
• Demo: SnapLogic Free Trial for Amazon Redshift and
RDS
• Cervello: Implementation Best Practices
16
The SnapLogic Platform for Elastic Integration
Powering Analytics, Apps and APIs
Data Applications APIs
17
Why SnapLogic?
Multi-Point Orchestration
• SnapStore: 160+ Prebuilt Snaps
• Orchestration & Workflow
Modern Platform
• Elastic, Scale-out Architecture
• Hybrid: Cloud to Cloud and
Cloud to Ground Use Cases
Faster Integration
• Easily Design, Monitor, Manage
• Deploy in Days not Months
18
Multi-Point: Comprehensive Connectivity
Snap your Apps: 160+ pre-built integrations
19
Software-defined Integration
Metadata
Data
• Streams: No data is
stored/cached
• Secure: 100%
standards-based
• Elastic: Scales out &
handles data, app, API
integration use cases
Hybrid Scale-out Architecture Respects Data Gravity
20
International Hotel Chain Reservation Data Mgmt.
• 126 TB of hotel
reservation data
• Prohibitive cost-per-
query for analytics
• Unacceptable
performance
PAST PRESENT
• FedEx’ed 126 TB of data to load into
AWS Redshift
• Now run daily sync between on-
premise and cloud with SnapLogic
of data changes (100-150GB)
• Enrich analytics with Twitter and
Travelocity data
• Improved cost-per-query and
performance
21
Mid-sized Pharma Creates Cloud Data Mart
Cloud to On-prem Snaplex
REST
Cloud to Cloud Snaplex
Metadata
Data
• Consolidate DBs
(Customer, Address,
and Order) and SFDC
(Contact and Account)
into Redshift
• MicroStrategy is the
visualization layer
22
Agenda
• Amazon Redshift Feature and Market Update
• SnapLogic Case Studies with Amazon Redshift
• Demo: SnapLogic Free Trial for Amazon Redshift
and RDS
• Cervello: Implementation Best Practices
23
DEMO
24
Agenda
• Amazon Redshift Feature and Market Update
• SnapLogic Case Studies with Amazon Redshift
• Demo: SnapLogic Free Trial for Amazon Redshift and
RDS
• Cervello: Implementation Best Practices
25
Enterprise
Performance
Management
(Finance)
Customer
Relationship
Management
(Sales &
Marketing)
Data Management
Custom Development
Business
Intelligence &
Analytics
(IT)
• We have offices in Boston, New York, Dallas and the UK
• Offshore development and support teams in Russia and India
• We partner with the leading on premise and cloud technology
companies
Advise, Implement, Support
Cervello Helps Clients Win With Data
26
Implementation Case Study
• Hospitality industry analytics
– Detailed transactional data
– Weekly / monthly / yearly trend analysis
– Began with single-node cluster, adding nodes as data volumes
grow
Source Data Redshift Analytics
ETL
27
• Collect external data loads
before merging with
existing data
• Maintain history of
cleansed and standardized
source data
• Use data structures
optimized for analytics
– Dimension and fact tables for
analytics
– Aggregate tables
Best Practice #1: Choose The Right Pattern
• Staging tables
• History tables
• Star schema data
warehouse
Requirements Design
28
Best Practice #2: Select the Right Node Type
• Performance was good with
initial volumes and small
data sets on single node
• Evaluated dense storage
(dw1) and dense compute
(dw2) nodes
• More opportunity to
optimize design as volumes
grew
• Increased nodes to handle
larger volumes
– Solution leverages dense
storage (dw1) nodes
– Expected to stabilize between
10-20TB
• Have also seen smaller
volumes that work really well
in dense compute (dw2) nodes
Early Stages Mature Stage
29
Best Practice #3: Leverage MPP
• Spread data evenly across
nodes while also optimizing
join performance
• Distribution key and sort
keys are primary
considerations
Leader
Node
Compute
Node 1
Compute
Node 2
Compute
Node n
Compute
Node 3
• Initial fact table distribution key
caused skewed data
• Changed to dimension foreign
key with better distribution for
40%+ improvement in query
times
• Surrogate keys on dimension
tables
– Primary key
– Sort key and distribution key OR
distribute to all nodes
– Sort on foreign keys in fact tables
Goals Approach
30
Best Practice #4: Use Columnar Compression
• Started with compression
settings based on general
data types
– VARCHAR to TEXT255,
INTEGER to MOSTLY16, etc.
– Iterate using ANALYZE
COMPRESSION
• Redshift applies automatic
compression during COPY
– Staging tables
• Reduce I/O workload by
minimizing size of data
stored on disk
Goals Approach
31
Best Practice #5: Load and Manage Data
• ETL and ELT
– ETL: First set of processes prepares data for analytics –
business logic, standardization, validation
– ELT: Second set of processes load data into Redshift and
transform into analytical structures
• Data management
– Enforce constraints within ETL processes
– Analyze after loads to update statistics
– Vacuum after large loads to existing tables, updates and
deletes
32
Bringing it All Together
• Analytic queries
– Minimize number of query columns to improve performance
– Most queries use SUM or COUNT
– Leveraging aggregate tables for monthly dashboards
• Explain long running queries to help optimize design
– Sorting / merging within nodes and merging at leader node
33
Learn more…
1. Try out the SnapLogic Free Trial for Amazon Redshift:
http://snaplogic.com/redshift-trial
2. Learn more about Amazon Redshift at:
http://aws.amazon.com/redshift
3. Learn more about Cervello at:
http://mycervello.com/

More Related Content

What's hot

Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseAttunity
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?Attunity
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackSnapLogic
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesSingleStore
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataNaveen Korakoppa
 
Real-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache KafkaReal-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache KafkaCarole Gunst
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLSingleStore
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for TeradataAttunity
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?Jeraldine Phneah
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesQubole
 
Reblaze Case Study on GCP
Reblaze Case Study on GCPReblaze Case Study on GCP
Reblaze Case Study on GCPIdan Tohami
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesCarole Gunst
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story Roman Chukh
 
Dealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakeDealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakePat Patterson
 

What's hot (20)

Optimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data WarehouseOptimize Data for the Logical Data Warehouse
Optimize Data for the Logical Data Warehouse
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?
 
The Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management StackThe Impact of SMACT on the Data Management Stack
The Impact of SMACT on the Data Management Stack
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
 
Real-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache KafkaReal-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache Kafka
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Attunity Solutions for Teradata
Attunity Solutions for TeradataAttunity Solutions for Teradata
Attunity Solutions for Teradata
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Reblaze Case Study on GCP
Reblaze Case Study on GCPReblaze Case Study on GCP
Reblaze Case Study on GCP
 
Modernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data PipelinesModernize & Automate Analytics Data Pipelines
Modernize & Automate Analytics Data Pipelines
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
Dealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data LakeDealing with Drift: Building an Enterprise Data Lake
Dealing with Drift: Building an Enterprise Data Lake
 

Viewers also liked

Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...SnapLogic
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...Treasure Data, Inc.
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...Amazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Streaming Data Analytics with Amazon Kinesis Firehose and Redshift
Streaming Data Analytics with Amazon Kinesis Firehose and RedshiftStreaming Data Analytics with Amazon Kinesis Firehose and Redshift
Streaming Data Analytics with Amazon Kinesis Firehose and RedshiftAmazon Web Services
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄schoowebcampus
 

Viewers also liked (10)

Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...Treasure Data:  Move your data from MySQL to Redshift with (not much more tha...
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...
Announcing AWS Batch - Run Batch Jobs At Scale - December 2016 Monthly Webina...
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Streaming Data Analytics with Amazon Kinesis Firehose and Redshift
Streaming Data Analytics with Amazon Kinesis Firehose and RedshiftStreaming Data Analytics with Amazon Kinesis Firehose and Redshift
Streaming Data Analytics with Amazon Kinesis Firehose and Redshift
 
Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄
超絶UXを実現したい!小規模Web制作会社でもフロントエンドのスキルを磨く方法と実践 【根性論と精神論編】 先生:桟 義雄
 

Similar to Best Practices for Supercharging Cloud Analytics on Amazon Redshift

London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017Pratim Das
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseAll Things Open
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseAmazon Web Services
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike, Inc.
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017Jags Ramnarayan
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData
 
Benefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftBenefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftAmazon Web Services LATAM
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon RedshiftAmazon Web Services
 
Getting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoGetting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoAmazon Web Services
 

Similar to Best Practices for Supercharging Cloud Analytics on Amazon Redshift (20)

London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
Processing and Analytics
Processing and AnalyticsProcessing and Analytics
Processing and Analytics
 
Leveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data WarehouseLeveraging Amazon Redshift for your Data Warehouse
Leveraging Amazon Redshift for your Data Warehouse
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
Benefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftBenefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon Redshift
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Getting started with amazon redshift - Toronto
Getting started with amazon redshift - TorontoGetting started with amazon redshift - Toronto
Getting started with amazon redshift - Toronto
 

More from SnapLogic

The AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesThe AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesSnapLogic
 
Supercharging Self-Service API Integration with AI
Supercharging Self-Service API Integration with AI Supercharging Self-Service API Integration with AI
Supercharging Self-Service API Integration with AI SnapLogic
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?SnapLogic
 
SnapLogic Culture Deck
SnapLogic Culture DeckSnapLogic Culture Deck
SnapLogic Culture DeckSnapLogic
 
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...Euromoney's integration journey: Selecting SnapLogic's self-service integrati...
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...SnapLogic
 
Digital Transformation is Cloud-Powered
Digital Transformation is Cloud-PoweredDigital Transformation is Cloud-Powered
Digital Transformation is Cloud-PoweredSnapLogic
 
How to Build a Winning Data Culture
How to Build a Winning Data CultureHow to Build a Winning Data Culture
How to Build a Winning Data CultureSnapLogic
 
Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies SnapLogic
 
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...Overcoming the challenge of multiple data frameworks in a multiple cloud envi...
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...SnapLogic
 
SnapLogic Technology Open House – January 2018
SnapLogic Technology Open House – January 2018SnapLogic Technology Open House – January 2018
SnapLogic Technology Open House – January 2018SnapLogic
 
Self-Service Integration in the Age of Digital Transformation at Box
Self-Service Integration in the Age of Digital Transformation at BoxSelf-Service Integration in the Age of Digital Transformation at Box
Self-Service Integration in the Age of Digital Transformation at BoxSnapLogic
 
Live Demo: Accelerate the integration of workday applications
Live Demo: Accelerate the integration of workday applicationsLive Demo: Accelerate the integration of workday applications
Live Demo: Accelerate the integration of workday applicationsSnapLogic
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data SnapLogic
 
Spring 2017 release customer webinar
Spring 2017 release customer webinarSpring 2017 release customer webinar
Spring 2017 release customer webinarSnapLogic
 
SnapLogic unveils machine-learning-driven integration assistant
SnapLogic unveils machine-learning-driven integration assistantSnapLogic unveils machine-learning-driven integration assistant
SnapLogic unveils machine-learning-driven integration assistantSnapLogic
 
Webinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTWebinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTSnapLogic
 
SnapLogic Culture
SnapLogic CultureSnapLogic Culture
SnapLogic CultureSnapLogic
 
SnapLogic Live: Enabling the Citizen Integrator
SnapLogic Live: Enabling the Citizen IntegratorSnapLogic Live: Enabling the Citizen Integrator
SnapLogic Live: Enabling the Citizen IntegratorSnapLogic
 
Big Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowBig Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowSnapLogic
 

More from SnapLogic (20)

The AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic PerspectivesThe AI Mindset: Bridging Industry and Academic Perspectives
The AI Mindset: Bridging Industry and Academic Perspectives
 
Supercharging Self-Service API Integration with AI
Supercharging Self-Service API Integration with AI Supercharging Self-Service API Integration with AI
Supercharging Self-Service API Integration with AI
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
SnapLogic Culture Deck
SnapLogic Culture DeckSnapLogic Culture Deck
SnapLogic Culture Deck
 
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...Euromoney's integration journey: Selecting SnapLogic's self-service integrati...
Euromoney's integration journey: Selecting SnapLogic's self-service integrati...
 
Digital Transformation is Cloud-Powered
Digital Transformation is Cloud-PoweredDigital Transformation is Cloud-Powered
Digital Transformation is Cloud-Powered
 
How to Build a Winning Data Culture
How to Build a Winning Data CultureHow to Build a Winning Data Culture
How to Build a Winning Data Culture
 
Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies Data Warehousing in the Cloud: Practical Migration Strategies
Data Warehousing in the Cloud: Practical Migration Strategies
 
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...Overcoming the challenge of multiple data frameworks in a multiple cloud envi...
Overcoming the challenge of multiple data frameworks in a multiple cloud envi...
 
SnapLogic Technology Open House – January 2018
SnapLogic Technology Open House – January 2018SnapLogic Technology Open House – January 2018
SnapLogic Technology Open House – January 2018
 
Self-Service Integration in the Age of Digital Transformation at Box
Self-Service Integration in the Age of Digital Transformation at BoxSelf-Service Integration in the Age of Digital Transformation at Box
Self-Service Integration in the Age of Digital Transformation at Box
 
Live Demo: Accelerate the integration of workday applications
Live Demo: Accelerate the integration of workday applicationsLive Demo: Accelerate the integration of workday applications
Live Demo: Accelerate the integration of workday applications
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Spring 2017 release customer webinar
Spring 2017 release customer webinarSpring 2017 release customer webinar
Spring 2017 release customer webinar
 
SnapLogic unveils machine-learning-driven integration assistant
SnapLogic unveils machine-learning-driven integration assistantSnapLogic unveils machine-learning-driven integration assistant
SnapLogic unveils machine-learning-driven integration assistant
 
Webinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoTWebinar: Evolution of Data Management for the IoT
Webinar: Evolution of Data Management for the IoT
 
The API Lie
The API LieThe API Lie
The API Lie
 
SnapLogic Culture
SnapLogic CultureSnapLogic Culture
SnapLogic Culture
 
SnapLogic Live: Enabling the Citizen Integrator
SnapLogic Live: Enabling the Citizen IntegratorSnapLogic Live: Enabling the Citizen Integrator
SnapLogic Live: Enabling the Citizen Integrator
 
Big Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To KnowBig Data Management: What's New, What's Different, and What You Need To Know
Big Data Management: What's New, What's Different, and What You Need To Know
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Best Practices for Supercharging Cloud Analytics on Amazon Redshift

  • 1. 1 Best Practices for Supercharging Cloud Analytics on Amazon Redshift Tina Adams, Amazon Redshift Brandon Davis, Cervello Maneesh Joshi, SnapLogic May 2014
  • 3. 3 Agenda • Amazon Redshift Feature and Market Update • SnapLogic Case Studies with Amazon Redshift • Demo: SnapLogic Free Trial for Amazon Redshift and RDS • Cervello: Implementation Best Practices
  • 4. 4 Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year Amazon Redshift
  • 5. 5 Amazon Redshift Architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH • Two hardware platforms – Optimized for data processing – DW1: HDD; scale from 2TB to 1.6PB – DW2: SSD; scale from 160GB to 256TB 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 6. 6 Amazon Redshift is priced to let you analyze all your data • Number of nodes x cost per hr • No charge for leader node • No upfront costs • Pay as you go DW1 (HDD) Price Per Hour for DW1.XL Single Node Effective Annual Price per TB On-Demand $ 0.850 $ 3,723 1 Year Reservation $ 0.500 $ 2,190 3 Year Reservation $ 0.228 $ 999 DW2 (SSD) Price Per Hour for DW2.L Single Node Effective Annual Price per TB On-Demand $ 0.250 $ 13,688 1 Year Reservation $ 0.161 $ 8,794 3 Year Reservation $ 0.100 $ 5,498
  • 7. 7 Amazon Redshift Feature Delivery Service Launch (2/14) PDX (4/2) Temp Credentials (4/11) Unload Encrypted Files DUB (4/25) NRT (6/5) JDBC Fetch Size (6/27) Unload logs (7/5) 4 byte UTF-8 (7/18) Statement Timeout (7/22) SHA1 Builtin (7/15) Timezone, Epoch, Autoformat (7/25) WLM Timeout/Wildcards (8/1) CRC32 Builtin, CSV, Restore Progress (8/9) UTF-8 Substitution (8/29) JSON, Regex, Cursors (9/10) Split_part, Audit tables (10/3) SIN/SYD (10/8) HSM Support (11/11) Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13) SOC1/2/3 (5/8) Sharing snapshots (7/18) Resource Level IAM (8/9) PCI (8/22) Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13) EIP Support for VPC Clusters (12/28) New query monitoring system tables and diststyle all (1/13) Redshift on DW2 (SSD) Nodes (1/23) Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strotol() and query termination (2/13) Resize progress indicator & Cluster Version (3/21) Regex_Substr, COPY from JSON (3/25)
  • 9. 9 COPY from JSON { "jsonpaths": [ "$['id']", "$['name']", "$['location'][0]", "$['location'][1]", "$['seats']" ] } COPY venue FROM 's3://mybucket/venue.json' credentials 'aws_access_key_id=<access-key-id>; aws_secret_access_key=<secret- access-key>' JSON AS 's3://mybucket/venue_jsonpaths.json';
  • 10. 10 COPY from Amazon Elastic MapReduce COPY sales From ‘emr:// j-1H7OUO3B52HI5/myoutput/part*' credentials ‘aws_access_key_id=<access-key id>; aws_secret_access_key=<secret-access-key>'; Amazon EMR Amazon Redshift
  • 11. 11 REGEX_SUBSTR() select email, regexp_substr(email,'@[^.]*') from users limit 5; email | regexp_substr --------------------------------------------+---------------- Suspendisse.tristique@nonnisiAenean.edu | @nonnisiAenean sed@lacusUtnec.ca | @lacusUtnec elementum@semperpretiumneque.ca | @semperpretiumneque Integer.mollis.Integer@tristiquealiquet.org | @tristiquealiquet Donec.fringilla@sodalesat.org | @sodalesat
  • 12. 12 Resize Progress • Progress indicator in console • New API call
  • 13. 13 ECDHE cipher suites for perfect forward security over SSL ECDHE-RSA & ECDHE-ECDCSA cipher suites supported
  • 14. 14 Amazon Redshift integrates with multiple data sources Amazon S3 Amazon EMR Amazon Redshift DynamoDB Amazon RDS Corporate Datacenter
  • 15. 15 Agenda • Amazon Redshift Feature and Market Update • SnapLogic Case Studies with Amazon Redshift • Demo: SnapLogic Free Trial for Amazon Redshift and RDS • Cervello: Implementation Best Practices
  • 16. 16 The SnapLogic Platform for Elastic Integration Powering Analytics, Apps and APIs Data Applications APIs
  • 17. 17 Why SnapLogic? Multi-Point Orchestration • SnapStore: 160+ Prebuilt Snaps • Orchestration & Workflow Modern Platform • Elastic, Scale-out Architecture • Hybrid: Cloud to Cloud and Cloud to Ground Use Cases Faster Integration • Easily Design, Monitor, Manage • Deploy in Days not Months
  • 18. 18 Multi-Point: Comprehensive Connectivity Snap your Apps: 160+ pre-built integrations
  • 19. 19 Software-defined Integration Metadata Data • Streams: No data is stored/cached • Secure: 100% standards-based • Elastic: Scales out & handles data, app, API integration use cases Hybrid Scale-out Architecture Respects Data Gravity
  • 20. 20 International Hotel Chain Reservation Data Mgmt. • 126 TB of hotel reservation data • Prohibitive cost-per- query for analytics • Unacceptable performance PAST PRESENT • FedEx’ed 126 TB of data to load into AWS Redshift • Now run daily sync between on- premise and cloud with SnapLogic of data changes (100-150GB) • Enrich analytics with Twitter and Travelocity data • Improved cost-per-query and performance
  • 21. 21 Mid-sized Pharma Creates Cloud Data Mart Cloud to On-prem Snaplex REST Cloud to Cloud Snaplex Metadata Data • Consolidate DBs (Customer, Address, and Order) and SFDC (Contact and Account) into Redshift • MicroStrategy is the visualization layer
  • 22. 22 Agenda • Amazon Redshift Feature and Market Update • SnapLogic Case Studies with Amazon Redshift • Demo: SnapLogic Free Trial for Amazon Redshift and RDS • Cervello: Implementation Best Practices
  • 24. 24 Agenda • Amazon Redshift Feature and Market Update • SnapLogic Case Studies with Amazon Redshift • Demo: SnapLogic Free Trial for Amazon Redshift and RDS • Cervello: Implementation Best Practices
  • 25. 25 Enterprise Performance Management (Finance) Customer Relationship Management (Sales & Marketing) Data Management Custom Development Business Intelligence & Analytics (IT) • We have offices in Boston, New York, Dallas and the UK • Offshore development and support teams in Russia and India • We partner with the leading on premise and cloud technology companies Advise, Implement, Support Cervello Helps Clients Win With Data
  • 26. 26 Implementation Case Study • Hospitality industry analytics – Detailed transactional data – Weekly / monthly / yearly trend analysis – Began with single-node cluster, adding nodes as data volumes grow Source Data Redshift Analytics ETL
  • 27. 27 • Collect external data loads before merging with existing data • Maintain history of cleansed and standardized source data • Use data structures optimized for analytics – Dimension and fact tables for analytics – Aggregate tables Best Practice #1: Choose The Right Pattern • Staging tables • History tables • Star schema data warehouse Requirements Design
  • 28. 28 Best Practice #2: Select the Right Node Type • Performance was good with initial volumes and small data sets on single node • Evaluated dense storage (dw1) and dense compute (dw2) nodes • More opportunity to optimize design as volumes grew • Increased nodes to handle larger volumes – Solution leverages dense storage (dw1) nodes – Expected to stabilize between 10-20TB • Have also seen smaller volumes that work really well in dense compute (dw2) nodes Early Stages Mature Stage
  • 29. 29 Best Practice #3: Leverage MPP • Spread data evenly across nodes while also optimizing join performance • Distribution key and sort keys are primary considerations Leader Node Compute Node 1 Compute Node 2 Compute Node n Compute Node 3 • Initial fact table distribution key caused skewed data • Changed to dimension foreign key with better distribution for 40%+ improvement in query times • Surrogate keys on dimension tables – Primary key – Sort key and distribution key OR distribute to all nodes – Sort on foreign keys in fact tables Goals Approach
  • 30. 30 Best Practice #4: Use Columnar Compression • Started with compression settings based on general data types – VARCHAR to TEXT255, INTEGER to MOSTLY16, etc. – Iterate using ANALYZE COMPRESSION • Redshift applies automatic compression during COPY – Staging tables • Reduce I/O workload by minimizing size of data stored on disk Goals Approach
  • 31. 31 Best Practice #5: Load and Manage Data • ETL and ELT – ETL: First set of processes prepares data for analytics – business logic, standardization, validation – ELT: Second set of processes load data into Redshift and transform into analytical structures • Data management – Enforce constraints within ETL processes – Analyze after loads to update statistics – Vacuum after large loads to existing tables, updates and deletes
  • 32. 32 Bringing it All Together • Analytic queries – Minimize number of query columns to improve performance – Most queries use SUM or COUNT – Leveraging aggregate tables for monthly dashboards • Explain long running queries to help optimize design – Sorting / merging within nodes and merging at leader node
  • 33. 33 Learn more… 1. Try out the SnapLogic Free Trial for Amazon Redshift: http://snaplogic.com/redshift-trial 2. Learn more about Amazon Redshift at: http://aws.amazon.com/redshift 3. Learn more about Cervello at: http://mycervello.com/