Introduction to Amazon Redshift
May 2014
Abdullah Cetin CAVDAR (@accavdar)
This presentation summarizes the Amazon Redshift data warehouse service, its architecture, and best practices for application development with Amazon Redshift.

Published in: Technology

  1. Introduction to Amazon Redshift
     May 2014, Abdullah Cetin CAVDAR (@accavdar)
  2. What's Amazon Redshift?
     • Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud
     • https://aws.amazon.com/redshift/
  3. Features
     • Petabyte scale, massively parallel
     • Relational data warehouse
     • Fully managed, zero admin
     • SSD and HDD platforms
     • $999/TB/Year
  4. Architecture
  5. Client Applications
     • Integrates with various data loading and ETL (Extract, Transform, and Load) tools and with business intelligence (BI) reporting, data mining, and analytics tools
     • Redshift is based on industry-standard PostgreSQL, so most existing SQL client applications will work with only minimal changes
  6. Connections
     • Redshift communicates with client applications by using industry-standard PostgreSQL JDBC and ODBC drivers
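
To make the connection slide concrete, here is a minimal sketch that builds a PostgreSQL-style JDBC URL for a cluster. The function name, endpoint, and database name are illustrative placeholders, not values from the slides; 5439 is the port a Redshift cluster listens on by default.

```python
def redshift_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Build a PostgreSQL-driver JDBC URL for a Redshift cluster endpoint."""
    if not endpoint or not database:
        raise ValueError("endpoint and database are required")
    return f"jdbc:postgresql://{endpoint}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    "examplecluster.abc123.us-east-1.redshift.amazonaws.com", "dev"
)
print(url)
```

A SQL client or BI tool configured with the PostgreSQL JDBC driver would take a URL of this shape, plus a user name and password.
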
  7. Clusters
     • A cluster is composed of one or more compute nodes
     • The leader node coordinates the compute nodes and handles external communication
  8. Leader Node
     • Manages communications with client programs and with compute nodes
     • Stores metadata
     • Coordinates query execution
  9. Compute Nodes
     • Execute the compiled code and send intermediate results back to the leader node for final aggregation
     • Each has its own dedicated CPU, memory, and attached disk storage, which are determined by the node type
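
The split of work between the leader node and the compute nodes can be illustrated with a toy simulation. This is only a sketch of the idea, not Redshift's actual execution engine: the function names are invented, threads stand in for nodes, and the query is a simple average.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Work done on one compute node: a partial sum and count over its slice."""
    return sum(rows), len(rows)


def leader_average(node_slices):
    """Leader node: fan the work out, then merge the intermediate results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(compute_node_partial, node_slices))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


# Three hypothetical compute nodes, each holding part of one column.
node_slices = [[10, 20, 30], [40, 50], [60]]
print(leader_average(node_slices))  # 35.0
```

Each node returns only a small intermediate result (a sum and a count), so the leader's final aggregation stays cheap no matter how many rows each node scanned.
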
  10. Databases
     • A cluster contains one or more databases
     • User data is stored on the compute nodes
     • Amazon Redshift is a Relational Database Management System (RDBMS)
     • Amazon Redshift is optimized for high-performance analysis and reporting of very large datasets
     • Amazon Redshift is based on PostgreSQL
  11. Redshift reduces I/O
     • Column storage: reads only the data you need
     • Data compression: analyzes and compresses your data
     • Zone map
         • Keeps track of the minimum and maximum value for each block
         • Skips over blocks that don't contain data needed for a given query
         • Minimizes unnecessary I/O
     • Direct attached storage
         • Hardware optimized for high performance data processing
     • Large data block sizes
         • Large block sizes to make the most of each read
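
The zone map idea from the slide above can be sketched in a few lines. This is a simplified model under assumed names (`Block`, `scan_equal`), not Redshift's on-disk format: each block records its minimum and maximum value, and a scan skips any block whose range cannot contain the value being searched for.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One storage block of a column, with its zone map entry (min and max)."""
    values: list
    min_value: int = 0
    max_value: int = 0

    def __post_init__(self):
        # The zone map entry is derived from the block's contents.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(blocks, target):
    """Return matching values and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for block in blocks:
        # Skip the block if the target lies outside its min/max range.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        hits.extend(v for v in block.values if v == target)
    return hits, blocks_read


blocks = [Block([1, 3, 5]), Block([6, 8, 9]), Block([10, 12, 15])]
hits, blocks_read = scan_equal(blocks, 8)
print(hits, blocks_read)  # [8] 1
```

Only one of the three blocks is read here because the data is sorted across blocks. With unsorted data the min/max ranges overlap and fewer blocks can be skipped, which is one reason sort key choice matters.
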
  12. Redshift runs on optimized hardware
     • Optimized for I/O intensive workloads
     • High disk density
     • Runs in HPC - fast network
  13. Redshift parallelizes and distributes everything
     • Query
     • Load
     • Backup/Restore
     • Resize
  14. Redshift is easy to use
     • Provision in minutes
     • Monitor query performance
     • Point and click resize
     • Built in security
     • Automatic backups
  15. Redshift has security built-in
     • SSL to secure data in transit
     • Encryption to secure data at rest
         • AES 256 - hardware accelerated
         • All blocks on disk and in Amazon S3 encrypted
     • No direct access to compute nodes
     • Amazon VPC support
  16. Redshift backs up your data and recovers from failures
     • Replication within the cluster and backup to Amazon S3
     • Backups to Amazon S3 are continuous, automatic, and incremental
     • Continuous monitoring and automated recovery from failures
     • Able to restore snapshots to any Availability Zone
  17. Use Cases
  18. Traditional Enterprise DW
     • Reduce costs by extending DW rather than adding HW
     • Migrate completely from existing DW systems
     • Respond faster to business
  19. Companies with Big Data
     • Improve performance by an order of magnitude
     • Make more data available for analysis
     • Access business data via standard reporting tools
  20. SaaS Companies
     • Add analytic functionality to applications
     • Scale DW capacity as demand grows
     • Reduce HW and SW costs by an order of magnitude
  21. Use Case: SkillPages
  22. Data Architecture
  23. Redshift Implementation
     • High Storage Extra Large (XL) DW Node
     • ETL Activities
         • Approx. 90 minutes including exports from RDBMS, copying to S3, loading stage tables, loading target tables, vacuuming and analysing tables
     • Schema
     • Compression
     • Retention
  24. DW Anatomy
  25. Why does Redshift work for SkillPages?
     • Scale - MPP
     • Performance - Columnar data access and compression
     • Platform Integration - S3, Dynamo
     • Operational Advantages
     • Ease of Access
     • Cost
  26. Best Practices
     • Avoid a large number of singleton Data Manipulation Language (DML) statements if possible
     • Use COPY for uploading large datasets
     • Choose SORT and DISTRIBUTION keys with care
     • Encode date and time with the TIMESTAMP data type
     • Experiment with WLM (Workload Manager) settings
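
Several of these practices can be shown together in a short sketch. The table, columns, bucket, and IAM role below are hypothetical, and the helper only assembles SQL text; it assumes trusted inputs and does no escaping, so it is an illustration rather than production code.

```python
# Hypothetical table: TIMESTAMP for date and time, explicit DISTKEY and SORTKEY.
CREATE_PAGE_VIEWS = """
CREATE TABLE page_views (
    user_id   BIGINT,
    viewed_at TIMESTAMP,
    url       VARCHAR(2048)
)
DISTKEY (user_id)
SORTKEY (viewed_at);
""".strip()


def build_copy_statement(table: str, s3_path: str, iam_role: str) -> str:
    """Assemble one bulk COPY from S3 instead of many single-row INSERTs."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP;"
    )


# Placeholder bucket and role ARN, for illustration only.
copy_sql = build_copy_statement(
    "page_views",
    "s3://example-bucket/page_views/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(CREATE_PAGE_VIEWS)
print(copy_sql)
```

A single COPY lets the cluster load the files under the S3 prefix in parallel across compute nodes, whereas row-by-row INSERT statements are each handled as a separate small operation.
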
  27. Slides
     • https://github.com/accavdar/AmazonRedshift
  28. THE END
     by Abdullah Cetin CAVDAR / @accavdar
