Amazon Redshift - Bay Area CloudSearch Meetup June 19, 2013

Amazon Redshift Overview - Vidhya Srinivasan, Sr. Software Manager, AWS

Presentation Transcript

  • Amazon Redshift
    Vidhya Srinivasan | vid@amazon.com
    Neil Thombre | thombren@amazon.com
  • What is a Data Warehouse?
    • Large data volumes (TB to PB)
    • Queries are complex and I/O intensive
    • Data typically loaded in batches
    • Integrates with Business Intelligence tools for reporting and analysis
  • DW - Existing AWS landscape
    Criteria: Scale out | Fully SQL compatible | Optimized data import & export | Efficient aggregates & joins | Local storage | No single point of failure
    RDS: X X
    DynamoDB: X X X
    EMR/Hadoop: X X ½ X
  • DW - Existing AWS landscape
    Criteria: Scale out | Fully SQL compatible | Optimized data import & export | Efficient aggregates & joins | Local storage | No single point of failure
    RDS: X X
    DynamoDB: X X X
    EMR/Hadoop: X X ½ X
    Redshift: X X X X X X
  • Introducing Amazon Redshift
    • Fully managed database service
    • Built from the ground up for DW
    • Secure & Reliable: fault tolerant, automatic backup, encryption
    • Fast: scale out, specialized hardware, columnar storage
    • Inexpensive: 1/10th the cost of alternatives, pay as you go
    • Easy to Use: provision & resize with a few clicks
    • Compatible: JDBC/ODBC, mostly PostgreSQL compatible
  • Why did we call it Amazon Redshift?
    Edwin Hubble (1889-1953)
  • >> How much storage is provisioned by Redshift customers?
    >> How many Redshift clusters were created in the first 10 weeks?
  • Amazon Redshift architecture
    • Leader Node
      - SQL endpoint
      - Stores metadata
      - Coordinates query execution
    • Compute Nodes
      - Local, columnar storage
      - Execute queries in parallel
      - Load, backup, restore via Amazon S3
      - Parallel load from Amazon DynamoDB
    • Single node version available
    [Diagram labels: SQL Clients/BI Tools, JDBC/ODBC, Leader Node, three Compute Nodes, 10 GigE (HPC), Amazon S3 (ingestion, backup, restore); each node box reads 128 GB RAM, 16 TB disk, 16 cores]
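The split of work between the leader node and the compute nodes can be pictured as a scatter-gather pattern. The following is only a minimal sketch under stated assumptions: the per-node row lists, the function names `compute_node_sum` and `leader_total`, and the use of threads are all hypothetical stand-ins for illustration, not how Redshift is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the Amount values
# stored locally on one compute node.
node_slices = [
    [500, 250],  # compute node 0
    [125],       # compute node 1
    [375],       # compute node 2
]


def compute_node_sum(rows):
    """Work done locally on one compute node: a partial aggregate."""
    return sum(rows)


def leader_total(slices):
    """Leader role: fan the query out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    return sum(partials)


total = leader_total(node_slices)
print(total)
```

Each node only touches its own local data, and the leader combines small partial results rather than raw rows.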
  • Amazon Redshift dramatically reduces I/O
    • Data compression
    • Zone maps
    • Direct-attached storage
    • Large data block sizes

     ID  | Age | State | Amount
    -----+-----+-------+--------
     123 | 20  | CA    | 500
     345 | 25  | WA    | 250
     678 | 40  | FL    | 125
     957 | 37  | WA    | 375

    • With row storage you do unnecessary I/O
    • To get the total amount, you have to read everything
  • Amazon Redshift dramatically reduces I/O
    • Data compression
    • Zone maps
    • Direct-attached storage
    • Large data block sizes

     ID  | Age | State | Amount
    -----+-----+-------+--------
     123 | 20  | CA    | 500
     345 | 25  | WA    | 250
     678 | 40  | FL    | 125
     957 | 37  | WA    | 375

    • With column storage, you only read the data you need
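The difference between the two layouts can be made concrete by counting how many values must be read to total the Amount column of the four-row table on the slide. This is an illustrative sketch only: the in-memory dictionaries and the `values_read` counter are assumptions standing in for disk I/O, and the function names are hypothetical.

```python
# The four-row table from the slide.
rows = [
    {"ID": 123, "Age": 20, "State": "CA", "Amount": 500},
    {"ID": 345, "Age": 25, "State": "WA", "Amount": 250},
    {"ID": 678, "Age": 40, "State": "FL", "Amount": 125},
    {"ID": 957, "Age": 37, "State": "WA", "Amount": 375},
]


def total_row_store(table):
    """Row layout: every field of every row is read to reach Amount."""
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)  # the whole row comes off disk together
        total += row["Amount"]
    return total, values_read


def to_columns(table):
    """Pivot the rows into one list per column (a toy column store)."""
    return {name: [row[name] for row in table] for name in table[0]}


def total_column_store(columns):
    """Column layout: only the Amount column is read."""
    amounts = columns["Amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_row_store(rows)
col_total, col_reads = total_column_store(to_columns(rows))
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts give the same total, but the row layout touches every value in the table while the column layout touches only one column's worth.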
  • Amazon Redshift dramatically reduces I/O
    • Column storage
    • Data compression
    • Zone maps
    • Direct-attached storage
    • Large data block sizes

    • Columnar compression saves space & reduces I/O
    • Amazon Redshift analyzes and compresses your data

    analyze compression listing;

     Table   | Column         | Encoding
    ---------+----------------+----------
     listing | listid         | delta
     listing | sellerid       | delta32k
     listing | eventid        | delta32k
     listing | dateid         | bytedict
     listing | numtickets     | bytedict
     listing | priceperticket | delta32k
     listing | totalprice     | mostly32
     listing | listtime       | raw
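To give a feel for why a column such as listid can compress well, here is a generic sketch of delta encoding, where each value is stored as its difference from the previous one. It illustrates the general idea only and is not Redshift's on-disk format; the sample `list_ids` values and the function names are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each later value as a difference."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    values = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        values.append(running)
    return values


# Hypothetical, mostly sequential identifiers.
list_ids = [1000, 1001, 1002, 1005, 1006]
encoded = delta_encode(list_ids)
decoded = delta_decode(encoded)
print(encoded)
```

After the first entry, the encoded values are small numbers that fit in far fewer bytes than the original identifiers, which is where the space and I/O savings come from.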
  • Amazon Redshift dramatically reduces I/O
    • Column storage
    • Data compression
    • Zone maps
    • Direct-attached storage
    • Large data block sizes

    • Keep track of the minimum and maximum value for each block
    • Skip over blocks that don't contain the data needed for a given query
    • Minimize unnecessary I/O
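The block-skipping idea can be sketched in a few lines: record the minimum and maximum of each block, then read a block only if the value being searched for could fall inside that range. This is a toy model under stated assumptions; the block size of 4, the sample `ages` column, and the function names are hypothetical, and real blocks are far larger.

```python
BLOCK_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


def build_zone_map(values, block_size=BLOCK_SIZE):
    """Split a column into blocks and record (min, max) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def scan_equal(blocks, zone_map, target):
    """Return matching values and the number of blocks actually read."""
    matches = []
    blocks_read = 0
    for block, (lo, hi) in zip(blocks, zone_map):
        if target < lo or target > hi:
            continue  # the zone map proves the block cannot contain target
        blocks_read += 1
        matches.extend(v for v in block if v == target)
    return matches, blocks_read


# Hypothetical Age column, stored in sorted order.
ages = [18, 19, 20, 21, 25, 26, 27, 29, 37, 38, 40, 41]
blocks, zone_map = build_zone_map(ages)
matches, blocks_read = scan_equal(blocks, zone_map, 40)
print(zone_map)
print(matches, blocks_read)
```

Skipping works best when the column is sorted, because then each block covers a narrow, non-overlapping range of values.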
  • Amazon Redshift dramatically reduces I/O
    • Column storage
    • Data compression
    • Zone maps
    • Direct-attached storage
    • Large data block sizes

    • Use direct-attached storage to maximize throughput
    • Hardware optimized for high performance data processing
    • Large block sizes to make the most of each read
    • Amazon Redshift manages durability for you
  • Amazon Redshift runs on optimized hardware
    HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
    HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed customer storage
    • Optimized for I/O intensive workloads
    • High disk density
    • Runs in HPC - fast network
    • HS1.8XL available on Amazon EC2
    • Need to leverage all the nodes
  • Amazon Redshift parallelizes and distributes everything
    • Query
    • Load
    • Backup/Restore
    • Resize
  • Amazon Redshift parallelizes and distributes everything
    • Query
    • Load
    • Backup/Restore
    • Resize

    • Load in parallel from Amazon S3 or Amazon DynamoDB
    • Data automatically distributed and sorted according to DDL
    • Scales linearly with number of nodes
    [Diagram labels: Amazon S3/DynamoDB feeding three Compute Nodes, each 128 GB RAM, 16 TB disk, 16 cores]
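One way to picture "distributed and sorted according to DDL" is hash distribution on a chosen key followed by a per-node sort. The sketch below is an assumption-laden illustration: the slide does not say which hash function is used, so `zlib.crc32` is a stand-in, and `NUM_NODES`, `node_for`, `distribute`, and the sample rows are hypothetical.

```python
import zlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, dist_key, sort_key, num_nodes=NUM_NODES):
    """Place each row on a node by dist_key, then sort each node by sort_key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[dist_key], num_nodes)].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda r: r[sort_key])
    return nodes


rows = [
    {"ID": 123, "Age": 20, "State": "CA", "Amount": 500},
    {"ID": 345, "Age": 25, "State": "WA", "Amount": 250},
    {"ID": 678, "Age": 40, "State": "FL", "Amount": 125},
    {"ID": 957, "Age": 37, "State": "WA", "Amount": 375},
]
nodes = distribute(rows, dist_key="ID", sort_key="Age")
for index, node_rows in enumerate(nodes):
    print(index, [r["ID"] for r in node_rows])
```

Because the mapping depends only on the key, rows with the same key always land on the same node, and every node can load and sort its share independently of the others.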
  • Amazon Redshift parallelizes and distributes everything
    • Query
    • Load
    • Backup/Restore
    • Resize

    • Backups to Amazon S3 are automatic, continuous and incremental
    • Configurable system snapshot retention period
    • Take user snapshots on-demand
    • Streaming restores enable you to resume querying faster
    [Diagram labels: Amazon S3 and three Compute Nodes, each 128 GB RAM, 16 TB disk, 16 cores]
  • Amazon Redshift parallelizes and distributes everything
    • Query
    • Load
    • Backup/Restore
    • Resize

    • Resize while remaining online
    • Provision a new cluster in the background
    • Copy data in parallel from node to node
    • Only charged for source cluster
    [Diagram labels: SQL Clients/BI Tools and two clusters, one with a Leader Node and three Compute Nodes, one with a Leader Node and four Compute Nodes; each node box reads 128 GB RAM, 48 TB disk, 16 cores]
  • Amazon Redshift parallelizes and distributes everything
    • Query
    • Load
    • Backup/Restore
    • Resize

    • Automatic SQL endpoint switchover via DNS
    • Decommission the source cluster
    • Simple operation via AWS Console or API
    [Diagram labels: SQL Clients/BI Tools and one cluster with a Leader Node and four Compute Nodes; each node box reads 128 GB RAM, 48 TB disk, 16 cores]
  • Amazon Redshift lets you start small and grow big
    • Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
      - Single Node (2 TB)
      - Cluster 2-32 Nodes (4 TB - 64 TB)
    • Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
      - Cluster 2-100 Nodes (32 TB - 1.6 PB)
    [Diagram: grids of 8XL and XL node icons. Note: nodes not to scale]
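The capacity ranges quoted on this slide follow directly from storage per node multiplied by node count. A small sketch of that arithmetic, using the per-node figures and cluster size limits from the slide; the `NODE_TYPES` table layout and the function name `cluster_capacity_tb` are hypothetical, and 1 PB is taken as 1000 TB.

```python
# Per-node storage and cluster size limits as stated on the slide.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, node_count):
    """Return total cluster storage in TB for a multi-node cluster."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= node_count <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters need between {spec['min_nodes']} "
            f"and {spec['max_nodes']} nodes"
        )
    return spec["tb_per_node"] * node_count


for node_type, spec in NODE_TYPES.items():
    smallest = cluster_capacity_tb(node_type, spec["min_nodes"])
    largest = cluster_capacity_tb(node_type, spec["max_nodes"])
    print(node_type, smallest, "TB to", largest, "TB")
```

The largest HS1.8XL cluster works out to 1600 TB, which matches the 1.6 PB figure on the slide.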