23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Amazon Redshift

Amazon Redshift is the new data warehouse service from Amazon Web Services. Redshift offers you fast query performance when analyzing data sets from a few hundred gigabytes to over a petabyte at a fraction of the cost of traditional solutions. In this webinar, we will take a detailed look at Redshift, including a live demonstration. This webinar is ideal for anyone looking to gain deeper insight into their data, without the usual challenges of time, cost and effort.

Transcript

  • 1. Amazon Redshift
  • 2. Agenda (topics plotted by technical content against time): Definition, Architecture, Use Cases, Customers, Pricing, Demo, Loading, Backup, Security
  • 3. Amazon Redshift: a fast and powerful, petabyte-scale data warehouse, delivered as a managed service.
  • 4. Where does it fit?
  • 5. AWS platform layers: AWS Global Infrastructure; Networking; Compute, Storage and Database; Application Services; Deployment & Administration
  • 6. AWS database and storage services: Amazon RDS (MySQL, Oracle and SQL Server), Amazon DynamoDB (NoSQL data store), Amazon Elastic MapReduce (hosted Hadoop service), Amazon Redshift (data warehouse service) and Amazon S3 (object storage)
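  • 7. Positioning by data size and degree of structure: small data, high structure: traditional DW (RDS); small data, low structure: NoSQL (DynamoDB); large data, low structure: Hadoop (EMR); large data, high structure: MPP DW (Redshift)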
  • 8. What are the benefits?
  • 9. Easy to provision and scale up massively; pay as you go; price-performance; standards-based
  • 10. How are customers using it?
  • 11. Use case 1, Replace (before): Application → OLTP database → ETL → data warehouse → reporting and BI
  • 12. Use case 1, Replace (after): Application → OLTP database → ETL → Amazon Redshift → reporting and BI
  • 13. Use case 2, Assist (before): Application → OLTP database → ETL → data warehouse → reporting and BI
  • 14. Use case 2, Assist (after): Amazon Redshift runs alongside the existing data warehouse, and both serve reporting and BI
  • 15. Use case 3, New warehouse (before): Application → OLTP database → reporting and BI, with no data warehouse
  • 16. Use case 3, New warehouse (after): Application → OLTP database → Amazon Redshift → reporting and BI
  • 17. Use case 4, Log analysis: web/application servers → Amazon S3 → Amazon Redshift → reporting and BI
  • 18. Not designed for: transactional workloads; very small data sets; sub-second response times
  • 19. How does it work?
  • 20. Architecture: SQL clients/BI tools connect over JDBC/ODBC to a leader node, which coordinates three compute nodes. Each node shown has 16 cores, 128 GB RAM and 16 TB disk.
  • 21. Data distribution: the sample table (ID, Name: 1 John Smith, 2 Jane Jones, 3 Peter Black, 4 Pat Partridge, 5 Sarah Cyan, 6 Brian Snail) is loaded through the leader node, and its rows are spread across the compute nodes.
  • 22. Query execution: the leader node sends SQL to each compute node, each compute node returns its results to the leader, and the leader returns the result to the SQL clients/BI tools over JDBC/ODBC.
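The leader/compute split in slides 21 and 22 can be illustrated with a small in-memory sketch. It assumes a hypothetical round-robin placement of the six sample rows over three compute nodes and uses threads to stand in for nodes; each "node" computes a partial COUNT(*) and SUM(id) and the "leader" combines the partials. The function names are made up for illustration and are not Redshift internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Sample rows from slides 21-22: (ID, Name).
ROWS = [(1, "John Smith"), (2, "Jane Jones"), (3, "Peter Black"),
        (4, "Pat Partridge"), (5, "Sarah Cyan"), (6, "Brian Snail")]

def distribute_round_robin(rows, num_nodes):
    """Hypothetical placement: deal rows out to the nodes in turn."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes

def compute_node_query(rows):
    """Work done on one compute node: a partial COUNT(*) and SUM(id)."""
    return len(rows), sum(row_id for row_id, _ in rows)

def leader_query(nodes):
    """Leader: run the query on every node in parallel, then combine partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_query, nodes))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(id_sum for _, id_sum in partials)
    return total_count, total_sum

nodes = distribute_round_robin(ROWS, 3)
print(leader_query(nodes))  # (6, 21)
```

Aggregates such as COUNT and SUM combine cleanly from partial results, which is what lets each node work only on its own slice of the data.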
  • 23. Do I have a choice of nodes?
  • 24. Compute node choice, XL: 2 cores, 16 GB RAM, 2 TB disk. Single node (2 TB), or a cluster of 2–32 nodes (4 TB – 64 TB).
  • 25. Compute node choice, 8XL: 16 cores, 128 GB RAM, 16 TB disk. Cluster of 2–100 nodes (32 TB – 1.6 PB).
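The storage ranges on slides 24 and 25 are simply per-node disk multiplied by node count. A small helper, assuming only the per-node disk sizes and multi-node limits stated on those slides, reproduces them:

```python
# Per-node disk and multi-node cluster limits as stated on slides 24 and 25.
NODE_TYPES = {
    "XL": {"disk_tb": 2, "min_nodes": 2, "max_nodes": 32},
    "8XL": {"disk_tb": 16, "min_nodes": 2, "max_nodes": 100},
}

def cluster_storage_tb(node_type, nodes):
    """Disk capacity of a multi-node cluster: per-node disk times node count."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters use {spec['min_nodes']}-{spec['max_nodes']} nodes"
        )
    return spec["disk_tb"] * nodes

for name, spec in NODE_TYPES.items():
    low = cluster_storage_tb(name, spec["min_nodes"])
    high = cluster_storage_tb(name, spec["max_nodes"])
    print(f"{name}: {low} TB - {high} TB")
# XL: 4 TB - 64 TB
# 8XL: 32 TB - 1600 TB  (1.6 PB)
```

The single-node XL option (2 TB) is deliberately left out of the table above, because it is not a multi-node cluster.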
  • 26. How do I run queries?
  • 27. Query tools such as DbVisualizer and SQL Workbench connect to Redshift over JDBC/ODBC.
  • 28. BI tools
  • 29. Do I need to performance tune?
  • 30. Performance = parallelism + columnar storage + compression + zone maps
  • 31. Massively parallel processing: the leader node and compute nodes (each 16 cores, 128 GB RAM, 16 TB disk) are connected by 10 GigE. Choose good distribution keys so that rows, such as the six sample rows, are spread across the compute nodes.
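Slide 31's advice to choose good distribution keys can be made concrete with a minimal sketch. It assumes rows are placed by hashing the key value modulo the number of compute nodes. CRC32 is an illustrative stand-in, since the slides do not describe Redshift's actual hash function, and the `orders` data is hypothetical.

```python
import zlib
from collections import Counter

def node_for(key, num_nodes):
    """Choose a node by hashing the distribution-key value.
    CRC32 is an illustrative stand-in, not Redshift's hash function."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def rows_per_node(rows, key_column, num_nodes):
    """How many rows each node would hold if `key_column` were the DISTKEY."""
    counts = Counter(node_for(row[key_column], num_nodes) for row in rows)
    return [counts.get(node, 0) for node in range(num_nodes)]

# Hypothetical orders: orderkey is unique, orderstatus has only two values.
orders = [{"orderkey": k, "orderstatus": "O" if k % 10 == 0 else "F"}
          for k in range(1, 10_001)]

print("DISTKEY orderkey:   ", rows_per_node(orders, "orderkey", 3))
print("DISTKEY orderstatus:", rows_per_node(orders, "orderstatus", 3))
```

A high-cardinality key such as `orderkey` spreads rows roughly evenly. A two-valued key such as `orderstatus` can reach at most two of the three nodes, so at least one node sits idle while another holds most of the data.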
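  • 32. Data storage, row-based vs. columnar: sample records (ID, Age, State) are (123, 20, CA), (345, 25, WA) and (678, 40, FL). Row storage keeps each record's values together; column storage keeps each column's values together.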
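The row-versus-column layouts on slide 32 can be sketched with its three sample records. This conceptual illustration (not Redshift's on-disk format) counts how many stored values a query must touch when it needs only one column, such as AVG(age).

```python
# Sample records from slide 32: (ID, Age, State).
records = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]
column_names = ("id", "age", "state")

# Row storage: each record's values sit next to each other.
row_store = [value for record in records for value in record]

# Column storage: each column's values sit next to each other.
column_store = {name: [record[i] for record in records]
                for i, name in enumerate(column_names)}

def values_scanned(layout, column):
    """Stored values a query must touch to read a single column."""
    if layout == "row":
        return len(row_store)            # every field of every record
    return len(column_store[column])     # just the requested column

# AVG(age) only needs the 'age' column.
avg_age = sum(column_store["age"]) / len(column_store["age"])
print(values_scanned("row", "age"), values_scanned("column", "age"))  # 9 3
```

With three columns the column store reads a third of the data. The gap widens as tables get wider, and each column's values are of one type, which also helps compression.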
  • 33. Compression encodings: raw (RAW), byte-dictionary (BYTEDICT), delta (DELTA / DELTA32K), mostly (MOSTLY8 / MOSTLY16 / MOSTLY32), run-length (RUNLENGTH), text (TEXT255 / TEXT32K). Average compression: 4–8x.
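Two of the encodings on slide 33 are easy to sketch. The functions below are simplified illustrations of the ideas behind run-length encoding (store each value once with a repeat count) and delta encoding (store differences between consecutive values, which are small numbers that fit in fewer bytes). They are not Redshift's storage formats, and the sample data is hypothetical.

```python
from itertools import accumulate, groupby

def run_length_encode(values):
    """RUNLENGTH idea: store each value once with the length of its run."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def run_length_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]

def delta_encode(values):
    """DELTA idea: keep the first value, then differences between neighbours."""
    return values[:1] + [curr - prev for prev, curr in zip(values, values[1:])]

def delta_decode(deltas):
    # A running sum of the deltas restores the original values.
    return list(accumulate(deltas))

states = ["CA", "CA", "CA", "CA", "WA", "WA", "FL"]
order_keys = [100000, 100001, 100003, 100006, 100010]

print(run_length_encode(states))   # [('CA', 4), ('WA', 2), ('FL', 1)]
print(delta_encode(order_keys))    # [100000, 1, 2, 3, 4]
```

Run-length encoding pays off on sorted or low-cardinality columns. Delta encoding suits steadily increasing values such as keys or dates.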
  • 34. DDL with a distribution key and a sort key:
        CREATE TABLE orders (
          orderkey       int8           NOT NULL DISTKEY,
          custkey        int8           NOT NULL,
          orderstatus    char(1)        NOT NULL,
          totalprice     numeric(12,2)  NOT NULL,
          orderdate      date           NOT NULL SORTKEY,
          orderpriority  char(15)       NOT NULL,
          clerk          char(15)       NOT NULL,
          shippriority   int4           NOT NULL,
          comment        varchar(79)    NOT NULL
        );
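The DDL on slide 34 declares `orderdate` as the SORTKEY. Together with the zone maps named on slide 30 (a minimum and maximum value kept per storage block), sorting lets a range filter skip blocks that cannot contain matches. The sketch assumes a tiny hypothetical block size and integer day numbers in place of dates. It illustrates the idea only and does not describe Redshift's block format.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # values per block; hypothetical, tiny for illustration

@dataclass
class Block:
    values: list
    min_value: int
    max_value: int

def build_blocks(column_values):
    """Split a column into blocks and record each block's zone map (min/max)."""
    blocks = []
    for start in range(0, len(column_values), BLOCK_SIZE):
        chunk = column_values[start:start + BLOCK_SIZE]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks

def range_scan(blocks, low, high):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # the zone map proves no value in this block can match
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read

# Hypothetical order dates as day numbers, loaded sorted vs. unsorted.
sorted_days = list(range(1, 17))
unsorted_days = [1, 9, 5, 13, 2, 10, 6, 14, 3, 11, 7, 15, 4, 12, 8, 16]

print(range_scan(build_blocks(sorted_days), 5, 8))    # ([5, 6, 7, 8], 1)
print(range_scan(build_blocks(unsorted_days), 5, 8))  # ([5, 6, 7, 8], 4)
```

Both scans find the same rows. On sorted data the filter reads one block out of four, while on unsorted data every block's min/max range overlaps the filter, so nothing can be skipped.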
  • 35. How do I get data in?
  • 36. Loading from Amazon S3 into Amazon Redshift:
        copy events
        from 's3://mybucket/data/allevents_pipe.txt'
        credentials 'aws_access_key_id=<>;aws_secret_access_key=<>'
        maxerror 5
        delimiter '|'
        timeformat 'YYYY-MM-DD HH:MI:SS';
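The COPY on slide 36 expects pipe-delimited text with timestamps in `YYYY-MM-DD HH:MI:SS` form. Here is a minimal sketch of producing such a file. The three-column layout (event ID, name, start time) is a hypothetical assumption, since the slide does not show the `events` table's columns, and uploading the file to `s3://mybucket/data/` is a separate step not shown.

```python
from datetime import datetime

def to_pipe_delimited(rows):
    """Format (event_id, name, start_time) rows for a COPY using
    delimiter '|' and timeformat 'YYYY-MM-DD HH:MI:SS'."""
    lines = []
    for event_id, name, start_time in rows:
        fields = [str(event_id), name, start_time.strftime("%Y-%m-%d %H:%M:%S")]
        for field in fields:
            # The COPY above sets no ESCAPE option, so a '|' or newline
            # inside a value would be misread as a field or row boundary.
            if "|" in field or "\n" in field:
                raise ValueError(f"cannot write field safely: {field!r}")
        lines.append("|".join(fields))
    return "".join(line + "\n" for line in lines)

sample = [(1, "Opening night", datetime(2013, 10, 23, 9, 30, 0))]
print(to_pipe_delimited(sample), end="")  # 1|Opening night|2013-10-23 09:30:00
```

Rejecting unsafe values up front is a design choice. An alternative is to escape them and add a matching option to the COPY command, which the slide's example does not use.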
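  • 37. Loading from Amazon DynamoDB into Amazon Redshift:
        copy favoritemovies
        from 'dynamodb://ProductCatalog'
        credentials 'aws_access_key_id=<>;aws_secret_access_key=<>'
        READRATIO 50;
      READRATIO 50 limits the load to 50% of the DynamoDB table's provisioned read throughput.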
  • 38. ETL tools: source systems → ETL → Amazon Redshift
  • 39. How do I back it up?
  • 40. Backup: automatic and incremental, from the compute nodes to Amazon S3
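"Incremental" on slide 40 means each backup only needs to copy data that changed since the previous one. The conceptual sketch below assumes storage is tracked as blocks identified by a SHA-256 content hash. That is an assumption for illustration, since the slides do not describe how Redshift tracks changed data.

```python
import hashlib

def block_id(block):
    """Identify a block by the SHA-256 digest of its bytes."""
    return hashlib.sha256(block).hexdigest()

def incremental_backup(blocks, stored_ids):
    """Return (blocks to upload, updated set of block IDs already backed up)."""
    stored = set(stored_ids)
    to_upload = []
    for block in blocks:
        bid = block_id(block)
        if bid not in stored:
            to_upload.append(block)
            stored.add(bid)
    return to_upload, stored

day1 = [b"block-a", b"block-b", b"block-c"]
uploaded_1, stored = incremental_backup(day1, set())    # all three blocks are new

day2 = [b"block-a", b"block-b-changed", b"block-c"]
uploaded_2, stored = incremental_backup(day2, stored)   # only the changed block
print(len(uploaded_1), len(uploaded_2))  # 3 1
```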
  • 41. Can I stop a cluster?
  • 42. Snapshot: save the cluster's data as a snapshot in Amazon S3, after which the cluster can be shut down
  • 43. Restore: a cluster is restored from the snapshot in Amazon S3
  • 44. How do I resize it?
  • 45. Resize: a new cluster (a leader node and four compute nodes) is provisioned and the data is copied into it, while BI tools continue to query the original cluster (a leader node and two compute nodes) in read-only mode. Each node is shown with 16 cores, 128 GB RAM and 48 TB disk.
  • 46. After the resize, SQL clients/BI tools connect to the new cluster: a leader node and four compute nodes (16 cores, 128 GB RAM, 48 TB disk each).
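Going from two to four compute nodes (slides 45 and 46) changes where hash-distributed rows belong. Under the same illustrative modulo-hash assumption (CRC32 is a stand-in, and this is not a description of Redshift's resize mechanism), this sketch counts which rows would land on a different node.

```python
import zlib

def node_for(key, num_nodes):
    """Illustrative hash placement: CRC32 of the key modulo the node count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def plan_resize(keys, old_nodes, new_nodes):
    """Map each key to (old node, new node) and list keys whose node changes."""
    placement = {key: (node_for(key, old_nodes), node_for(key, new_nodes))
                 for key in keys}
    moved = [key for key, (old, new) in placement.items() if old != new]
    return placement, moved

placement, moved = plan_resize(range(1, 1001), old_nodes=2, new_nodes=4)
print(f"{len(moved)} of {len(placement)} rows land on a different node")
```

With modulo placement, going from 2 to 4 nodes keeps a row in place only when its new node is 0 or 1, so roughly half the rows move. This is one reason the slides show resize as copying data into a freshly provisioned cluster rather than shuffling rows in place.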
  • 47. Is it secure?
  • 48. Security: SQL clients/BI tools in the customer VPC connect to the leader node over SSL; the cluster's nodes run in an internal VPC; data is encrypted at rest and in transit; access is controlled through security groups, database security and access management; connections to Amazon S3 also use SSL.
  • 49. How do I get started?
  • 50. http://aws.amazon.com/redshift: detail, FAQ, pricing, documentation (Getting Started Guide), forums. Youtube.com: search for "Amazon Redshift Best Practices".