AWS Summit Berlin 2013 - Amazon Redshift
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. In this session we'll give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.

    Presentation Transcript

    • Amazon Redshift
      Steffen Krause, Technology Evangelist
      @AWS_Aktuell | skrause@amazon.de
    • Data warehousing done the AWS way
      • No upfront costs, pay as you go
      • Really fast performance at a really low price
      • Open and flexible, with support for popular tools
      • Easy to provision and scale up massively
    • We set out to build… a fast and powerful, petabyte-scale data warehouse that is:
      • Delivered as a managed service
      • A lot faster
      • A lot cheaper
      • A lot simpler
      …Amazon Redshift
    • Amazon Redshift dramatically reduces I/O
      [Diagram: the same sample table laid out as row storage vs. column storage, with the scan direction marked]
      ID  | Age | State
      123 | 20  | CA
      345 | 25  | WA
      678 | 40  | FL
    • Amazon Redshift uses SQL
      • Industry-standard SQL
      • ODBC and JDBC drivers to access data (using PostgreSQL 8.x drivers)
        – Most PostgreSQL features supported
        – See the documentation for differences
      • INSERT/UPDATE/DELETE are supported
        – But loading data using COPY from S3 or DynamoDB is significantly faster
        – Use VACUUM after a significant number of deletes or updates (see the sketch below)
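      A minimal sketch of that maintenance pattern, assuming a hypothetical table named sales (the table and predicates are illustrative, not from the deck; VACUUM and ANALYZE are standard Amazon Redshift commands):

        -- illustrative DML against a hypothetical sales table
        UPDATE sales SET qtysold = qtysold + 1 WHERE saleid = 1001;
        DELETE FROM sales WHERE saletime < '2012-01-01';
        -- reclaim space and restore sort order after heavy deletes/updates
        VACUUM sales;
        -- refresh the planner's statistics
        ANALYZE sales;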
    • Amazon Redshift loads data from S3 or DynamoDB
      • Direct load from S3 or DynamoDB is supported:

        copy customer from 's3://mybucket/customer.txt'
        credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>'
        gzip delimiter '|';

      • Load data in parallel
        – Split data into multiple files for parallel load
        – Name files with a common prefix: customer.txt.1, customer.txt.2, …
        – Compress large data sets with gzip
      • Load data in sort key order if possible
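      The DynamoDB variant mentioned above looks much the same; a sketch assuming a hypothetical DynamoDB table named customer (READRATIO caps the share of the table's provisioned read throughput the load may consume):

        copy customer from 'dynamodb://customer'
        credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>'
        readratio 50;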
    • Amazon Redshift automatically compresses your data
      • Compression saves space and reduces disk I/O
      • COPY automatically analyzes and compresses your data
        – Samples the data; selects the best compression encoding
        – Supports: byte dictionary, delta, mostly n, runlength, text
      • Customers see 4-8x space savings with real data
        – 20x and higher possible depending on the data set
      • Use ANALYZE COMPRESSION to see details:

        analyze compression listing;

        Table   | Column         | Encoding
        --------+----------------+----------
        listing | listid         | delta
        listing | sellerid       | delta32k
        listing | eventid        | delta32k
        listing | dateid         | bytedict
        listing | numtickets     | bytedict
        listing | priceperticket | delta32k
        listing | totalprice     | mostly32
        listing | listtime       | raw
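      Encodings can also be pinned explicitly at table creation; a minimal sketch using the encodings reported above (the column types are assumptions for illustration):

        create table listing (
          listid         integer       encode delta,
          sellerid       integer       encode delta32k,
          dateid         smallint      encode bytedict,
          priceperticket decimal(8,2)  encode delta32k,
          totalprice     decimal(8,2)  encode mostly32,
          listtime       timestamp     encode raw
        );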
    • Amazon Redshift architecture
      • Leader node
        – SQL endpoint
        – Stores metadata
        – Coordinates query execution
      • Compute nodes
        – Local, columnar storage
        – Execute queries in parallel
        – Load, backup, restore via Amazon S3
        – Parallel load from Amazon DynamoDB
      • Single-node version available
      [Diagram: JDBC/ODBC clients → leader node → compute nodes over 10 GigE (HPC); ingestion, backup, and restore paths]

        jdbc:postgresql://mycluster.c7lp0qs37f41.us-east-1.redshift.amazonaws.com:8192/mydb
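      The parallelism is visible from the SQL endpoint itself; a small sketch querying the system view STV_SLICES, which lists the per-node slices that scan and load in parallel:

        -- one row per slice; slices are the per-node units of parallelism
        select node, slice from stv_slices order by node, slice;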
    • Amazon Redshift runs on optimized hardware
      • HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB compressed user storage, 2 GB/sec scan rate
      • HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB compressed user storage
      • Optimized for I/O-intensive workloads
      • High disk density
      • Runs in HPC - fast network
      • HS1.8XL available on Amazon EC2
    • Amazon Redshift parallelizes and distributes everything
      • Query
      • Load
      • Backup
      • Restore
      • Resize
      [Diagram: JDBC/ODBC clients, 10 GigE (HPC) interconnect, ingestion/backup/restore paths]
    • Amazon Redshift lets you start small and grow big
      • Extra Large node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
        – Single node (2 TB)
        – Cluster of 2-32 nodes (4 TB - 64 TB)
      • Eight Extra Large node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
        – Cluster of 2-100 nodes (32 TB - 1.6 PB)
      Note: nodes not to scale
    • Amazon Redshift is priced to let you analyze all your data

        Term (HS1.XL, single node) | Price per hour | Effective hourly price per TB | Effective annual price per TB
        ---------------------------+----------------+-------------------------------+------------------------------
        On-Demand                  | $0.850         | $0.425                        | $3,723
        1-Year Reservation         | $0.500         | $0.250                        | $2,190
        3-Year Reservation         | $0.228         | $0.114                        | $999

      • Simple pricing: number of nodes × cost per hour (worked example below)
      • No charge for the leader node
      • No upfront costs; pay as you go
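      To see how the table's columns relate (an illustration using the deck's own numbers): an HS1.XL node holds 2 TB, so the on-demand $0.850/hour works out to $0.850 / 2 TB = $0.425 per TB-hour, and $0.425 × 8,760 hours = $3,723 per TB-year. For a cluster, multiply by node count: a hypothetical 10-node HS1.XL cluster on demand would cost 10 × $0.850 = $8.50 per hour.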
    • Amazon Redshift is easy to use
      • Provision in minutes
      • Monitor query performance
      • Point-and-click resize
      • Built-in security
      • Automatic backups
    • Provision a data warehouse in minutes (console walkthrough, four screenshots)
    • Monitor query performance (console screenshots)
    • Point and click resize
    • Resize your cluster while remaining online
      • New target cluster provisioned in the background
      • Only charged for the source cluster
    • Resize your cluster while remaining online
      • Fully automated
        – Data automatically redistributed
      • Read-only mode during the resize
      • Parallel node-to-node data copy
      • Automatic DNS-based endpoint cutover
      • Only charged for one cluster
    • Amazon Redshift has security built in
      • SSL to secure data in transit (see the note below)
      • Encryption to secure data at rest
        – AES-256, hardware accelerated
        – All blocks on disks and in Amazon S3 encrypted
      • No direct access to compute nodes
      • Amazon VPC support
      [Diagram: JDBC/ODBC clients → customer VPC → internal VPC over 10 GigE (HPC); ingestion, backup, restore paths]
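      Since the endpoint speaks the PostgreSQL protocol, SSL can be requested through the standard driver; a sketch reusing the endpoint format shown earlier (cluster and database names are placeholders):

        jdbc:postgresql://mycluster.c7lp0qs37f41.us-east-1.redshift.amazonaws.com:8192/mydb?ssl=true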
    • Amazon Redshift continuously backs up your data and recovers from failures
      • Replication within the cluster and backup to Amazon S3 maintain multiple copies of data at all times
      • Backups to Amazon S3 are continuous, automatic, and incremental
        – Designed for eleven nines of durability
      • Continuous monitoring and automated recovery from drive and node failures
      • Snapshots can be restored to any Availability Zone within a region
    • Amazon Redshift integrates with multiple data sources
      • Amazon DynamoDB
      • Amazon Elastic MapReduce
      • Amazon Simple Storage Service (S3)
      • Amazon Elastic Compute Cloud (EC2)
      • AWS Storage Gateway
      • Amazon Relational Database Service (RDS)
      • Corporate data center
      More coming soon…
    • Amazon Redshift provides multiple data loading options
      • Upload to Amazon S3
      • AWS Import/Export
      • AWS Direct Connect
      • Work with a partner (data integration vendors, systems integrators)
      More coming soon…
    • Amazon Redshift works with your existing analysis tools, via JDBC/ODBC
      More coming soon…
    • Customer use case: SkillPages
      Everyone needs skilled people - at home, at work, in life - repeatedly
    • SkillPages data architecture
      [Diagram: user actions (join via Facebook, add a Skill Page, invite friends) → web servers → raw trace events in Amazon S3 → EMR Hive scripts → aggregated data in Amazon S3 → Amazon Redshift → data analysts via internal web, Excel, Tableau]
      Hive scripts process content:
      • Process log files with regular expressions to parse out the info we need
      • Process cookies into useful searchable data such as session, user ID, API security token
      • Filter surplus info like internal Varnish logging
    • Resources & questions
      • Steffen Krause | skrause@amazon.de | @AWS_Aktuell
      • http://aws.amazon.com/redshift
      • Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html
      • Setting up SQL Workbench/J: http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html
      • SQL Reference: http://docs.aws.amazon.com/redshift/latest/dg/cm_chap_SQLCommandRef.html
      • Client tools:
        – https://aws.amazon.com/marketplace/redshift/
        – https://www.jaspersoft.com/webinar-AWS-Agile-Reporting-and-Analytics-in-the-Cloud
    • Thank you!