[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介
Upcoming SlideShare
Loading in...5
×
 

[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介

on

  • 1,374 views

2014/02/19 東京開催セミナー資料

2014/02/19 東京開催セミナー資料

Statistics

Views

Total Views
1,374
Views on SlideShare
1,374
Embed Views
0

Actions

Likes
1
Downloads
17
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

[よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介 [よくわかるAmazon Redshift]Amazon Redshift最新情報と導入事例のご紹介 Presentation Transcript

  • AWSプロダクトシリーズ よくわかるAmazon Redshift 2014/02/19 アマゾン データ サービス ジャパン株式会社
  • Amazon Redshift Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year Rahul Pathak |Senior Product Manager
  • a lot faster a lot cheaper a whole lot simpler Petabyte scale Massively parallel Amazon Redshift Relational data warehouse Fully managed; zero admin
  • Amazon Redshift Quick Overview Amazon Redshift 概要のおさらい
  • Amazon Redshift architecture • Leader Node – – – • SQL endpoint Stores metadata Coordinates query execution JDBC/ODBC Compute Nodes – – – – Local, columnar storage Execute queries in parallel Load, backup, restore via Amazon S3 Parallel load from Amazon DynamoDB • Hardware optimized for data processing • Two hardware platforms 10 GigE (HPC) – – DW1: HDD; scale from 2TB to 1.6PB DW2: SSD; scale from 160GB to 256TB Ingestion Backup Restore
  • Amazon Redshift has security built-in • • Customer VPC SSL to secure data in transit Encryption to secure data at rest – – – JDBC/ODBC AES-256; hardware accelerated All blocks on disks and in Amazon S3 encrypted HSM Support • 10 GigE (HPC) No direct access to compute nodes • Internal VPC Audit logging & AWS CloudTrail integration • Amazon VPC support Ingestion Backup Restore
  • Amazon Redshift is easy to use • Provision in minutes • Monitor query performance • Point and click resize • Built in security • Automatic backups
  • Provision a data warehouse in minutes
  • Monitor query performance
  • Point and click resize • Resize while remaining online via AWS Console or API • Provision a new cluster in the background and copy data in parallel from node to node • Only charged for source cluster until SQL endpoint has automatically been switched over via DNS
  • Amazon Redshift continuously backs up your data and recovers from failures • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times • Backups to Amazon S3 are continuous, automatic, and incremental – Designed for eleven nines of durability • Continuous monitoring and automated recovery from failures of drives and nodes • Able to restore snapshots to any Availability Zone within a region • Easily enable backups to a second region for disaster recovery
  • Amazon Redshift integrates with multiple data sources Corporate Datacenter DynamoDB Amazon Redshift Amazon S3 Amazon RDS Amazon EMR
  • New Features That Introduced After re:Invent 2013 re:Invent 2013以降の主なアップデート
  • Feature Delivery in 2013 Unload logs (7/5) Temp Credentials (4/11) Sharing snapshots (7/18) DUB (4/25) Resource Level IAM (8/9) SHA1 Builtin (7/15) SOC1/2/3 (5/8) Statement Timeout (7/22) WLM Timeout/Wildcards (8/1) UTF-8 Substitution (8/29) JDBC Fetch Size (6/27) Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13) Service Launch (2/14) Split_part, Audit tables (10/3) EIP Support for VPC Clusters (12/28) PCI (8/22) SIN/SYD (10/8) PDX (4/2) Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13) JSON, Regex, Cursors (9/10) NRT (6/5) CRC32 Builtin, CSV, Restore Progress (8/9) Timezone, Epoch, Autoformat (7/25) 4 byte UTF-8 (7/18) Unload Encrypted Files HSM Support (11/11)
  • Summary of Updates after re:Invent • Amazon Redshift - New Features Galore (2013/11/11) – – – – – – – – • • • Distributed Tables - You now have more control over the distribution of a table's rows across compute nodes. Remote Loading - You can now load data into Redshift from remote hosts across an SSH connection. Approximate Count Distinct - You can now use a variant of the COUNT function to approximate the number of matching rows. Workload Queue Memory Management - You can now apportion available memory across work queues. Key Rotation - You can now direct Redshift to rotate keys for an encrypted cluster. HSM Support - You can now direct Redshift to use an on-premises Hardware Security Module (HSM) or AWS CloudHSM to manage the encryption master and cluster encryption keys. Database Auditing and Logging - You can log connections and user activity to Amazon S3. SNS Notification - Redshift can now issue notifications to an Amazon SNS topic when certain events occur. Automated Cross-Region Snapshot Copy for Amazon Redshift (2013/11/14) Faster & More Cost-Effective SSD-Based Nodes for Amazon Redshift(2014/01/24) AWS CloudFormation Adds Support for Redshift and More (2014/02/10)
  • Amazon Redshift Node Types DW1.XL: 16 GB RAM, 2 Cores 3 Spindles, 2 TB compressed storage • Optimized for I/O intensive workloads • High disk density DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles 16 TB compressed, 2 GB/sec scan rate • On demand at $0.85/hour • As low as $1,000/TB/Year • Scale from 2TB to 1.6PB DW2.L *New*: 16 GB RAM, 2 Cores, 160 GB compressed SSD storage • High performance at smaller storage size • High compute and memory density • On demand at $0.25/hour • As low as $5,500/TB/Year • Scale from 160GB to 256TB DW2.8XL *New*: 256 GB RAM, 32 Cores, 2.56 TB of compressed SSD storage
  • Amazon Redshift is priced to let you analyze all your data Price Per Hour for DW1.XL Single Node Effective Annual Price per TB On-Demand $ 1.250 $ 5,475 1 Year Reservation $ 0.750 $ 3,283 3 Year Reservation $ 0.452 $ 1,981 DW1 (HDD) Effective Annual Price per TB On-Demand $ 0.330 $ 18,068 1 Year Reservation $ 0.211 $ 11,570 3 Year Reservation $ 0.130 $ 7,127 No charge for leader node • Price Per Hour for DW2.L Single Node Number of nodes x cost per hour • DW2 (SSD) • No upfront costs • Pay as you go
  • Security, visibility and control • Audit logging Redshift • SNS Alerts
  • Visibility and control AWS CloudTrail System Activity Creates, Changes, Deletes, Resizes • Audit logging • SNS Alerts Amazon Redshift Database Activity Logins, Login failures, Queries, Loads Amazon S3
  • Visibility and control • • Audit logging Monitoring Security Maintenance Errors SNS Alerts Amazon Redshift SNS Topic
  • Batch operations • Cluster Creation • Faster Resize Amazon Corporate Amazon EC2 Data Center EMR Amazon Redshift Amazon S3
  • Batch operations • Cluster Creation • Faster Resize Amazon Corporate Amazon EC2 Data Center EMR Amazon Redshift Amazon S3
  • Batch operations • Cluster Creation • Faster Resize 15-20 min 3 min
  • Batch operations • Cluster Creation • Faster Resize 29 hours 7 hours
  • Performance & Concurrency
  • Performance & Concurrency 692.8s 34.9s < 2%
  • Performance & Concurrency 5,951.7s 2,151.9s
  • Performance & Concurrency 15 50
  • How Customers Leverage Amazon Redshift Amazon Redshift 活用事例
  • Common Customer Use Cases Traditional Enterprise DW SaaS Companies • Improve performance by an order of magnitude • Add analytic functionality to applications Make more data available for analysis • Scale DW capacity as demand grows • • • • • Reduce costs by extending DW rather than adding HW Companies with Big Data Access business data via standard reporting tools • Reduce HW & SW costs by an order of magnitude Migrate completely from existing DW systems Respond faster to business; provision in minutes
  • Amazon Redshift Customers
  • Japanese Redshift Customer – ALBERT • Business Challenge – • Why AWS? – – • Given their data volumes, RDBMS tuning and archiving was causing them a lot of operational pain and costing them money Amazon Redshift’s performance and ability to handle large data sets allowed them to make it the core engine of their analytics, enabling them to provide a private DMP (Data Management Platform) for their customers on the Cloud PostgreSQL is their primary RDBMS, and connectivity by PostgreSQL drivers is big technical advantage to choose Redshift. Benefits for their business – – Ability to start small and scale as needed Scalability and flexibility dramatically lowered the cost of ownership
  • Japanese Redshift Customer – Sansan • Business Challenge – Since “Eight” is business card management solution for consumers, they needed infrastructure that could start small and scale as needed • Why AWS? – When they tried out AWS first, they were surprised with the ease of use. AWS functionality and elasticity were critical factors • Benefits for their business – Lower costs substantially using reserved instances – Automation is a key to reduce operational and administration costs. They utilize services such as Amazon SES and Amazon SWF. – They use Redshift for KPI analytics of their services.
  • Growing ecosystem
  • Multiple Data Loading Options Data Integration • Parallel upload to Amazon S3 • AWS Direct Connect • AWS Import/Export • ETL Software • Systems integrators Systems Integrators
  • Customers on Performance “Redshift is twenty times faster than Hive” (5x – 20x reduction in query times) link …[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive). link “We saw…2x improvement in query times and a 50% reduction in costs” We regularly process multibillion row datasets and we do that in a matter of hours. link “Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x – 40x reduction in query times) link “Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
  • Customers on Cost “We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution” link “[Redshift] cost saving is even more impressive…Our analysts like [Redshift] so much they don’t want to go back.” (4x reduction in cost over HIVE) link “We saw 50% reduction in costs” “Not only did we avoid 3 months of development work [we] saved approximately $80,000 in labor…Competitive Advantage realized with just a few clicks.” “[Amazon Redshift] took an industry famous for its opaque pricing, high TCO and unreliable results and completely turned it on its head.” link “[Redshift] has reduced our storage and processing costs significantly, helping us to realize another 60-70 percent savings.” link
  • Customer on Ease of Use “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like…It’s very flexible.” link “Compared to Hadoop [Redshift] is much easier for analysts to use. What may have been a Hadoop project can become just a query in Redshift.” link “We can spin up an Amazon Redshift cluster, take a snapshot, and scale servers in minutes instead of days.” link “…our team was able to provision Redshift in a matter hours vs. weeks with on-premises servers.” “Amazon Redshift is simple to use and reliable. With one click, we can rapidly scale up or down in real time in alignment with business requirements.” link “Customers can get consistent, accurate, and useful data fast - in weeks not months or years.” link
  • AWS Marketplace • Find software to use with Amazon Redshift • One-click deployments • Flexible pricing options http://aws.amazon.com/marketplace
  • Questions?
  • APPENDIX
  • Resources • Detail Pages – – • New Features – – • http://docs.aws.amazon.com/redshift/latest/dg/doc-history.html http://docs.aws.amazon.com/redshift/latest/mgmt/document-history.html Best Practices – – – • http://aws.amazon.com/redshift https://aws.amazon.com/marketplace/redshift/ http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html Presentations & Webinars: – – – http://www.youtube.com/watch?v=JxLpj_TnisM (2013 SF Summit Presentation) http://www.youtube.com/watch?v=R1m-fwzXMow (Best Practices 1 of 2) http://www.youtube.com/watch?v=7ySzRTOyK6o (Best Practices 2 of 2)
  • Amazon Redshift dramatically reduces I/O Column storage Data compression Age State Amount 20 CA 500 345 25 WA 250 678 • ID 123 • 40 FL 125 37 WA 375 • Zone maps 957 • Direct-attached storage • With row storage you do unnecessary I/O • To get total amount, you have to read everything
  • Amazon Redshift dramatically reduces I/O Column storage Data compression Age State Amount 20 CA 500 345 25 WA 250 678 • ID 123 • 40 FL 125 37 WA 375 • Zone maps 957 • Direct-attached storage • With column storage, you only read the data you need
  • Amazon Redshift dramatically reduces I/O • Column storage analyze compression listing; Table | Column | Encoding ---------+----------------+---------listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw • Data compression • Zone maps • Direct-attached storage • COPY compresses automatically • You can analyze and override • More performance, less cost Slides not intended for redistribution.
  • Amazon Redshift dramatically reduces I/O • Column storage 10 324 • Data compression 375 623 • Zone maps • Direct-attached storage 637 959 10 | 13 | 14 | 26 |… … | 100 | 245 | 324 375 | 393 | 417… … 512 | 549 | 623 637 | 712 | 809 … … | 834 | 921 | 959 • Track the minimum and maximum value for each block • Skip over blocks that don’t contain relevant data
  • Amazon Redshift dramatically reduces I/O • Column storage • Use local storage for performance • Maximize scan rates • Data compression • Zone maps • Automatic replication and continuous backup • Direct-attached storage • HDD & SSD platforms
  • Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize
  • Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize • Load in parallel from Amazon S3 or Amazon DynamoDB or any SSH connection • Data automatically distributed and sorted according to DDL • Scales linearly with number of nodes
  • Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Backups to Amazon S3 are automatic, continuous and incremental • Resize • Configurable system snapshot retention period. Take user snapshots on-demand • Cross region backups for disaster recovery • Streaming restores enable you to resume querying faster
  • Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • Resize • Resize while remaining online • Provision a new cluster in the background • Copy data in parallel from node to node • Only charged for source cluster
  • Amazon Redshift parallelizes and distributes everything • Query • Load • Backup/Restore • • Automatic SQL endpoint switchover via DNS • Decommission the source cluster • Simple operation via Console or API Resize