Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

2,719 views

Published on

In this session, you learn about the latest and hottest features of Amazon Redshift. Join Vidhya Srinivasan, General Manager of Amazon Redshift, to take a deep dive into the architecture and inner workings of Amazon Redshift. You discover how the recent availability, performance, and manageability improvements we’ve made can significantly enhance your end user experience. You also get a glimpse of what we are working on and our plans for the future.

Published in: Technology

AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Vidhya Srinivasan, General Manager, AWS November 30th, 2016 What’s New with Amazon Redshift “It’s our biggest driver of growth in our biggest markets, and is a feature of the company” …on Data Mining in Redshift – Chris Lambert, Lyft CTO “The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.” – Bryan Blair, Vevo’s VP of Ad Operations “Analytical queries are 10 times faster in Amazon Redshift than they were with our previous data warehouse.” – Yuki Moritani, NTT Docomo Innovation Manager “We have several petabytes of data and use a massive Redshift cluster. Our data science team can get to the data faster and then analyze that data to find new ways to reduce costs, market products, and enable new business.” – Yuki Moritani, NTT Docomo Innovation Manager “We saw a 2x performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement..” - Naeem Ali, Director of Software Development, Data Science at Cablevision (Optimum) “Over the last few years, we’ve tried all kinds of databases in search of more speed, including $15k of custom hardware. Of everything we’ve tried, Amazon Redshift won out each time.” – Periscope Data, Analyst’s Guide to Redshift “We took Amazon Redshift for a test run the moment it was released. It’s fast. It’s easy. Did I mention it’s ridiculously fast? We’re using it to provide our analysts an alternative to Hadoop.” – Justin Yan, Data Scientist at Yelp “The move to Redshift also significantly improved dashboard query performance… Redshift performed ~200% faster than the traditional SQL Server we had been using in the past.” - Dean Donovan, Product Development at DiamondStream “…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive)” - Jie Li Data Infrastructure at Pinterest “450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.” - John O’Donovan, CTO, Financial Times “We needed to load six months' worth of data, about 10 TB of data, for a campaign. That type of load would have taken about 20 days with our previous solution. By using Amazon Redshift, it only took six hours to load the data.” - Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe) “We regularly process multibillion row datasets and we do that in a matter of hours. We are heading to up to 10 times more data volumes in the next couple of years, easily.” - Bob Harris, CTO, Channel 4 “On our previous big data warehouse system, it took around 45 minutes to run a query against a year of data, but that number went down to just 25 seconds using Amazon Redshift” - Kishore Raja Director of Strategic Programs and R&D, Boingo Wireless “Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%” - Joel Cumming, Head of Data, Kik Interactive “Annual costs of Redshift are equivalent to just the annual maintenance of some of the cheaper on-premises options for data warehouses..” - Kevin Diamond, CTO, HauteLook (Nordstrom) “Our data volume keeps growing, and we can support that growth because Amazon Redshift scales so well.. We wouldn’t have that capability using the supporting on-premises hardware in our previous solution.” - Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas” - Jon Hoffman, Head of Engineering, Foursquare BDA304
  2. 2. AWS Big Data Portfolio Collect Store Analyze Amazon Kinesis Firehose AWS Direct Connect Amazon Snowball Amazon Kinesis Analytics Amazon Kinesis Streams Amazon S3 Amazon Glacier Amazon CloudSearch Amazon RDS, Amazon Aurora Amazon Dynamo DB Amazon Elasticsearch Service Amazon EMR Amazon EC2 Amazon Redshift Amazon Machine Learning Amazon QuickSight AWS Data PipelineAWS Database Migration Service
  3. 3. Legacy architectural models lead to dark data 0 200 400 600 800 1000 1200 Enterprise Data Data in Warehouse Very Expensive complicated Slow at scale
  4. 4. Amazon Redshift shift Fast, simple, petabyte-scale data warehousing for $1,000/TB/Year
  5. 5. Amazon Redshift is fast “Did I mention that it’s ridiculously fast? We’re using it to provide our analysts with an alternative to Hadoop” “After investigating Redshift, Snowflake, and BigQuery, we found that Redshift offers top-of-the- line performance at best-in-market price points” “…[Redshift] performance has blown away everyone here. We generally see 50-100X speedup over Hive” “We regularly process multibillion row datasets and we do that in a matter of hours. We are heading to up to 10 times more data volumes in the next couple of years, easily” “We saw a 2X performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement” “On our previous big data warehouse system, it took around 45 minutes to run a query against a year of data, but that number went down to just 25 seconds using Amazon Redshift”
  6. 6. Amazon Redshift is easy to use “With Amazon Redshift and Tableau, anyone in the company can set up any queries they like - from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts had in different areas” “The doors were blown wide open to create custom dashboards for anyone to instantly go in and see and assess what is going in our ad delivery landscape, something we have never been able to do until now.” Provides an easy-to-use mechanism for querying data with quick and uniform response times that analysts can use to run research projects and perform in-depth analysis…We don’t have to pre-allocate resources and can easily scale up to meet demand and then scale down for efficiency”
  7. 7. Amazon Redshift is inexpensive “450,000 online queries 98 percent faster than previous traditional data center, while reducing infrastructure costs by 80 percent.” “Annual costs of Redshift are equivalent to just the annual maintenance of some of the cheaper on-premises options for data warehouses..” “Most competing data warehousing solutions would have cost us up to $1 million a year. By contrast, Amazon Redshift costs us just $100,000 all-in, representing a total cost savings of around 90%”
  8. 8. Selected Amazon Redshift customers
  9. 9. Traditional Data Warehousing Business Reporting Complex pipelines and queries Secure and Compliant Simple Migration – Point & Click using AWS Database Migration Service Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant Large Ecosystem – Variety of cloud and on-premises BI and ETL tools Japanese Mobile Phone Provider World’s Largest Children’s Book Publisher Powering 100 marketplaces in 50 countries Bulk Loads and Updates
  10. 10. Log Analysis Log & Machine IOT Data Clickstream Events Data Time-Series Data Inexpensive – Analyze large volumes of data cost-effectively Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real time analytics Interactive data analysis and recommendation engine Ride analytics for pricing and product development Ad prediction and on-demand analytics
  11. 11. Business Applications Multi-Tenant BI Applications Back-end services Analytics as a Service Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can focus on your business applications Simple Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several data marts Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline Infosys Information Platform (IIP) Analytics-as-a- Service Product and Consumer Analytics
  12. 12. Thousands of companies run mission-critical workloads on Amazon Redshift
  13. 13. Redshift is used for mission-critical workloads Financial and management reporting Payments to suppliers and billing workflows Web/Mobile clickstream and event analysis Recommendation and predictive analytics
  14. 14. Amazon Redshift is available everywhere AWS is Dublin Frankfurt London Seoul Sydney Tokyo Singapore Beijing Mumbai Sao Paulo US East - Virginia US West - Oregon US West – Northern California GovCloud Columbus Ohio Montreal Currently Available Coming soon
  15. 15. Redshift is fast and has gotten faster... 5X Query throughput improvement this year  Memory allocation (launched)  Improved commit and I/O logic (launched)  Queue hopping (launched)  Query monitoring rules (coming soon)  Power start (coming soon)  Short query bias (coming soon) 10X Vacuuming performance improvement  Ensures data is sorted for efficient and fast I/O  Reclaims space from deleted rows  Enhanced vacuum performance leads to better system throughput Fast Efficient
  16. 16. Amazon Redshift Cluster The life of a query BI tools SQL clients Analytics tools Client Leader node Compute node Compute node Compute node Queue 1 Queue 2 1 4 32
  17. 17. Amazon Redshift Workload Management Waiting Coming soon: Short query bias BI tools SQL clients Analytics tools Client Running Queue 1 Queue 2 4 Slots 2 Slots Short queries go to the head of the queue 1 1
  18. 18. Amazon Redshift Cluster Coming soon: Power start BI tools SQL clients Analytics tools Client Leader node Compute node Compute node Compute node All queries receive a power start. Shorter queries benefit the most 2 2 2 2
  19. 19. Amazon Redshift Workload Management Waiting Queue hopping BI tools SQL clients Analytics tools Client Running Queue 1 Queue 2 4 Slots 2 Slots Release slot if a query exceeds timeout 3 3
  20. 20. Coming soon: Query monitoring rules • Allows automatic handling of runaway (poorly written) queries • Metrics with operators and values (e.g. query_cpu_time > 1000) create a predicate. • Multiple predicates can be AND-ed together to create a rule. • Multiple rules can be defined for a queue in WLM. These rules are OR-ed together. If { rule } then [action] { rule : metric operator value } eg: rows_scanned > 100000 • Metric : cpu_time, query_blocks_read, rows scanned, query execution time, cpu & io skew per slice, join_row_count, etc. • Operator : <, >, == • Value : integer [action] : hop, log, abort
  21. 21. Coming soon: Query monitoring rules Monitor and control cluster resources consumed by a query Get notified, abort and reprioritize long- running / bad queries Pre-defined templates for common use cases
  22. 22. Amazon Redshift is easy to use Provisioning in minutes Automatic Patching SQL - Data loading Backups built-in Security is built-in Compression is built-in
  23. 23. And is getting easier… Automatic and incremental background VACUUM (coming soon)  Reclaims space and sorts when Redshift clusters are idle  VACUUM is initiated when performance can be enhanced  Improves ETL and query performance Automatic data compression for CTAS  CREATE TABLE AS (CTAS) command creates a new table  The new table leverages compression automatically Automatic compression for new tables (coming soon)  All newly created tables will leverage default encoding  Provides higher compression rates 010101010101
  24. 24. And is getting easier… Enhanced localization  Support for Timestamp with Time zone : New TIMESTAMPTZ data type to input complete timestamp values that include the date, the time of day, and a time zone. Eg: 30 Nov 07:37:16 2016 PST  Support for Multi-byte (UTF-8) characters for tables, columns, and other database object names Resource management  Connection limits: You can now set a limit on the number of database connections a user is permitted to have open concurrently  Query Monitoring rules (coming soon) Table level restore  Restore tables that you might have dropped accidentally  Reconcile data from an older table that you might have updated or deleted unintentionally
  25. 25. Coming soon: Ingestion improvements Faster migrations  Schema Conversion Tool  Already supports schema conversion from Oracle, Teradata, Netezza and Greenplum  Coming soon: Support for Vertica and SQL Server  AWS Database Migration Service  Supports data migration with CDC from Oracle, SQL Server, PostgreSQL, MySQL and Aurora  Will add more DW engines as sources Data Loading  Enhanced JSON & AVRO ingestion performance
  26. 26. Durability and availability Automated Backups Cross-region backups Cluster-level mirroring Streaming restore Monitoring Ticketing
  27. 27. Compute node Durability 1. Multiple copies of a block within the cluster 2. Continuous and incremental backups to Amazon S3 (automatic) 3. Continuous and incremental backups across regions (checkbox enable) 4. Streaming restore – database is available for query once metadata restored. Data streams down in background or when queried. Enables cross-AZ restores. Amazon S3 Region 1 Region 2 Compute node Compute node Amazon S3
  28. 28. 1. Disk failures 2. Availability Zone failures 3. Region level failures 4. Node failures Fault tolerance Compute node Amazon S3 Region 1 Region 2 Compute node Compute node Amazon S3
  29. 29. Node fault tolerance Leader nodeClient Data-path monitoring agents Compute node Compute node Compute node Node level monitoring can detect SW/HW issues and take action
  30. 30. Node fault tolerance Leader nodeClient Data-path monitoring agents Compute node Compute node Compute node Failure is detected at one of the compute nodes
  31. 31. Node fault tolerance Leader nodeClient Data-path monitoring agents Compute node Compute node Compute node Redshift parks the connections Next, the node is replaced
  32. 32. Node fault tolerance Leader nodeClient Data-path monitoring agents Compute node Compute node Compute node Queries are re-submitted (coming soon)
  33. 33. Node fault tolerance Leader nodeClient Data-path monitoring agents Compute node Compute node Compute node Cluster-level monitoring agents Additional monitoring layer for the leader node and network
  34. 34. Amazon Redshift is secure End-to-End data encryption Alerts and notifications Amazon VPC AWS KMS and HSM Audit logging Certifications & Compliance
  35. 35. Leader node Compute node 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC BI tools SQL clientsAnalytics tools Compute node Compute node JDBC/ODBC  Load encrypted from S3  SSL to secure data in transit  ECDHE perfect forward secrecy  Amazon VPC for network isolation  Encryption to secure data at rest  All blocks on disks and in S3 encrypted  Block key, cluster key, master key (AES-256)  On-premises HSM & AWS CloudHSM support  Audit logging and AWS CloudTrail integration  SOC 1/2/3, PCI-DSS, FedRAMP, BAA Amazon Redshift is secure Amazon S3 Amazon EMR Amazon Dynamo DB SSH
  36. 36. We have added even more security features... IAM support for data LOAD/UNLOAD  Use IAM roles to perform LOAD/UNLOAD operations  Cluster is given IAM credentials  Cluster can be granted access to specific S3 buckets  Simplify credentials management  More secure Enhanced VPC Routing  All query traffic only flows through customer VPC  Strict data traffic management between Amazon Redshift and other data sources Secure Simple
  37. 37. BI tools SQL clientsAnalytics tools Client AWS Redshift ADFS Corporate Directory IAM User groups Individual user Coming soon: IAM Authentication for DB users Identity providers IdP registers with IAM 1 1
  38. 38. BI tools SQL clientsAnalytics tools Client AWS Redshift ADFS Corporate Directory IAM User groups Individual user Single Sign-On Coming soon: IAM Authentication for DB users Identity providers End-users can leverage SSO 2 2
  39. 39. BI tools SQL clientsAnalytics tools Client AWS Redshift ADFS Corporate Active Directory IAM Amazon Redshift ODBC/JDBC User groups Individual user Single Sign-On Coming soon: IAM Authentication for DB users Identity providers New Redshift ODBC/JDBC drivers. Grab the ticket (userid) and get a SAML assertion 3 3
  40. 40. BI tools SQL clientsAnalytics tools Client AWS Redshift ADFS Corporate Active Directory IAM Amazon Redshift ODBC/JDBC User groups Individual user Coming soon: IAM Authentication for DB users Identity providers Send SAML assertion to IAM. Obtain IAM role credentials 4 4
  41. 41. BI tools SQL clientsAnalytics tools Client AWS Redshift ADFS Corporate Active Directory IAM Amazon Redshift ODBC/JDBC User groups Individual user Coming soon: IAM Authentication for DB users Identity providers Use IAM role to obtain temporary cluster credentials. Use temporary credentials to log into the database 5 5
  42. 42. Amazon Redshift shift Contact us at redshift-pm@amazon.com Available now  Queue hopping  10X VACUUM performance improvement  Node fault tolerance  Enhanced VPC routing  IAM support for Copy/Unload  Auto compression for CTAS  TimestampTZ datatype Coming soon  Query Monitoring rules  Automatic and incremental background VACUUM  Short query bias  Power start  IAM Authentication for DB users  Auto compression for new tables  Enhanced JSON & AVRO ingestion performance
  43. 43. Thank you!
  44. 44. Remember to complete your evaluations!

×