July 2017 Meeting of the Denver AWS Users' Group

July 2017 Meeting slides on Amazon Redshift.

Published in: Technology

  1. AWS Users’ Group Updates. David “Mac” McDaniel, Sr. Solution & Cloud Architect, Independent Consultant. Email: david@mobile-360.com. LinkedIn: https://www.linkedin.com/in/davidbmcdaniel Twitter: @CloudKegGuy, @ServerlessJava. Twitter list: https://twitter.com/CloudKegGuy/lists/aws/members
  2. Getting Connected. Slack channel: https://DenverAWSUsersGroup.slack.com (you will need an invitation to join; please email me at david@mobile-360.com). We are now listed on the AWS UG site: https://aws.amazon.com/usergroups/americas/ We are sponsored by CloudAcademy, which has a free portal for our members at: https://cloudacademy.com/aws-usergroup/?code=newawsugs We are also sponsored by, and a member of, the official Global AWS Communities. See them at https://awsug.support
  3. What we’re going to do tonight: 1. Describe Amazon Redshift. 2. Talk about how it’s different from regular SQL databases. 3. Talk about storage options for Redshift: a. Standard disk-based storage. b. Spectrum and S3 (CSV & Parquet) storage. 4. Describe ways to load data: a. S3, EMR, DynamoDB or remote hosts. 5. Compare to Athena.
  4. What is Redshift? Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds. With Amazon Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions. Amazon Redshift also includes Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3. No loading or transformation is required, and you can use open data formats, including CSV, TSV, Parquet, Sequence, and RCFile. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 run fast, regardless of dataset size. A 4x compression improvement in Redshift was also recently announced.
  5. How Redshift is Different. Redshift is a column-oriented database, whereas regular SQL databases are row-oriented in nature. This means that Redshift stores the values of each column together on disk, rather than storing all the values of each row together. This can be hugely beneficial when a query processes many rows but only a few columns, which is typical in BI and analytical processing, because the columns a query does not reference never have to be read. Many data warehouse schemas are denormalized to reduce joins, so tables tend to be very wide (many columns) to provide the most value, even though individual queries use only a small number of those columns.
  6. Columnar vs. Row Oriented
  7. Storage Options: 1. Local disk storage: a. Traditional and SSD-based. b. Ties storage to compute, and compute to storage. c. Must make FULL read-only copies to scale. 2. S3, used with Redshift Spectrum: a. Uses Amazon Athena metadata to understand files in S3. b. Decouples storage from compute. c. Still must make read-only copies, but of metadata only, so they are smaller and faster to scale.
  8. How do we load data? Multiple ways: 1. Preferred way: use the COPY command to load data from files in one of many formats from: a. S3 b. EMR c. Remote EC2 hosts d. DynamoDB tables 2. Use DML:
  9. How is it different from Athena? Storage: Athena keeps data on S3; Redshift keeps data on attached SSD disks. Scaling: Athena scales automatically; with Redshift you must add more instances or change the instance size. Parallelism: Athena offers massive parallelism; Redshift is only as parallel as you configure. Formats: with Athena, data can be stored in multiple formats per table; with Redshift, data can be loaded from files in multiple formats.
  10. Demo! 1. Create schemas for Redshift tables. 2. Load data in multiple formats from S3. 3. Create Redshift Spectrum schemas. 4. Load data (really, metadata). 5. Execute queries. 6. Tableau visualization.
  11. Next Month’s TOPIC: ???? We need speakers! Chipe: Cheesy, Sarcastic
  12. Questions?
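
Slide 5 contrasts row-oriented and column-oriented storage. The following is a minimal Python sketch of that idea, not a description of how Redshift lays out data internally. The sample table, its column names, and the `to_columns` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample data; table and column names are illustrative only.
rows = [
    {"order_id": 1, "region": "west", "amount": 20.0, "notes": "gift"},
    {"order_id": 2, "region": "east", "amount": 35.5, "notes": ""},
    {"order_id": 3, "region": "west", "amount": 12.25, "notes": "rush"},
]


def to_columns(rows):
    """Pivot a row-oriented table into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented scan: every full row is visited just to read one field.
row_total = sum(row["amount"] for row in rows)

# Column-oriented scan: only the "amount" column is touched.
column_total = sum(columns["amount"])

print(row_total, column_total)
```

Both scans produce the same total. The difference is how much data has to be read to get there, which matters more as tables get wider.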

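Slide 7 and the demo on slide 10 mention creating Redshift Spectrum schemas over files in S3. The sketch below only builds the DDL text in Python; it does not connect to a cluster. The helper names, schema, table, columns, bucket path, and IAM role ARN are placeholders assumed for illustration, and the statements would still need to be run through a SQL client against a real cluster.

```python
def external_schema_ddl(schema, catalog_db, iam_role):
    """Build a CREATE EXTERNAL SCHEMA statement backed by the data catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema, table, columns, s3_location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for files already in S3."""
    column_list = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"{column_list}\n"
        ")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{s3_location}';"
    )


# Placeholder values, assumed for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"

schema_sql = external_schema_ddl("spectrum", "example_catalog_db", ROLE)
table_sql = external_table_ddl(
    "spectrum",
    "orders",
    [("order_id", "INTEGER"), ("region", "VARCHAR(16)"), ("amount", "DOUBLE PRECISION")],
    "s3://example-bucket/orders/",
)

print(schema_sql)
print(table_sql)
```

Only metadata is created here, which is the point made on the slide: the data itself stays in S3. Delimited text files such as CSV would need a `ROW FORMAT DELIMITED` clause and `STORED AS TEXTFILE` instead of the Parquet default used in this sketch.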
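
Slide 8 names COPY as the preferred way to load data. As a rough sketch, a small Python helper can assemble COPY statements for an S3 source and a DynamoDB source. The `build_copy` helper, table names, bucket path, and IAM role ARN are hypothetical, the values are assumed to be trusted constants rather than user input, and the statements are only printed here, not executed.

```python
def build_copy(table, source, iam_role, options=()):
    """Assemble a Redshift COPY statement from trusted, constant values."""
    parts = [
        f"COPY {table}",
        f"FROM '{source}'",
        f"IAM_ROLE '{iam_role}'",
    ]
    parts.extend(options)
    return "\n".join(parts) + ";"


# Placeholder role ARN, assumed for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

# Load CSV files under an S3 prefix, skipping one header line per file.
s3_copy = build_copy(
    "orders",
    "s3://example-bucket/orders/",
    ROLE,
    options=("FORMAT AS CSV", "IGNOREHEADER 1"),
)

# Load from a DynamoDB table, using at most half of its read capacity.
dynamo_copy = build_copy(
    "orders",
    "dynamodb://ExampleOrdersTable",
    ROLE,
    options=("READRATIO 50",),
)

print(s3_copy)
print(dynamo_copy)
```

COPY reads the source files in parallel across the cluster, which is why the slide prefers it over row-by-row DML for bulk loads.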