Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select - AWS Online Tech Talks


Learning Objectives:
- Define Amazon S3 Select and Amazon Glacier Select
- Understand the scenarios in which these features can help you increase performance and extend your data lake
- See a before & after scenario of a query with and without Amazon S3 Select



  1. Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select. Rahul Bhartia, Principal Product Manager, S3. March 29th, 2018. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What to expect from the session: 1. Introduction 2. Use cases 3. Key features
  3. Building a data lake with Amazon S3: unmatched durability, availability, and scalability; best security, compliance, and audit capability; object-level control at any scale; business insight into your data; twice as many partner integrations; most ways to bring data in.
  4. With a wide variety of in-place tools… Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Glue, Amazon S3
  5. Today: Compute scales based… on object size instead of the amount of data you want to process. (Diagram: DATA, COMPUTE)
  6. Today: All of these tools… retrieve a lot of data they don't need and do the heavy lifting.
  7. You have a choice of storage classes. Amazon S3 Standard: active data, milliseconds, from 2.1¢/GB-mo. Amazon S3 Standard–Infrequent Access: infrequently accessed data, milliseconds, 1.25¢/GB-mo. Amazon Glacier: archive data, minutes to hours, 0.4¢/GB-mo.
  8. Today: You need to… restore the entire object from Amazon Glacier to Amazon S3 and then use it. (Diagram: Amazon Glacier, Amazon S3)
  9. Introducing… Amazon S3 Select and Amazon Glacier Select: select a subset of data from an object based on a SQL expression.
  10. Simple, faster, and cheaper! Available as an API: no infrastructure or administration. Faster performance as compared to doing it yourself. Pay as you go: the less you retrieve, the more you save.
  11. Amazon S3 Select
  12. Amazon S3 Select: select contents from an object instead of retrieving the object. Simple to use: standard SQL expression. Familiar: works and scales like GET requests. Integrated: AWS SDK and Presto (others coming soon).
  13. Amazon S3 Select. Input format: delimited text (CSV, TSV), JSON … Compression: GZIP … Output format: delimited text (CSV, TSV), JSON … Clauses: SELECT, FROM, WHERE. Data types: String; Integer, Float, Decimal; Timestamp; Boolean. Operators: Conditional, Math, Logical, String (LIKE, ||). Functions: String, Cast, Math, Aggregate.
  14. Amazon S3 Select: Simple pattern matches. Without S3 Select: …get-object …object… | awk -F, '{ if ($4 == "x") print $1 }' With S3 Select: …select-object …object… "SELECT o._1 WHERE o._4 = 'x'…"
  15. Amazon S3 Select: Serverless applications. (Diagram: Amazon S3, S3 Select, AWS Lambda, Amazon SNS)
  16. Amazon S3 Select: Serverless MapReduce. 2X faster at 1/5 of the cost.
      Before: 200 seconds and 11.2 cents
          # Download and process all keys
          for key in src_keys:
              response = s3_client.get_object(Bucket=src_bucket, Key=key)
              contents = response['Body'].read()
              for line in contents.split('\n')[:-1]:
                  line_count += 1
                  try:
                      data = line.split(',')
                      srcIp = data[0][:8]
                      ….
      After: 95 seconds and 2.8 cents
          # Select IP Address and Keys
          for key in src_keys:
              response = s3_client.select_object_content(Bucket=src_bucket, Key=key, expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as obj")
              contents = response['Body'].read()
              for line in contents:
                  line_count += 1
                  try:
                      ….
  17. Amazon S3 Select: Accelerating big data. Before: Amazon S3. After: Amazon S3 with S3 Select. Up to 400% faster, up to 80% cheaper.
  18. DEMO: Amazon S3 Select with Presto. Works with your existing Hive Metastore. Automatically converts predicates into S3 Select requests. (Diagram: Amazon S3, S3 Select)
  19. Amazon S3 Select: Accelerating big data (before and after comparison). After: 5X faster with 1/40 of the CPU.
  20. Amazon S3 Select: Will be supported by… Amazon Athena, Amazon EMR, Amazon Redshift Spectrum
  21. Amazon S3 Select: In Preview • Formats: CSV, JSON • Compression: GZIP • Encryption: None • Encoding: UTF-8 • Integration: AWS SDK for Java and Python and Presto Connector • Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore. Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
  22. Amazon Glacier Select
  23. Amazon Glacier Select: restore selective contents instead of restoring the entire object. Simple SQL expression: SELECT and WHERE. Familiar semantics: works and scales like RESTORE requests. Integrated: AWS SDK and CLI.
  24. Amazon Glacier Select. Input format: delimited text (CSV, TSV, PSV, etc.). Encryption: SSE-KMS, SSE-S3. Output format: delimited text (CSV, TSV, PSV, etc.). Clauses: SELECT, FROM, WHERE. Data types: String; Integer, Float, Decimal; Timestamp; Boolean. Operators: Conditional, Math, Logical, String (LIKE, ||). Functions: String, Cast.
  25. How to use Amazon Glacier Select? Two ways to use Amazon Glacier Select. Using the Glacier API: for data directly uploaded to Amazon Glacier. Using the S3 API: for data that is lifecycled to Amazon Glacier from S3.
  26. How to use Amazon Glacier Select? Current restore-object API arguments: Object, Tier. New (optional) restore-object API arguments to use Glacier Select: SQL query, Output S3 location, SNS topic.
  27. Using Glacier Select. Without Glacier Select: …restore-object …object… | …get-object …object… | awk -F, '{ if ($4 == "id") print $1 }' With Glacier Select: …restore-object …object… "SELECT o._1 WHERE o._4 = 'id'…" | …get-object …object…
  28. Amazon Glacier Select: GA • Formats: CSV, any delimiter-separated file • Encryption: SSE-KMS, SSE-S3 • Encoding: UTF-8 • Integration: AWS SDK, CLI, Athena integration (expected 2018) • Availability: All commercial regions where Amazon Glacier is launched
  29. Summary
  30. Amazon S3 Select and Amazon Glacier Select: select a subset of data from an object based on a SQL expression.
  31. Now: Your data lake on AWS. Simple, faster, cheaper. (Diagram: SELECT sits between Amazon Glacier and Amazon S3 on one side and Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, AWS Lambda, and ISVs and custom applications on the other.)
  32. Thank you
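
The S3 Select request pattern from slides 14 and 16 can be sketched in Python as follows. This is a minimal sketch, not the presenter's code: it assumes the `select_object_content` call of the current AWS SDK for Python (boto3), which may differ from the preview API shown in the talk. The helper name `select_rows` and the bucket and key in the usage comment are hypothetical. The client is passed in as an argument, so the block itself does not import boto3.

```python
def select_rows(s3_client, bucket, key, expression, gzip_compressed=False):
    """Run an S3 Select SQL expression against one CSV object.

    Returns the selected rows as a list of text lines. Only the selected
    bytes cross the network, instead of the whole object.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={
            # Assumption: headerless CSV, so columns are addressed as _1, _2, ...
            "CSV": {"FileHeaderInfo": "NONE"},
            "CompressionType": "GZIP" if gzip_compressed else "NONE",
        },
        OutputSerialization={"CSV": {}},
    )

    # The response payload is a stream of events. Record events carry raw
    # bytes, and a row may be split across two events, so join the bytes
    # first and decode afterwards.
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8").splitlines()


# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   rows = select_rows(
#       boto3.client("s3"), "my-bucket", "logs/part-0000.csv",
#       "SELECT s._1 FROM S3Object s WHERE s._4 = 'x'",
#   )
```

Passing the client in keeps the helper easy to exercise with a stub, and the same function covers the GZIP input listed on slide 21 through the `gzip_compressed` flag.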

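
The optional restore-object arguments from slides 26 and 27 can be illustrated with a request builder. This is a hedged sketch that assumes the `RestoreRequest` structure accepted by boto3's `restore_object` for SELECT-type restores; the function name `build_select_restore_request` and all bucket, key, and prefix values are hypothetical. The SNS topic argument listed on slide 26 is left out of this sketch.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_select_restore_request(expression, output_bucket, output_prefix, tier="Standard"):
    """Build a RestoreRequest dict that restores only the rows matching a SQL expression."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    return {
        "Type": "SELECT",   # a select restore rather than a full-object restore
        "Tier": tier,       # retrieval tier, as for a normal restore
        "SelectParameters": {
            "InputSerialization": {"CSV": {}},   # assumption: CSV input
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # The selected rows are written to this S3 location.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix}
        },
    }


# Usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   request = build_select_restore_request(
#       "SELECT o._1 FROM S3Object o WHERE o._4 = 'id'",
#       "my-output-bucket", "glacier-select/results",
#   )
#   boto3.client("s3").restore_object(
#       Bucket="my-archive-bucket", Key="archive/data.csv", RestoreRequest=request
#   )
```

Building the request separately from sending it makes the new arguments (SQL query and output location) easy to inspect next to the existing ones (object and tier).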