
Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 Select & Amazon Glacier Select - STG313 - re:Invent 2017

Amazon S3 & Amazon Glacier provide the durable, scalable, secure and cost-effective storage you need for your data lake. But, as your data lake grows, the resources needed to analyze all the data can become expensive, or queries may take longer than desired. AWS provides query-in-place services like Amazon Athena and Amazon Redshift Spectrum to help you analyze this data easily and more cost-effectively than ever before. In this session, we will talk about how AWS query-in-place services and other tools work with Amazon S3 & Amazon Glacier and the optimizations you can use to analyze and process this data, cheaply and effectively.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Breakthroughs: Process and Query Data In Place with Amazon S3 & Glacier. Rahul Bhartia, Principal Product Manager, Amazon S3; Rashim Gupta, Principal Product Manager, Amazon Glacier. STG313, November 29, 2017.
  2. What to expect from the session: (1) Introduction to Amazon S3 Select and Amazon Glacier Select; (2) Dive into use cases; (3) Key features and demos.
  3. Building a data lake with Amazon S3: unmatched durability, availability, and scalability; best security, compliance, and audit capability; object-level control at any scale; business insight into your data; twice as many partner integrations; most ways to bring data in.
  4. With a wide variety of in-place tools… [Diagram labels: Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, AWS Glue, Amazon S3]
  5. Today: compute scales based on object size instead of the amount of data you want to process. [Diagram labels: Data, Compute]
  6. Today: all of these tools retrieve a lot of data they don't need and do the heavy lifting.
  7. Today: you need to restore the entire object from Amazon Glacier to Amazon S3 and then use it. [Diagram labels: Amazon S3, Amazon Glacier]
  8. Introducing… Amazon S3 Select and Amazon Glacier Select: select a subset of data from an object based on a SQL expression.
  9. Simple, faster, and cheaper! Available as an API, with no infrastructure or administration. Faster performance compared to doing it yourself. Pay as you go: the less you retrieve, the more you save.
  10. Amazon S3 Select
  11. Amazon S3 Select: select contents from an object instead of retrieving the object. Simple to use: standard SQL expression. Familiar: works and scales like GET requests. Integrated: AWS SDK and Presto (others coming soon).
  12. Amazon S3 Select. Input format: delimited text (CSV, TSV), JSON … Compression: GZIP … Output format: delimited text (CSV, TSV), JSON … Clauses: SELECT, FROM, WHERE. Data types: String; Integer, Float, Decimal; Timestamp; Boolean. Operators: Conditional, Math, Logical, String (LIKE, ||). Functions: String, Cast, Math, Aggregate.
  13. Amazon S3 Select: Simple pattern matches. Retrieve, then filter locally: …get-object …object… | awk -F '{ if($4=="x") print $1}'; filter in place: ...select-object …object… 'SELECT o._1 WHERE o._4 == "x"…'
  14. Amazon S3 Select: Serverless applications. [Diagram labels: Amazon S3, AWS Lambda, Amazon SNS, S3 Select]
  15. Amazon S3 Select: Serverless MapReduce. 2X faster at 1/5 of the cost.
      Before (200 seconds and 11.2 cents):
          # Download and process all keys
          for key in src_keys:
              response = s3_client.get_object(Bucket=src_bucket, Key=key)
              contents = response['Body'].read()
              for line in contents.split('\n')[:-1]:
                  line_count += 1
                  try:
                      data = line.split(',')
                      srcIp = data[0][:8]
                      ….
      After (95 seconds and 2.8 cents):
          # Select IP Address and Keys
          for key in src_keys:
              response = s3_client.select_object_content(Bucket=src_bucket, Key=key, expression="SELECT SUBSTR(obj._1, 1, 8), obj._2 FROM s3object as obj")
              contents = response['Body'].read()
              for line in contents:
                  line_count += 1
                  try:
                      ….
  16. Amazon S3 Select: Accelerating big data. Up to 400% faster, up to 80% cheaper. [Diagram labels: Before: Amazon S3; After: Amazon S3, S3 Select]
  17. DEMO: Amazon S3 Select with Presto. Works with your existing Hive Metastore. Automatically converts predicates into S3 Select requests. [Diagram labels: Amazon S3, S3 Select]
  18. Amazon S3 Select: Accelerating big data. 5X faster with 1/40 of the CPU. [Panel labels: Before, After]
  19. Amazon S3 Select: Will be supported by… Amazon Athena, Amazon EMR, Amazon Redshift Spectrum.
  20. Amazon S3 Select: Preview starts today • Formats: CSV, JSON • Compression: GZIP • Encryption: None • Encoding: UTF-8 • Integration: AWS SDK for Java and Python, and Presto connector • Availability: Northern Virginia, Ohio, Oregon, Dublin, and Singapore • Apply at: https://pages.awscloud.com/amazon-s3-select-preview.html
  21. Amazon Glacier Select
  22. Amazon Glacier • Extremely low-cost archive storage service, starting at $0.004 per GB/mo • Allows you to retrieve data in minutes to hours • 99.999999999% durability (five to six orders of magnitude higher than two copies of tape) • All data is encrypted at rest • Features: compliance, data management, cost management, audit logging
  23. Choice of storage classes. Active data: Amazon S3 Standard, access in milliseconds, from 2.1¢ per GB/mo. Infrequently accessed data: Amazon S3 Standard-Infrequent Access, access in milliseconds, 1.25¢ per GB/mo. Archive data: Amazon Glacier, access in minutes to hours, 0.4¢ per GB/mo.
  24. Retrievals using Amazon Glacier: three flexible and powerful retrieval options to access any of your Amazon Glacier data. Expedited: data access time 1-5 minutes, data retrievals $0.03 per GB, retrieval requests $0.01 per request. Standard: data access time 3-5 hours, data retrievals $0.01 per GB, retrieval requests $0.05 per 1,000 requests. Bulk: data access time 5-12 hours, data retrievals $0.0025 per GB, retrieval requests $0.025 per 1,000 requests.
  25. Amazon Glacier Select: select relevant contents from an object instead of restoring the entire object. Simple: SQL expression (SELECT and WHERE). Familiar semantics: works and scales like RESTORE requests. Integrated: AWS SDK and CLI.
  26. How to use Amazon Glacier Select? Two ways to use Amazon Glacier Select. Using the Glacier API: for data directly uploaded to Amazon Glacier. Using the S3 API: for data that is lifecycled to Amazon Glacier from S3.
  27. How to use Amazon Glacier Select? Current restore-object API arguments: object id, tier. New (optional) restore-object API arguments to use Glacier Select: SQL query, output S3 location, SNS topic.
  28. Using Glacier Select. Restore, then filter locally: …restore-object …object… | …get-object … object …. | awk -F '{ if($4=="id") print $1}'; filter during the restore: ...restore-object …object… 'SELECT o._1 WHERE o._4 == "id"…' | …get-object … object ….
  29. How Glacier Select works: (1) the app sends a Glacier-Select request (ArchiveId, SQL, Tier, S3 bucket to write output) to Amazon Glacier; (2) Amazon Glacier responds 200 OK; (3) Amazon Glacier reads the data and performs filtering; (4) the output is written to Amazon S3; (5) the app is notified using SNS that the output is ready.
  30. Amazon Glacier Select use cases: billing inquiries, auditing/compliance, building training models.
  31. Amazon Glacier Select. Input format: delimited text (CSV, TSV, PSV, etc.). Encryption: SSE-KMS, SSE-S3. Output format: delimited text (CSV, TSV, PSV, etc.). Clauses: SELECT, FROM, WHERE. Data types: String; Integer, Float, Decimal; Timestamp; Boolean. Operators: Conditional, Math, Logical, String (LIKE, ||). Functions: String, Cast.
  32. Demo: Glacier Select. [Diagram labels: App, Glacier Select, Amazon Glacier]
  33. Amazon Glacier Select pricing • Data scanned: per-GB charge for the size of data read • Data returned: per-GB charge based on the amount of output data returned to S3 • Request cost: charge for each Glacier Select request
  34. Coming in 2018: Amazon Athena with Amazon Glacier Select • Query data in Amazon Glacier directly from Amazon Athena • Enables customers to use Amazon Glacier as a data source for their SQL queries in Amazon Athena [Diagram labels: Amazon Athena, Amazon Glacier]
  35. Amazon Glacier Select: GA starts today • Formats: CSV, any delimiter-separated file • Encryption: SSE-KMS, SSE-S3 • Encoding: UTF-8 • Integration: AWS SDK, CLI, Athena integration (expected 2018) • Availability: All commercial regions where Amazon Glacier is launched
  36. Summary
  37. Now: Your data lake on AWS. Simple, faster, cheaper. [Diagram labels: Amazon Glacier, Amazon S3, SELECT, Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, AWS Lambda, ISVs and custom applications]
  38. Learn more… STG303: Deep Dive on Amazon Glacier (Thurs, 1:45 PM). STG312: Best Practices for Building a Data Lake in Amazon S3 & Amazon Glacier (Thurs, 3:15 PM).
  39. Q&A. [Diagram labels: Amazon S3, Amazon Glacier]
  40. Thank you!
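
The "after" snippet on slide 15 uses preview-era argument names. As a minimal sketch, assuming the `select_object_content` operation of the boto3 S3 client as documented in later SDK releases, the per-key selection might look like the following. The helper name `select_rows`, the headerless-CSV assumption, and the idea of passing the client in as an argument (so the sketch can be exercised without AWS access) are illustrative choices, not part of the talk.

```python
import csv
import io


def select_rows(s3_client, bucket, key, expression):
    """Run a SQL expression against one CSV object and return the parsed rows.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is headerless, uncompressed CSV.
        InputSerialization={
            "CSV": {"FileHeaderInfo": "NONE"},
            "CompressionType": "NONE",
        },
        OutputSerialization={"CSV": {}},
    )
    # The result arrives as an event stream. Only "Records" events carry data,
    # and one row may be split across two events, so join before parsing.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    text = b"".join(chunks).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

With a real client this might be called as `select_rows(boto3.client("s3"), "example-bucket", "logs/part-0000.csv", "SELECT s._1 FROM S3Object s WHERE s._4 = 'x'")`, where the bucket and key are placeholders and the expression is the slide 13 filter written with standard SQL equality and quoting.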
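
Slide 15 reports 200 seconds and 11.2 cents before, and 95 seconds and 2.8 cents after. A small worked calculation of those two ratios, using only the figures printed on the slide, shows roughly a 2.1x speedup and one quarter of the cost, so the "2X faster at 1/5 of the cost" headline appears to be rounded. The helper name `ratio` is illustrative.

```python
from decimal import Decimal


def ratio(before, after):
    """Return before/after as an exact decimal ratio."""
    return Decimal(before) / Decimal(after)


# Figures as printed on slide 15.
speedup = ratio("200", "95")        # seconds before / seconds after
cost_ratio = ratio("11.2", "2.8")   # cents before / cents after

print(f"speedup: {speedup:.2f}x, cost: 1/{cost_ratio} of before")
```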
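
The retrieval table on slide 24 can be turned into a small cost calculator. This is a sketch that encodes only the 2017 prices printed on that slide (current pricing may differ) and covers plain retrievals; the per-GB rates for Glacier Select's scanned and returned data on slide 33 are not given in the talk, so they are left out. The names `RETRIEVAL_PRICING` and `retrieval_cost` are illustrative.

```python
from decimal import Decimal

# Prices in USD as printed on slide 24 (2017).
RETRIEVAL_PRICING = {
    "Expedited": {"per_gb": Decimal("0.03"), "per_request": Decimal("0.01")},
    "Standard": {"per_gb": Decimal("0.01"), "per_request": Decimal("0.05") / 1000},
    "Bulk": {"per_gb": Decimal("0.0025"), "per_request": Decimal("0.025") / 1000},
}


def retrieval_cost(tier, gigabytes, requests):
    """Return the retrieval cost in USD for a tier, data volume, and request count."""
    price = RETRIEVAL_PRICING[tier]
    data_cost = price["per_gb"] * Decimal(str(gigabytes))
    request_cost = price["per_request"] * requests
    return data_cost + request_cost


# Retrieving 100 GB in 1,000 requests under each tier.
for tier in RETRIEVAL_PRICING:
    print(tier, retrieval_cost(tier, 100, 1000))
```

Because the data charge is per GB, retrieving only the relevant fraction of an archive scales that term down proportionally, which is the saving the talk attributes to selecting in place.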
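
Slides 26 to 29 describe Glacier Select through the S3 API as a restore request that carries a SQL query and an output location. A minimal sketch, assuming the `restore_object` operation of the boto3 S3 client and its `RestoreRequest` structure as documented in later SDK releases, could build that request as follows. The helper names, the CSV-in/CSV-out choice, and the injected client are assumptions for illustration; the SNS topic listed on slide 27 is omitted because the slides do not say how it is wired up.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_select_restore_request(expression, tier, output_bucket, output_prefix):
    """Build a restore request that filters an archived CSV object with SQL."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {
        "Type": "SELECT",
        "Tier": tier,
        "SelectParameters": {
            "InputSerialization": {"CSV": {}},
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # The filtered output is written to S3 rather than returned inline.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix}
        },
    }


def start_glacier_select(s3_client, bucket, key, request):
    """Submit the request; `s3_client` is assumed to behave like boto3.client("s3")."""
    return s3_client.restore_object(Bucket=bucket, Key=key, RestoreRequest=request)
```

As on slide 29, the call only starts the job; the output appears under the chosen prefix once Amazon Glacier has read and filtered the data.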
