© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Querying Data in Place with AWS Object
Storage Features and Analytics Tools
STG378-R
Damon Cortesi, Big Data Architect, AWS
Tim Harris, Principal Engineer, AWS
Mert Hocanin, Big Data Architect, AWS
Ippokratis Pandis, Principal Engineer, AWS
Data lakes in Amazon Simple Storage Service
(Amazon S3)
Building a data lake on Amazon S3
[Diagram: a data lake built on Amazon S3]
Diagram labels: Ingest; ETL and data catalog; Analyze; Search, access, and visualize; Protect and secure
Services shown: Amazon Kinesis Data Firehose, Amazon CloudWatch, AWS CloudTrail, AWS IoT, AWS Glue, Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, Amazon S3 Select, Amazon S3
AWS Glue
Amazon EMR
Amazon Redshift Spectrum
Amazon Athena
Amazon S3 Select
Thank you!
Damon Cortesi
dcortesi@amazon.com
Tim Harris
tlh@amazon.com
Mert Hocanin
hocanint@amazon.com
Ippokratis Pandis
ippo@amazon.com
Frequently asked questions
Amazon S3 Select and Amazon Athena both let me run serverless queries
over data at rest in Amazon S3. Which service should I use, and for
which use cases?
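One way to make the difference in scope concrete is to compare the request each service takes. The sketch below only builds the keyword arguments for the boto3 calls `select_object_content` (S3 Select) and `start_query_execution` (Athena); it does not call AWS. The bucket, key, database, table, and output location are hypothetical placeholders, not names from the session.

```python
# Hypothetical names throughout: bucket, key, database, and table are placeholders.


def s3_select_request(bucket: str, key: str, sql: str) -> dict:
    """Keyword arguments for boto3's S3 client select_object_content call.

    S3 Select evaluates one SQL expression against a single object.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumes a gzip-compressed CSV object whose first row is a header.
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


def athena_request(database: str, sql: str, output_location: str) -> dict:
    """Keyword arguments for boto3's Athena client start_query_execution call.

    Athena runs SQL against a catalog table, which can span many objects.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


select_args = s3_select_request(
    "example-data-lake",
    "logs/2018/11/27/part-0000.csv.gz",
    "SELECT s.status FROM S3Object s WHERE s.status = '500'",
)
athena_args = athena_request(
    "example_db",
    "SELECT status, COUNT(*) FROM logs GROUP BY status",
    "s3://example-query-results/athena/",
)
# With credentials configured, these could be passed as
# boto3.client("s3").select_object_content(**select_args) and
# boto3.client("athena").start_query_execution(**athena_args).
```

The S3 Select request names exactly one object (`Bucket` and `Key`), while the Athena request names a database and leaves it to the table definition to decide which objects are read.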
Frequently asked questions
How should I name the objects I use in a data lake stored in Amazon
S3? I heard that I should put a hash at the start of each object name.
Is that correct?
Frequently asked questions
How should I optimize data for querying? What file formats should I use?
How should I split data across multiple files, and what should I aim for in
file size or number of files? Should I use Hive-style partitioning or
buckets?
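For readers unfamiliar with the term, Hive-style partitioning encodes partition columns in the object key as `name=value` path segments. A minimal sketch follows, assuming a hypothetical `logs` table partitioned by year, month, and day; the helper name, prefix, and file name are illustrative only.

```python
from datetime import date


def hive_partition_key(table_prefix: str, day: date, filename: str) -> str:
    """Build an object key with Hive-style partitions (name=value path segments)."""
    return (
        f"{table_prefix}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


key = hive_partition_key("logs", date(2018, 11, 27), "part-0000.parquet")
print(key)  # logs/year=2018/month=11/day=27/part-0000.parquet
```

With keys laid out this way, a query engine that understands the convention can skip whole prefixes when a query filters on `year`, `month`, or `day`.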
Frequently asked questions
I read that Amazon EMR lets me run part of my query on Amazon S3
Select. When should I enable that?

 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Querying Data in Place with AWS Object Storage Features and Analytics Tools (STG378-R2) - AWS re:Invent 2018

  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Querying Data in Place with AWS Object Storage Features and Analytics Tools (STG378-R). Damon Cortesi, Big Data Architect, AWS; Tim Harris, Principal Engineer, AWS; Mert Hocanin, Big Data Architect, AWS; Ippokratis Pandis, Principal Engineer, AWS
  • 3. Data lakes in Amazon Simple Storage Service (Amazon S3)
  • 4. Building a data lake on Amazon S3 (diagram labels): Search, access, and visualize; Protect and secure; Amazon Kinesis Data Firehose; Amazon CloudWatch; AWS CloudTrail; AWS IoT; Amazon Redshift Spectrum; Amazon Athena; Amazon EMR; Amazon S3 Select; Analyze; AWS Glue (ETL and data catalog); Ingest; Amazon S3
  • 5. AWS Glue
  • 6. Amazon EMR
  • 7. Amazon Redshift Spectrum
  • 8. Amazon Athena
  • 9. Amazon S3 Select
  • 10. Building a data lake on Amazon S3 (diagram labels): Search, access, and visualize; Protect and secure; Amazon Kinesis Data Firehose; Amazon CloudWatch; AWS CloudTrail; AWS IoT; Amazon Redshift Spectrum; Amazon Athena; Amazon EMR; Amazon S3 Select; Analyze; AWS Glue (ETL and data catalog); Ingest; Amazon S3
  • 11. Thank you! Damon Cortesi, dcortesi@amazon.com; Tim Harris, tlh@amazon.com; Mert Hocanin, hocanint@amazon.com; Ippokratis Pandis, ippo@amazon.com
  • 14. Frequently asked questions: Amazon S3 Select and Amazon Athena both let me run serverless queries over data at rest in Amazon S3. Which service should I use, and for which use cases?
  • 15. Frequently asked questions: How should I name the objects in a data lake stored in Amazon S3? I heard that I ought to include hashes at the start of the names. Is that correct?
  • 16. Frequently asked questions: How should I optimize data for querying? Which file formats should I use? How should I split data across multiple files, and what file size or number of files should I aim for? Should I use Hive-style partitioning or buckets?
  • 17. Frequently asked questions: I read that Amazon EMR lets me run part of my query on Amazon S3 Select. When should I enable that?
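
Slide 16 mentions Hive-style partitioning without showing what it looks like. As a minimal sketch, a Hive-style layout encodes partition column values in the object key as `column=value` path segments, which lets query engines skip prefixes that a filter rules out. The function name `hive_partition_key`, the table prefix, the partition columns, and the file name below are all hypothetical and chosen only for illustration; the slides do not prescribe a particular layout.

```python
from datetime import date


def hive_partition_key(table_prefix: str, day: date, filename: str) -> str:
    """Build an object key with Hive-style year=/month=/day= segments.

    Assumption: the table is partitioned by calendar date. Real data
    lakes may partition on other columns (region, customer, and so on).
    """
    return (
        f"{table_prefix.strip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Hypothetical table prefix and file name, for illustration only.
key = hive_partition_key("logs/clickstream", date(2018, 11, 27), "part-0000.parquet")
print(key)  # logs/clickstream/year=2018/month=11/day=27/part-0000.parquet
```

A query that filters on `year`, `month`, and `day` would then only need to read objects under the matching prefix rather than the whole table.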