Denver AWS Users' Group meeting - September 2017

•

1 like•191 views

David McDaniel

Great presentation on Athena, QuickSight, S3 and related technologies by AWS Solution Architect Ben Snively.

Technology

Amazon Athena
Interactive query service to analyze data in Amazon S3 using standard SQL
© 2015 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified,
or distributed in whole or in part without the express consent of Amazon Web Services, Inc.

Cloud Architecture Evolution
Virtualized Managed Serverless
Virtualized
Servers
Managed
Platforms
Serverless
Analytics

No servers to provision
or manage
Scales with usage
Never pay for idle Availability and fault
tolerance built in
Serverless characteristics

Amazon Athena is an interactive query service
that makes it easy to analyze data directly from
Amazon S3 using Standard SQL

Athena is Serverless
• No Infrastructure or
administration
• Zero Spin up time
• Transparent upgrades

Amazon Athena is Easy To Use
• Log into the Console
• Create a table
• Type in a Hive DDL Statement
• Use the console Add Table wizard
• Use Glue Data Catalog
• Start querying

Amazon Athena is Highly Available
• You connect to a service endpoint or log into the console
• Athena uses warm compute pools across multiple
Availability Zones
• Your data is in Amazon S3, which is also highly available
and designed for 99.999999999% durability

Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
• Text, CSV, JSON, weblogs, AWS service logs
• Convert to an optimized form like ORC or Parquet for the best
performance and lowest cost
• No ETL required
• Stream data from directly from Amazon S3
• Take advantage of Amazon S3 durability and availability

Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested
queries & window functions
• Support for complex data types
(arrays, structs)
• Support for partitioning of data by any
key
• (date, time, custom keys)
• e.g., Year, Month, Day, Hour or
Customer Key, Date

Simple Pricing
• DDL operations – FREE
• SQL operations – FREE
• Query concurrency – FREE
• Data scanned - $5 / TB
• Standard S3 rates for storage, requests, and data transfer apply

Simple Pricing - $5/TB Scanned
• Pay by the amount of data scanned per query
• Ways to save costs
• Compress
• Convert to Columnar format
• Use partitioning
• Free: DDL Queries, Failed Queries
Dataset Size on Amazon S3 Query Run time Data Scanned Cost
Logs stored as Text
files
1 TB 237 seconds 1.15TB $5.75
Logs stored in
Apache Parquet
format*
130 GB 5.13 seconds 2.69 GB $0.013
Savings 87% less with Parquet 34x faster 99% less data
scanned
99.7% cheaper

AWS Glue: Components
Data Catalog
§ Apache Hive Metastore compatible with enhanced functionality
§ Crawlers automatically extract metadata and create tables
§ Integrated with Amazon Athena, Amazon Redshift Spectrum
Job Execution
§ Runs jobs on a serverless Spark platform
§ Provides flexible scheduling
§ Handles dependency resolution, monitoring, and alerting
Job Authoring
§ Auto-generates ETL code
§ Built on open frameworks – Python and Spark
§ Developer-centric – editing, debugging, sharing

Diving into a demonstration…
Store various formats of raw data on Amazon S3
Investigate Data in the raw form via using a familiar query language
Transform Data into a canonical, easily queried format
Analyze across datasets
Build Dashboards to drive Insights
Discover and Organize your data automatically

Yellow Taxi Rides
Athena
Amazon
S3
1 months CSV saved in S3
Query it in CSV format…

Yellow Rides Green Rides
Glue
ETL Jobs
Canonical
Partitioned
Data
Athena
FHV Rides
3 Data Sources and 8 Years of Data

Yellow Rides Green Rides
Glue
ETL Jobs
Canonical
Partitioned
Data
QuickSight
Athena
FHV Rides
3 Data Sources and 8 Years of Data

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Questions ?
Thank you!

What's hot

Co 4, session 2, aws analytics servicesm vaishnavi

AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)Amazon Web Services

AWS Kinesis - Streams, Firehose, AnalyticsSerhat Can

Streaming Data Analytics with Amazon Redshift and Kinesis FirehoseAmazon Web Services

Introduction to AWS GlueAmazon Web Services

Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...Amazon Web Services

AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)Amazon Web Services

AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAmazon Web Services

Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...Amazon Web Services

Big data and serverless - AWS UG The NetherlandsMarek Kuczynski

Supercharging the Value of Your Data with Amazon S3Amazon Web Services

Real-Time Streaming Data Solution on AWS with BeeswaxAmazon Web Services

Deep Dive on Log Analytics with Elasticsearch ServiceAmazon Web Services

Bleeding Edge DatabasesLynn Langit

Azuresatpn19 - An Introduction To Azure Data FactoryRiccardo Perico

Building a data warehouse with AWS Redshift, Matillion and YellowfinLynn Langit

Best Practices for Building a Data Lake on AWSAmazon Web Services

Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Amazon Web Services

(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...Amazon Web Services

SRV420 Analyzing Streaming Data in Real-time with Amazon KinesisAmazon Web Services

What's hot (20)

Co 4, session 2, aws analytics services

AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315)

AWS Kinesis - Streams, Firehose, Analytics

Streaming Data Analytics with Amazon Redshift and Kinesis Firehose

Introduction to AWS Glue

Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...

AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)

AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud

Build a Real-time Streaming Data Visualization System with Amazon Kinesis Ana...

Big data and serverless - AWS UG The Netherlands

Supercharging the Value of Your Data with Amazon S3

Real-Time Streaming Data Solution on AWS with Beeswax

Deep Dive on Log Analytics with Elasticsearch Service

Bleeding Edge Databases

Azuresatpn19 - An Introduction To Azure Data Factory

Building a data warehouse with AWS Redshift, Matillion and Yellowfin

Best Practices for Building a Data Lake on AWS

Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...

(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...

SRV420 Analyzing Streaming Data in Real-time with Amazon Kinesis

Similar to Denver AWS Users' Group meeting - September 2017

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services

NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services

Los Angeles AWS Users Group - Athena Deep DiveKevin Epstein

Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...Amazon Web Services

Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...Amazon Web Services

Serverless Big Data Analytics with Amazon Athena and QuickSightAmazon Web Services

NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services

Serverlesss Big Data Analytics with Amazon Athena and QuicksightAmazon Web Services

AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea

Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Amazon Web Services

(BDT317) Building A Data Lake On AWSAmazon Web Services

What is Amazon Athenajeetendra mandal

Querying and Analyzing Data in Amazon S3Amazon Web Services

Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...Amazon Web Services

What's New with Big Data AnalyticsAmazon Web Services

AWS March 2016 Webinar Series Building Your Data Lake on AWS Amazon Web Services

The Best of re:invent 2016Amazon Web Services

2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 Amazon Web Services Korea

Similar to Denver AWS Users' Group meeting - September 2017 (20)

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料

NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL

Los Angeles AWS Users Group - Athena Deep Dive

Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...

Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...

Serverless Big Data Analytics with Amazon Athena and QuickSight

NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...

Serverlesss Big Data Analytics with Amazon Athena and Quicksight

AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)

Interactive Analytics on AWS - AWS Summit Tel Aviv 2017

(BDT317) Building A Data Lake On AWS

What is Amazon Athena

Querying and Analyzing Data in Amazon S3

Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...

What's New with Big Data Analytics

AWS March 2016 Webinar Series Building Your Data Lake on AWS

The Best of re:invent 2016

2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개

Recently uploaded

Elevate Developer Efficiency & build GenAI Application with Amazon QBhuvaneswari Subramani

Navigating Identity and Access Management in the Modern EnterpriseWSO2

The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software

Introduction to use of FHIR Documents in ABDMKumar Satyam

Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood

Platformless Horizons for Digital AdaptabilityWSO2

MINDCTI Revenue Release Quarter One 2024MIND CTI

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub

DBX First Quarter 2024 Investor PresentationDropbox

How to Check CNIC Information Online with Pakdata cfdanishmna97

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash

JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea

DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity

Modernizing Legacy Systems Using BallerinaWSO2

Corporate and higher education May webinar.pptxRustici Software

WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2

Quantum Leap in Next-Generation ComputingWSO2

Recently uploaded (20)

Elevate Developer Efficiency & build GenAI Application with Amazon Q

Navigating Identity and Access Management in the Modern Enterprise

The Zero-ETL Approach: Enhancing Data Agility and Insight

Introduction to use of FHIR Documents in ABDM

Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Platformless Horizons for Digital Adaptability

MINDCTI Revenue Release Quarter One 2024

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...

DBX First Quarter 2024 Investor Presentation

How to Check CNIC Information Online with Pakdata cf

AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)

JavaScript Usage Statistics 2024 - The Ultimate Guide

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam

Modernizing Legacy Systems Using Ballerina

Corporate and higher education May webinar.pptx

WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...

Quantum Leap in Next-Generation Computing

Denver AWS Users' Group meeting - September 2017

1. Amazon Athena Interactive query service to analyze data in Amazon S3 using standard SQL © 2015 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services, Inc.

2. Serverless

3. Cloud Architecture Evolution Virtualized Managed Serverless Virtualized Servers Managed Platforms Serverless Analytics

4. No servers to provision or manage Scales with usage Never pay for idle Availability and fault tolerance built in Serverless characteristics

5. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using Standard SQL

6. Athena is Serverless • No Infrastructure or administration • Zero Spin up time • Transparent upgrades

7. Amazon Athena is Easy To Use • Log into the Console • Create a table • Type in a Hive DDL Statement • Use the console Add Table wizard • Use Glue Data Catalog • Start querying

8. Amazon Athena is Highly Available • You connect to a service endpoint or log into the console • Athena uses warm compute pools across multiple Availability Zones • Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability

9. Query Data Directly from Amazon S3 • No loading of data • Query data in its raw format • Text, CSV, JSON, weblogs, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No ETL required • Stream data from directly from Amazon S3 • Take advantage of Amazon S3 durability and availability

10. Use ANSI SQL • Start writing ANSI SQL • Support for complex joins, nested queries & window functions • Support for complex data types (arrays, structs) • Support for partitioning of data by any key • (date, time, custom keys) • e.g., Year, Month, Day, Hour or Customer Key, Date

11. Customers Drive Product Decisions

12. Simple Pricing • DDL operations – FREE • SQL operations – FREE • Query concurrency – FREE • Data scanned - $5 / TB • Standard S3 rates for storage, requests, and data transfer apply

13. Simple Pricing - $5/TB Scanned • Pay by the amount of data scanned per query • Ways to save costs • Compress • Convert to Columnar format • Use partitioning • Free: DDL Queries, Failed Queries Dataset Size on Amazon S3 Query Run time Data Scanned Cost Logs stored as Text files 1 TB 237 seconds 1.15TB $5.75 Logs stored in Apache Parquet format* 130 GB 5.13 seconds 2.69 GB $0.013 Savings 87% less with Parquet 34x faster 99% less data scanned 99.7% cheaper

14. AWS Glue: Components Data Catalog § Apache Hive Metastore compatible with enhanced functionality § Crawlers automatically extract metadata and create tables § Integrated with Amazon Athena, Amazon Redshift Spectrum Job Execution § Runs jobs on a serverless Spark platform § Provides flexible scheduling § Handles dependency resolution, monitoring, and alerting Job Authoring § Auto-generates ETL code § Built on open frameworks – Python and Spark § Developer-centric – editing, debugging, sharing

15. Analyzing and Processing all your data

16. Diving into a demonstration… Store various formats of raw data on Amazon S3 Investigate Data in the raw form via using a familiar query language Transform Data into a canonical, easily queried format Analyze across datasets Build Dashboards to drive Insights Discover and Organize your data automatically

17. Yellow Taxi Rides Athena Amazon S3 1 months CSV saved in S3 Query it in CSV format…

18. Yellow Rides Green Rides Glue ETL Jobs Canonical Partitioned Data Athena FHV Rides 3 Data Sources and 8 Years of Data

19. Yellow Rides Green Rides Glue ETL Jobs Canonical Partitioned Data QuickSight Athena FHV Rides 3 Data Sources and 8 Years of Data

Denver AWS Users' Group meeting - September 2017

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Denver AWS Users' Group meeting - September 2017

Similar to Denver AWS Users' Group meeting - September 2017 (20)

More from David McDaniel

More from David McDaniel (15)

Recently uploaded

Recently uploaded (20)

Denver AWS Users' Group meeting - September 2017