SlideShare a Scribd company logo
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ben Snively, Specialist Solutions Architect – Data and Analytics
October 12, 2017
Serverless Analytics with
Amazon Redshift Spectrum, AWS
Glue, and Amazon QuickSight
Agenda
• What is Serverless?
• Enterprise Data Warehouse on AWS (Amazon Redshift)
• Serverless Queries from your Data Warehouse (Redshift Spectrum)
• Serverless Data Catalog (AWS Glue)
• Serverless ETL (AWS Glue)
• Serverless BI (Amazon QuickSight)
• Demonstration
• Wrap up
What is Serverless
Virtualized Managed Serverless
You can easily
provision servers and
focus on OS and
above.
You focus higher in the
stack but still need to
consider servers, how
much CPU is needed, and
how much RAM.
AWS manages based the
customer configuration.
Build applications and services
without thinking of servers.
Don’t be concerned about
provisioning, scaling, and
maintaining servers for fault
tolerance and availability.
AWS does all of this for you.
No servers to
provision or manage
Scales with usage
Never pay for
idle resources
Availability and fault
tolerance built in
Serverless characteristics
• Managed Massively Parallel Petabyte
Scale Data Warehouse
• Streaming Backup/Restore to S3
• Load data from S3, DynamoDB and EMR
• Extensive Security Features
• Online Scaling from 160 GB -> 2 PB
Amazon
Redshift
Enterprise Data Warehouse a lot faster
a lot simpler
a lot cheaper
Selected Amazon Redshift customers
We innovate quickly
Well over 140 new features added since launch
Release every two weeks
Automatic patching
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
DUB (4/25)
SOC1/2/3 (5/8)
Unload Encrypted Files
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
SHA1 Builtin (7/15)
4 byte UTF-8 (7/18)
Sharing snapshots (7/18)
Statement Timeout (7/22)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress
(8/9)
Resource Level IAM (8/9)
PCI (8/22)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
Kinesis EMR/HDFS/SSH copy,
Distributed Tables, Audit
Logging/CloudTrail, Concurrency, Resize
Perf., Approximate Count Distinct, SNS
Alerts, Cross Region Backup (11/13)
Distributed Tables, Single Node Cursor
Support, Maximum Connections to 500
(12/13)
EIP Support for VPC Clusters (12/28)
New query monitoring system tables and
diststyle all (1/13)
Redshift on DW2 (SSD) Nodes (1/23)
Compression for COPY from SSH, Fetch
size support for single node clusters, new
system tables with commit stats,
row_number(), strotol() and query
termination (2/13)
Resize progress indicator & Cluster
Version (3/21)
Regex_Substr, COPY from JSON (3/25)
50 slots, COPY from EMR, ECDHE
ciphers (4/22)
3 new regex features, Unload to single
file, FedRAMP(5/6)
Rename Cluster (6/2)
Copy from multiple regions,
percentile_cont, percentile_disc (6/30)
Free Trial (7/1)
pg_last_unload_count (9/15)
AES-128 S3 encryption (9/29)
UTF-16 support (9/29)
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
Fast @ exabyte scale Elastic & highly available On-demand, pay-per-query
High concurrency: Multiple
clusters access same data
No ETL: Query data in-place
using open file formats
Full Amazon Redshift
SQL support
S3
SQL
Redshift Spectrum
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
AWS Glue
Data Catalog
Apache Hive Metastore
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
Leverages Amazon Redshift’s advanced cost-
based optimizer
Pushes down projections, filters, aggregations
and join reduction
Dynamic partition pruning to minimize data
processed
Automatic parallelization of query execution
against S3 data
Efficient join processing within the Amazon
Redshift cluster
Spectrum
Nodes
Redshift
Nodes
Query
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Glue Data
Catalog
Apache Hive Metastore
1
Query is optimized and compiled at
the leader node. Determine what gets
run locally and what goes to Amazon
Redshift Spectrum
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Glue Data
Catalog
Apache Hive Metastore
2
Query plan is sent to
all compute nodes
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Glue Data
Catalog
Apache Hive Metastore
3
Compute nodes obtain partition info from
Data Catalog; dynamically prune partitions
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Glue Data
Catalog
Apache Hive Metastore
4
Each compute node issues multiple
requests to the Amazon Redshift
Spectrum layer
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
5
Glue Data
Catalog
Apache Hive Metastore
Amazon Redshift Spectrum nodes
scan your S3 data
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
6
Glue Data
Catalog
Apache Hive Metastore
7
Amazon Redshift
Spectrum projects,
filters, and aggregates
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
Glue Data
Catalog
Apache Hive Metastore
Final aggregations and joins
with local Amazon Redshift
tables done in-cluster
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
8
Glue Data
Catalog
Apache Hive Metastore
Result is sent back to client
Life of a query
Amazon
Redshift
JDBC/ODBC
...
1 2 3 4 N
Amazon S3
Exabyte-scale object storage
9
Glue Data
Catalog
Apache Hive Metastore
Glue’ing it together
AWS Glue
Automatically discovers and categorizes your data to make it
immediately searchable and queryable
Generates code to clean, enrich, and reliably move data between data
stores; you can also use their favorite tools to build ETL jobs
Runs your jobs on a serverless, fully managed, scale-out environment
without needing to provision or manage compute resources
Discover
Develop
Deploy
AWS Glue: Components
Data Catalog
 Apache Hive Metastore compatible with enhanced functionality
 Crawlers automatically extract metadata and create tables
 Integrated with Amazon Athena, Amazon Redshift Spectrum
Job Execution
 Runs jobs on a serverless Spark platform
 Provides flexible scheduling
 Handles dependency resolution, monitoring, and alerting
Job Authoring
 Auto-generates ETL code
 Built on open frameworks – Python and Spark
 Developer-centric – editing, debugging, sharing
AWS Glue Data Catalog
Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.)
into a single categorized list that is searchable
Crawlers: Classifiers
IAM Role
Glue Crawler
Data Lakes
Data Warehouse
Databases
Amazon
RDS
Amazon
Redshift
Amazon S3
JDBC Connection
Object Connection
Built-In Classifiers
MySQL
MariaDB
PostreSQL
Aurora
SQL Server / Oracle
Redshift
Avro
Parquet
ORC
JSON & BJSON
Logs
(Apache, Linux, MS, Ruby, Redis, and many others)
Delimited
(comma, pipe, tab, semicolon)
Compressed Formats
(ZIP, BZIP, GZIP, LZ4, Snappy)
Create additional Custom
Classifiers with Grok!
Building your Data Catalog
Job authoring in AWS Glue
 Python code generated by AWS Glue
 Connect a notebook or IDE to AWS Glue
 Existing code brought into AWS Glue
You have choices on
how to get started
1. Customize the mappings
2. Glue generates transformation graph and Python code
3. Connect your notebook to development endpoints to customize your code
Job authoring: Automatic code generation
Job authoring: Relationalize() transform
Semi-structured schema Relational schema
• Transforms and adds new columns, types, and tables on-the-fly
• Tracks keys and foreign keys across runs
• SQL on the relational schema is orders of magnitude faster than JSON processing
F
K
A B B C.X C.
Y
P
K
Valu
e
Offs
et
A C D [ ]
X Y
B B
Serverless ETL to populate your warehouse
Amazon QuickSight is a Business Analytics Service that lets business users
quickly and easily visualize, explore, and share insights from their data.
Basic Concepts
Retail Data
Ops Data
Marketing Data
Relational
Databases
Flat Files
More data sources
coming soon!
Microsoft Active
DirectoryLocal User Definition
QuickSight is deeply integrated
with AWS data sources like
Redshift, RDS, S3, Athena and
others, as well as third-party
sources like Excel, Salesforce, as
well as on-premises databases.
Deep Integration with
AWS Data Sources
Amazon RDS,
Aurora
Amazon
Redshift
Amazon
Athena
Amazon S3
Flat Files
Super-fast Performance with SPICE
Putting the pieces together
Demonstration
What did we cover…
Automatically discovers and
categorizes your data to
make it immediately
searchable and queryable
Business Analytics Service that
lets business users quickly and
easily visualize, explore, and
share insights from their data.
extend the analytic power
beyond data stored your
data warehouse to query
vast amounts of
unstructured data in your
Amazon S3 “data lake”
Runs your jobs on a serverless, fully
managed, scale-out environment
without needing to provision or
manage compute resources
Thank you

More Related Content

What's hot

Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
Amazon Web Services
 
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
Amazon Web Services
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
AWS Marketplace
AWS MarketplaceAWS Marketplace
AWS Marketplace
Amazon Web Services
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
Amazon Web Services
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
Amazon Web Services
 
Introduction to AWS Cost Management
Introduction to AWS Cost ManagementIntroduction to AWS Cost Management
Introduction to AWS Cost Management
Amazon Web Services
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
Amazon Web Services
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
Antonios Chatzipavlis
 
Amazon API Gateway and AWS Lambda: Better Together
Amazon API Gateway and AWS Lambda: Better TogetherAmazon API Gateway and AWS Lambda: Better Together
Amazon API Gateway and AWS Lambda: Better Together
Danilo Poccia
 
Building Secure Architectures on AWS
Building Secure Architectures on AWSBuilding Secure Architectures on AWS
Building Secure Architectures on AWS
Amazon Web Services
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
Chris Taylor
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
Dimko Zhluktenko
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
Amazon Web Services
 
Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data Factory
Slava Kokaev
 

What's hot (20)

Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
Disaster Recovery, Continuity of Operations, Backup, and Archive on AWS | AWS...
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
AWS Marketplace
AWS MarketplaceAWS Marketplace
AWS Marketplace
 
Cloud Migration Workshop
Cloud Migration WorkshopCloud Migration Workshop
Cloud Migration Workshop
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Introduction to AWS Cost Management
Introduction to AWS Cost ManagementIntroduction to AWS Cost Management
Introduction to AWS Cost Management
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Amazon API Gateway and AWS Lambda: Better Together
Amazon API Gateway and AWS Lambda: Better TogetherAmazon API Gateway and AWS Lambda: Better Together
Amazon API Gateway and AWS Lambda: Better Together
 
Building Secure Architectures on AWS
Building Secure Architectures on AWSBuilding Secure Architectures on AWS
Building Secure Architectures on AWS
 
AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!AWS Glue - let's get stuck in!
AWS Glue - let's get stuck in!
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data Factory
 

Similar to Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon QuickSight - AWS Online Tech Talks

Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
Amazon Web Services
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
Lam Le
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
Pratim Das
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
Shu-Jeng Hsieh
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Amazon Web Services
 
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Amazon Web Services
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Amazon Web Services
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
Amazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
Amazon Web Services
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
Amazon Web Services
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
Amazon Web Services
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data Lake
Torsten Steinbach
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Precisely
 
Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018
Amazon Web Services
 

Similar to Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon QuickSight - AWS Online Tech Talks (20)

Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
Migrating Your Data Warehouse to Amazon Redshift (DAT337) - AWS re:Invent 2018
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
IBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data LakeIBM Cloud Native Day April 2021: Serverless Data Lake
IBM Cloud Native Day April 2021: Serverless Data Lake
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018Modernise your Data Warehouse - AWS Summit Sydney 2018
Modernise your Data Warehouse - AWS Summit Sydney 2018
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon QuickSight - AWS Online Tech Talks

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ben Snively, Specialist Solutions Architect – Data and Analytics October 12, 2017 Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon QuickSight
  • 2. Agenda • What is Serverless? • Enterprise Data Warehouse on AWS (Amazon Redshift) • Serverless Queries from your Data Warehouse (Redshift Spectrum) • Serverless Data Catalog (AWS Glue) • Serverless ETL (AWS Glue) • Serverless BI (Amazon QuickSight) • Demonstration • Wrap up
  • 3. What is Serverless Virtualized Managed Serverless You can easily provision servers and focus on OS and above. You focus higher in the stack but still need to consider servers, how much CPU is needed, and how much RAM. AWS manages based the customer configuration. Build applications and services without thinking of servers. Don’t be concerned about provisioning, scaling, and maintaining servers for fault tolerance and availability. AWS does all of this for you.
  • 4. No servers to provision or manage Scales with usage Never pay for idle resources Availability and fault tolerance built in Serverless characteristics
  • 5. • Managed Massively Parallel Petabyte Scale Data Warehouse • Streaming Backup/Restore to S3 • Load data from S3, DynamoDB and EMR • Extensive Security Features • Online Scaling from 160 GB -> 2 PB Amazon Redshift Enterprise Data Warehouse a lot faster a lot simpler a lot cheaper
  • 7. We innovate quickly Well over 140 new features added since launch Release every two weeks Automatic patching Service Launch (2/14) PDX (4/2) Temp Credentials (4/11) DUB (4/25) SOC1/2/3 (5/8) Unload Encrypted Files NRT (6/5) JDBC Fetch Size (6/27) Unload logs (7/5) SHA1 Builtin (7/15) 4 byte UTF-8 (7/18) Sharing snapshots (7/18) Statement Timeout (7/22) Timezone, Epoch, Autoformat (7/25) WLM Timeout/Wildcards (8/1) CRC32 Builtin, CSV, Restore Progress (8/9) Resource Level IAM (8/9) PCI (8/22) UTF-8 Substitution (8/29) JSON, Regex, Cursors (9/10) Split_part, Audit tables (10/3) SIN/SYD (10/8) HSM Support (11/11) Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross Region Backup (11/13) Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13) EIP Support for VPC Clusters (12/28) New query monitoring system tables and diststyle all (1/13) Redshift on DW2 (SSD) Nodes (1/23) Compression for COPY from SSH, Fetch size support for single node clusters, new system tables with commit stats, row_number(), strotol() and query termination (2/13) Resize progress indicator & Cluster Version (3/21) Regex_Substr, COPY from JSON (3/25) 50 slots, COPY from EMR, ECDHE ciphers (4/22) 3 new regex features, Unload to single file, FedRAMP(5/6) Rename Cluster (6/2) Copy from multiple regions, percentile_cont, percentile_disc (6/30) Free Trial (7/1) pg_last_unload_count (9/15) AES-128 S3 encryption (9/29) UTF-16 support (9/29)
  • 8. Amazon Redshift Spectrum Run SQL queries directly against data in S3 using thousands of nodes Fast @ exabyte scale Elastic & highly available On-demand, pay-per-query High concurrency: Multiple clusters access same data No ETL: Query data in-place using open file formats Full Amazon Redshift SQL support S3 SQL
  • 9. Redshift Spectrum ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage AWS Glue Data Catalog Apache Hive Metastore 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC Leverages Amazon Redshift’s advanced cost- based optimizer Pushes down projections, filters, aggregations and join reduction Dynamic partition pruning to minimize data processed Automatic parallelization of query execution against S3 data Efficient join processing within the Amazon Redshift cluster Spectrum Nodes Redshift Nodes
  • 10. Query SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY… Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Glue Data Catalog Apache Hive Metastore 1
  • 11. Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Glue Data Catalog Apache Hive Metastore 2
  • 12. Query plan is sent to all compute nodes Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Glue Data Catalog Apache Hive Metastore 3
  • 13. Compute nodes obtain partition info from Data Catalog; dynamically prune partitions Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Glue Data Catalog Apache Hive Metastore 4
  • 14. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage 5 Glue Data Catalog Apache Hive Metastore
  • 15. Amazon Redshift Spectrum nodes scan your S3 data Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage 6 Glue Data Catalog Apache Hive Metastore
  • 16. 7 Amazon Redshift Spectrum projects, filters, and aggregates Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage Glue Data Catalog Apache Hive Metastore
  • 17. Final aggregations and joins with local Amazon Redshift tables done in-cluster Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage 8 Glue Data Catalog Apache Hive Metastore
  • 18. Result is sent back to client Life of a query Amazon Redshift JDBC/ODBC ... 1 2 3 4 N Amazon S3 Exabyte-scale object storage 9 Glue Data Catalog Apache Hive Metastore
  • 20. AWS Glue Automatically discovers and categorizes your data to make it immediately searchable and queryable Generates code to clean, enrich, and reliably move data between data stores; you can also use their favorite tools to build ETL jobs Runs your jobs on a serverless, fully managed, scale-out environment without needing to provision or manage compute resources Discover Develop Deploy
  • 21. AWS Glue: Components Data Catalog  Apache Hive Metastore compatible with enhanced functionality  Crawlers automatically extract metadata and create tables  Integrated with Amazon Athena, Amazon Redshift Spectrum Job Execution  Runs jobs on a serverless Spark platform  Provides flexible scheduling  Handles dependency resolution, monitoring, and alerting Job Authoring  Auto-generates ETL code  Built on open frameworks – Python and Spark  Developer-centric – editing, debugging, sharing
  • 22. AWS Glue Data Catalog Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized list that is searchable
  • 23. Crawlers: Classifiers IAM Role Glue Crawler Data Lakes Data Warehouse Databases Amazon RDS Amazon Redshift Amazon S3 JDBC Connection Object Connection Built-In Classifiers MySQL MariaDB PostreSQL Aurora SQL Server / Oracle Redshift Avro Parquet ORC JSON & BJSON Logs (Apache, Linux, MS, Ruby, Redis, and many others) Delimited (comma, pipe, tab, semicolon) Compressed Formats (ZIP, BZIP, GZIP, LZ4, Snappy) Create additional Custom Classifiers with Grok!
  • 25. Job authoring in AWS Glue  Python code generated by AWS Glue  Connect a notebook or IDE to AWS Glue  Existing code brought into AWS Glue You have choices on how to get started
  • 26. 1. Customize the mappings 2. Glue generates transformation graph and Python code 3. Connect your notebook to development endpoints to customize your code Job authoring: Automatic code generation
  • 27. Job authoring: Relationalize() transform Semi-structured schema Relational schema • Transforms and adds new columns, types, and tables on-the-fly • Tracks keys and foreign keys across runs • SQL on the relational schema is orders of magnitude faster than JSON processing F K A B B C.X C. Y P K Valu e Offs et A C D [ ] X Y B B
  • 28. Serverless ETL to populate your warehouse
  • 29. Amazon QuickSight is a Business Analytics Service that lets business users quickly and easily visualize, explore, and share insights from their data.
  • 30. Basic Concepts Retail Data Ops Data Marketing Data Relational Databases Flat Files More data sources coming soon! Microsoft Active DirectoryLocal User Definition
  • 31. QuickSight is deeply integrated with AWS data sources like Redshift, RDS, S3, Athena and others, as well as third-party sources like Excel, Salesforce, as well as on-premises databases. Deep Integration with AWS Data Sources Amazon RDS, Aurora Amazon Redshift Amazon Athena Amazon S3 Flat Files
  • 33. Putting the pieces together
  • 35. What did we cover… Automatically discovers and categorizes your data to make it immediately searchable and queryable Business Analytics Service that lets business users quickly and easily visualize, explore, and share insights from their data. extend the analytic power beyond data stored your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake” Runs your jobs on a serverless, fully managed, scale-out environment without needing to provision or manage compute resources