Learn about AWS business intelligence (BI) analytics, visualization, artificial intelligence, and machine learning services that can transform data into insights.
Data analytics is the stage where an organization can identify ways to increase revenue or reduce cost. Analytics and visualization deliver decision makers the insights to transform an organization, whether by identifying unmet customer needs or by optimizing operational processes. Data-driven decisions transform how managers allocate resources and evaluate results within an organization. Reliance on data reduces the role of hearsay and instinct when making choices. A manager’s intuition is now backed with data at the front end of the planning process, through the course of implementation, and when evaluating the impact of their decisions.
Key considerations in this phase include clearly defined analytics requirements, output aligned to the use cases, and data consumers within the organization finding the generated insights actionable. Let’s review some of the analytics solutions available in the AWS portfolio during this stage.
Picking the right analytical engine for your needs
AWS offers analytical engines for several use cases such as big data processing, data warehousing, ad-hoc analysis, real-time streaming, and operational/log analytics.
In this session, you will learn about what engines you can use for your use case to analyze all of your data stored in your Amazon S3 data lake in open formats.
You will also learn how to use these engines together for generating new insights, such as complementing your data warehouse workloads with ad-hoc and real-time analytics engines to incorporate new data into your reports.
We begin with data warehousing. Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds. With Amazon Redshift, you can start small for just $0.25 per hour with no commitments and scale out to petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of traditional solutions.
Fast: Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Data load speed scales linearly with cluster size, with integrations to Amazon S3, Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and any SSH-enabled host.
Inexpensive: You only pay for what you use. You can have an unlimited number of users doing unlimited analytics on all your data for just $1,000 per terabyte per year, one-tenth the cost of traditional data warehouse solutions. Most customers see a 3-4x reduction of data size after compression, reducing their costs to $250-$333 per uncompressed terabyte per year.
Extensible: Redshift Spectrum enables you to run queries against exabytes of data in Amazon S3 as easily as you run queries against petabytes of data stored on local disks in Amazon Redshift, using the same SQL syntax and BI tools you use today. You can store highly structured, frequently accessed data on Redshift local disks, keep vast amounts of unstructured data in an Amazon S3 “Data Lake”, and query seamlessly across both.
Simple: Amazon Redshift allows you to easily automate most of the common administrative tasks to manage, monitor, and scale your data warehouse. By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business.
Scalable: You can easily resize your cluster up and down as your performance and capacity needs change with just a few clicks in the console or a simple API call.
Secure: Security is built in. You can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate your clusters using Amazon VPC, and even manage your keys using AWS Key Management Service (KMS) and hardware security modules (HSMs).
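The compression arithmetic in the Inexpensive point above works out like this. A minimal sketch; the $1,000 per compressed terabyte figure is the rate quoted in this talk:

```python
# Sketch: effective yearly cost per *uncompressed* terabyte, given that the
# quoted $1,000/TB/year rate applies to compressed data on disk.
RATE_PER_COMPRESSED_TB = 1000.0  # $/TB/year, as quoted in the talk

def cost_per_uncompressed_tb(compression_ratio: float) -> float:
    """Effective cost per uncompressed TB at a given compression ratio."""
    return RATE_PER_COMPRESSED_TB / compression_ratio

# The 3-4x compression most customers see lands at roughly $250-$333.
print(round(cost_per_uncompressed_tb(4)))  # 250
print(round(cost_per_uncompressed_tb(3)))  # 333
```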
Traditionally, analytics ran through a relational data warehouse that collects data from multiple source systems and produces operational reports. This approach had a few characteristics:
It was optimized for relational data sources
It scaled up to PBs
It required the questions to be known before the DW was designed, because the schema had to be created up front to determine what data would be loaded into the data warehouse
It enabled operational reporting on top of the data in the DW
The belief that data is an asset is putting pressure on traditional architectures. It can’t be business as usual anymore, because new customer requirements break the traditional approach.
Customers need to:
Capture and store new non-relational data at EB scale. Customers want to store new non-relational data sourced from places not currently feeding the data warehouse. This includes machine-generated data (e.g., IoT devices), log files, clickstream data, social media, and more. These new sources of data are being generated at a volume that can scale to exabyte size. The traditional data warehouse was not optimized for storing all of this non-relational data because it was designed for relational data at PB scale.
Secure and combine data from new and existing sources. Customers want a single view of all of their data, and they want an easy way to catalog and search all of it to run analytics on top. Furthermore, they want their data secured to prevent unauthorized access. Traditional data architectures were not built to account for this. Data either sits in silos, or, if it is centralized into an enterprise data warehouse, it is extremely costly to build ETL to move the data, which will not scale at EB data volumes.
Do new types of analysis on their data (machine learning, big data processing, and real-time analytics). Customers increasingly need to do new types of analytics. They want to move from answering questions about the past to using statistical models and forecasting techniques to understand and answer what could happen. To do this, customers need to incorporate machine learning, big data processing, and real-time analytics. However, their traditional architecture could only accommodate reporting and ad hoc analysis on relational data.
This table provides a point of comparison, from the old world to the new…
AWS cloud is the best place to build a data lake…
A data lake on AWS gives you access to the most complete platform for big data. AWS provides you with secure infrastructure and offers a broad set of scalable, cost-effective services to collect, store, categorize, and analyze your data to get meaningful insights. AWS makes it easy to build and tailor your data lake to your specific data analytic requirements.
Amazon Redshift Spectrum enables you to run Amazon Redshift SQL queries against exabytes of data in Amazon S3. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “Data Lake” -- without having to load or transform any data. Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast – even with large data sets and complex queries.
Redshift Spectrum directly queries data in Amazon S3 using the open data formats you already use, including Avro, CSV, Grok, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Since Redshift Spectrum supports the same SQL syntax of Amazon Redshift, you can run sophisticated queries using the same Business Intelligence (BI) tools you use today. You can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and your full data sets stored cost-effectively in Amazon S3.
Start Querying Instantly
Same SQL. Same BI tools. No loading required.
With Amazon Redshift Spectrum, you can start querying your data in Amazon S3 immediately, with no loading or transformation required. You just need to register your Amazon Athena Data Catalog, AWS Glue Data Catalog, or Apache Hive Metastore as an external schema. You can use the same SQL you use for querying Amazon Redshift tables and any BI tool that supports Redshift today.
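As a hedged sketch, that registration step boils down to one DDL statement. The schema name, Glue database, IAM role ARN, and region below are illustrative placeholders, not real resources:

```python
# Sketch: build the CREATE EXTERNAL SCHEMA statement that registers an AWS Glue
# Data Catalog database with Redshift, so Spectrum can query its S3-backed
# tables. All identifiers passed in are placeholders.
def external_schema_ddl(schema: str, glue_db: str, iam_role: str, region: str) -> str:
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{glue_db}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"REGION '{region}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )

print(external_schema_ddl(
    "spectrum", "salesdb",
    "arn:aws:iam::123456789012:role/MySpectrumRole", "us-east-1"))
```

You would run the resulting statement from any SQL client connected to your cluster; after that, tables in the Glue database are queryable as `spectrum.<table>`.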
Fast Performance
Leverage the powerful Amazon Redshift query optimizer.
Amazon Redshift delivers super-fast performance whether it is for ad-hoc analysis on large unstructured data sets in Amazon S3 or frequent analysis on structured data sets in Redshift tables. You can maintain hot data in your Amazon Redshift clusters to get the performance of local disks, and use Amazon Redshift Spectrum to extend your queries to cold data stored in Amazon S3 for unlimited scalability and low cost. The Amazon Redshift query optimizer will automatically determine how to minimize data scanned in Amazon S3 and the number of Redshift Spectrum nodes to use in the query.
Limitless Scalability
Separate compute and storage.
With Amazon Redshift Spectrum, you don’t have to worry about scaling your cluster. It lets you separate storage and compute, allowing you to scale each independently. You can even run multiple Amazon Redshift clusters against the same Amazon S3 Data Lake, enabling limitless concurrency. Redshift Spectrum automatically scales out to thousands of instances if needed, so queries run quickly, whether processing a terabyte, a petabyte or an exabyte.
Pay Per Query
Only pay for data processed.
With Amazon Redshift Spectrum, you only pay for the queries you run. You are charged $5 per terabyte of data processed to execute your query. Redshift Spectrum can query compressed data. You can both save 30% to 90% on your per-query costs and improve performance by compressing, partitioning, and converting your data to a columnar format. There are no charges for Redshift Spectrum when you’re not running queries. You pay standard Amazon S3 rates for data storage and Amazon Redshift instance rates for the clusters used.
If we take a look behind the scenes, we have the Redshift cluster in green, and the purple boxes are the auto-scaling, multi-tenant Spectrum fleet of compute nodes. The Redshift optimizer includes Spectrum-specific optimizations and uses high levels of parallelism to operate on S3 data: each slice in your cluster can call on up to 10 Spectrum compute nodes, whether you are querying gigabytes or even an exabyte. Because storage and compute are separated, loading data into your local cluster is now optional; your data can live in S3, in Redshift, or be reached through Spectrum. You can even point many clusters at the same S3 data, trading some performance for flexibility.
For big data processing, Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB. Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.
We have found EMR to be one of the most cost-effective Hadoop and Spark distributions because of the flexible ways customers can be billed, including the per-second billing introduced this year. Spot pricing can dramatically lower your bill, and Reserved Instances can lower it by 50-80%. Finally, with the ability to automatically resize your clusters based on scaling rules, EMR is a very cost-effective place to run your Hadoop and Spark workloads.
Easy to Use
You can launch an Amazon EMR cluster in minutes. You don’t need to worry about node provisioning, cluster setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis.
Low Cost
Amazon EMR pricing is simple and predictable: You pay a per-second rate for every second used, with a one-minute minimum charge. You can launch a 10-node Hadoop cluster for as little as $0.15 per hour. Because Amazon EMR has native support for Amazon EC2 Spot and Reserved Instances, you can also save 50-80% on the cost of the underlying instances.
Elastic
With Amazon EMR, you can provision one, hundreds, or thousands of compute instances to process data at any scale. You can easily increase or decrease the number of instances manually or with Auto Scaling, and you only pay for what you use.
Reliable
You can spend less time tuning and monitoring your cluster. Amazon EMR has tuned Hadoop for the cloud; it also monitors your cluster, retrying failed tasks and automatically replacing poorly performing instances.
Secure
Amazon EMR automatically configures Amazon EC2 firewall settings that control network access to instances, and you can launch clusters in an Amazon Virtual Private Cloud (VPC), a logically isolated network you define. For objects stored in Amazon S3, you can use Amazon S3 server-side encryption or Amazon S3 client-side encryption with EMRFS, with AWS Key Management Service or customer-managed keys.
Flexible
You have complete control over your cluster. You have root access to every instance, you can easily install additional applications, and you can customize every cluster with bootstrap actions. You can also launch Amazon EMR clusters with custom Amazon Linux AMIs.
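To make the launch step concrete, here is a hedged sketch of the request body you might pass to EMR's RunJobFlow API (for example via boto3's emr client). The cluster name, release label, instance types, and node counts are illustrative placeholders:

```python
# Sketch: assemble a RunJobFlow request for a small Spark cluster.
# EMR_EC2_DefaultRole / EMR_DefaultRole are the service's default role names;
# everything else here is an illustrative placeholder.
def spark_cluster_request(name: str, core_nodes: int) -> dict:
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.20.0",
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": core_nodes},
            ],
            # Terminate the cluster once all steps finish (transient cluster).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = spark_cluster_request("demo-cluster", core_nodes=9)  # 10 nodes total
print(sum(g["InstanceCount"] for g in request["Instances"]["InstanceGroups"]))
```

With credentials configured, the same dict would be passed as keyword arguments to `boto3.client("emr").run_job_flow(...)`.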
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.
Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue’s fully-managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.
Start Querying Instantly
Serverless. No ETL. Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. Amazon Athena allows you to tap into all your data in S3 without the need to set up complex processes to extract, transform, and load the data (ETL).
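As a sketch of what "just point and query" looks like programmatically, here is the shape of an Athena StartQueryExecution request (the keyword arguments accepted by boto3's athena client); the database, table, and results bucket names are placeholders:

```python
# Sketch: build a StartQueryExecution request. Athena writes query results to
# the S3 location you specify; bucket/database/table names are placeholders.
def athena_query_request(sql: str, database: str, results_bucket: str) -> dict:
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }

request = athena_query_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="weblogs", results_bucket="my-query-results")
print(request["ResultConfiguration"]["OutputLocation"])
```

With credentials in place, `boto3.client("athena").start_query_execution(**request)` returns a query execution ID you poll for results.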
Pay Per Query
Only pay for data scanned. With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries. You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats. Athena queries data directly in Amazon S3. There are no additional storage charges beyond S3.
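The pay-per-query arithmetic above can be sketched in a few lines; the $5 per terabyte scanned is the rate quoted here, and the 10x reduction is an illustrative figure for a well-partitioned columnar layout:

```python
# Sketch: cost scales with bytes scanned, so shrinking the data a query
# touches (compression, partitioning, columnar formats) shrinks the bill.
PRICE_PER_TB_SCANNED = 5.0  # $/TB, as quoted

def query_cost(tb_scanned: float) -> float:
    return PRICE_PER_TB_SCANNED * tb_scanned

full_scan = query_cost(1.0)   # 1 TB of raw CSV
pruned = query_cost(0.1)      # same query, columnar + partition-pruned (illustrative)
print(full_scan, pruned)                        # 5.0 0.5
print(f"{(1 - pruned / full_scan):.0%} saved")  # 90% saved
```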
Open. Powerful. Standard
Built on Presto. Runs standard SQL. ANSI SQL interface with JDBC/ODBC drivers. Handles multiple formats (CSV, JSON, Avro, Parquet, ORC, geospatial), compression types (GZIP, LZO, BZ2), and complex joins and data types (arrays, maps, structs).
Easy: Serverless. Zero infrastructure. Zero administration.
We are making rapid improvements to solve the three hardest challenges for customers to adopt AI/ML: Cost, Ease of Use, and Data. Our launches this year underscore that.
We see the Machine Learning stack having three key layers.
ML Frameworks:
The bottom layer is for expert machine learning practitioners—researchers and developers.
These are people who are comfortable building models, tuning models, training models, figuring out how to deploy into production, and manage them themselves.
And the vast majority of machine learning in the cloud today at this layer is being done through Amazon SageMaker, which provides a managed experience for frameworks, or through the AWS Deep Learning AMI that we built, which effectively embeds all the major frameworks.
Infrastructure:
AWS offers a broad array of compute options for training and inference with powerful GPU-based instances, compute and memory optimized instances, and even FPGAs.
Our P3 instances provide up to 14 times better performance than previous-generation Amazon EC2 GPU compute instances.
C5 instances offer higher memory to vCPU ratio and deliver 25% improvement in price/performance compared to C4 instances, and are ideal for demanding inference applications.
We also have Amazon EC2 F1, a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your machine learning applications. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code. You can reuse your designs as many times, and across as many F1 instances, as you like.
The new Amazon EC2 P3dn instance has four times the networking bandwidth and twice the GPU memory of the largest P3 instance, making it ideal for large-scale distributed training. No one else has anything close.
P3dn.24xlarge instances offer 96 vCPUs of Intel Skylake processors to reduce the preprocessing time of data required for machine learning training.
The enhanced networking of the P3dn instance allows GPUs to be used more efficiently in multi-node configurations, so training jobs complete faster.
Finally, the extra GPU memory allows developers to easily handle more advanced machine learning models, such as holding and processing multiple batches of 4K images for image classification and object detection systems.
ML Services:
But, if you want to enable most enterprises and companies to be able to scale machine learning, we’ve solved that problem for organizations by making ML accessible for everyday developers and scientists. Amazon SageMaker removes the heavy lifting, complexity, and guesswork from each step of the machine learning process.
SageMaker makes model building and training easier by providing pre-built development notebooks, popular machine learning algorithms optimized for petabyte-scale datasets, and automatic model tuning, enabling developers to build, train, and deploy models in a single click.
SageMaker is already helping thousands of developers easily get started with building, training, and deploying models.
AI Services:
At the top layer are AI services which are ready-made for all developers—no ML skills.
For example, customers say: here is an image, tell me what's in it; or here's a face, tell me if it's part of this facial group, using Amazon Rekognition
Or let me convert text to speech using Amazon Polly
Or let’s build conversational apps with Amazon Lex.
Convert speech to text with Amazon Transcribe
Translate text between languages using Amazon Translate
Understand relationships and find insights from unstructured text using Amazon Comprehend
We have a portfolio of solution-based AI services that can be accessed via a simple API call across vision, speech, language services, and conversational chatbots.
AWS has invested deeply in these services because they address some of the most common problems and opportunities customers face where AI can advance the state of the art.
AWS has the capability to invest at a level of scale that would be uneconomical for most customers, and our scale enables us to offer these services at low cost.
Customers can build these capabilities into their new and existing applications to reduce costs, increase speed, improve customer satisfaction and insight, and build ‘modern’ intelligent applications.
Our AI services are intentionally easy to use. They can be accessed via a simple API call.
When used in conjunction, they create compelling solutions that target common business problems and use cases.
ADDITIONAL COLOR:
Amazon Rekognition:
Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Rekognition API, and the service can identify the objects, people, text, scenes, and activities, as well as detect any inappropriate content.
Amazon Rekognition also provides highly accurate facial analysis and facial recognition on images and video that you provide. You can detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases. Rekognition is a simple and easy-to-use API that can quickly analyze any image or video file stored in Amazon S3. Amazon Rekognition is always learning from new data, and we are continually adding new labels and facial recognition features to the service.
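As a minimal sketch, a label-detection call takes a request of this shape (boto3's `rekognition.detect_labels` accepts these keyword arguments); the bucket and key are placeholders:

```python
# Sketch: request shape for Rekognition label detection on an image already
# stored in S3. Bucket and key are placeholders, not real objects.
def detect_labels_request(bucket: str, key: str, max_labels: int = 10,
                          min_confidence: float = 80.0) -> dict:
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,           # cap on labels returned
        "MinConfidence": min_confidence,   # drop low-confidence labels
    }

request = detect_labels_request("my-photo-bucket", "photos/street.jpg")
print(request["Image"]["S3Object"]["Name"])
```

The response contains a `Labels` list, each entry carrying a `Name` and a `Confidence` score.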
More info: https://aws.amazon.com/rekognition/
Amazon Polly:
Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Polly is a text to speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.
With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.
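A minimal sketch of the request shape for speech synthesis (boto3's `polly.synthesize_speech` accepts these keyword arguments); the text and voice are illustrative:

```python
# Sketch: request shape for Polly text-to-speech. "Joanna" is one of Polly's
# English voices; the text here is illustrative.
def synthesize_request(text: str, voice_id: str = "Joanna") -> dict:
    return {
        "Text": text,
        "OutputFormat": "mp3",  # also supports ogg_vorbis and pcm
        "VoiceId": voice_id,
    }

request = synthesize_request("Welcome to the webinar.")
print(request["VoiceId"], request["OutputFormat"])
```

The response's `AudioStream` would then be written to an audio file or played back directly.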
More info: https://aws.amazon.com/polly/
Amazon Transcribe:
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications.
Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.
Amazon Transcribe can be used for lots of common applications, including the transcription of customer service calls and generating subtitles on audio and video content.
The service can transcribe audio files stored in common formats, like WAV and MP3, with time stamps for every word so that you can easily locate the audio in the original source by searching for the text. Amazon Transcribe is continually learning and improving to keep pace with the evolution of language.
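A hedged sketch of the request that starts a transcription job (boto3's `transcribe.start_transcription_job` accepts these keyword arguments); the job name and media URI are placeholders:

```python
# Sketch: request shape for an asynchronous Transcribe job over an audio file
# in S3. Job name and media URI are placeholders.
def transcription_request(job_name: str, media_uri: str,
                          media_format: str = "mp3") -> dict:
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "MediaFormat": media_format,  # e.g. "wav" or "mp3"
        "Media": {"MediaFileUri": media_uri},
    }

request = transcription_request(
    "support-call-0001", "s3://my-call-audio/call-0001.mp3")
print(request["Media"]["MediaFileUri"])
```

The job runs asynchronously; you poll for completion and then fetch the transcript, which includes the per-word time stamps mentioned above.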
More info: https://aws.amazon.com/transcribe/
Amazon Translate:
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation.
Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms.
Amazon Translate allows you to localize content - such as websites and applications - for international users, and to easily translate large volumes of text efficiently.
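A minimal sketch of the translation request shape (boto3's `translate.translate_text` accepts these keyword arguments); the sample text and language pair are illustrative:

```python
# Sketch: request shape for Translate. Language codes follow the service's
# conventions (e.g. "en" for English, "es" for Spanish).
def translate_request(text: str, source: str, target: str) -> dict:
    return {
        "Text": text,
        "SourceLanguageCode": source,
        "TargetLanguageCode": target,
    }

request = translate_request("Hello, world", source="en", target="es")
print(request["TargetLanguageCode"])
```

The response carries the translated string in its `TranslatedText` field.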
More info: https://aws.amazon.com/translate/
Amazon Comprehend:
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text.
The service identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
Using these APIs, you can analyze text and apply the results in a wide range of applications including voice of customer analysis, intelligent document search, and content personalization for web applications.
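As a minimal sketch, the sentiment APIs take request shapes like these (boto3's `comprehend.detect_sentiment` and `batch_detect_sentiment` accept these keyword arguments); the sample text is illustrative:

```python
# Sketch: request shapes for Comprehend sentiment analysis. DetectSentiment
# takes one document; BatchDetectSentiment takes up to 25 per call.
def sentiment_request(text: str, language: str = "en") -> dict:
    return {"Text": text, "LanguageCode": language}

def batch_sentiment_request(texts: list, language: str = "en") -> dict:
    if len(texts) > 25:  # the batch API caps each call at 25 documents
        raise ValueError("BatchDetectSentiment accepts at most 25 documents")
    return {"TextList": texts, "LanguageCode": language}

request = sentiment_request(
    "The checkout flow was fast, but the shipping options confused me.")
print(request["LanguageCode"])
```

The response reports a sentiment label (POSITIVE, NEGATIVE, NEUTRAL, or MIXED) with per-label confidence scores.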
More info: https://aws.amazon.com/comprehend
Amazon Lex:
Amazon Lex is a service for building conversational interfaces into any application using voice and text.
Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions.
With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots.
More info: https://aws.amazon.com/lex
Amazon SageMaker is the most widely used machine learning service
And it is because SageMaker removes the complexity that holds back developer success
It allows companies of all sizes to easily build sophisticated machine learning models—from prediction engines to intelligent applications and processes
Industry leaders use SageMaker to transform their business—let’s take a look at some successes…