2. Traditional data warehousing approaches don’t scale
From data silos:
• OLTP, ERP, CRM, LOB: DW Silo 1, Business Intelligence
• Devices, Web, Sensors, Social: DW Silo 2, Business Intelligence
To data lakes (open formats, central catalog):
• Machine learning
• BI + analytics
• Data warehousing
3. AWS Confidential - Internal Use Only
Customers moving to data lake architectures
Bringing together the best of both worlds
Extends or evolves DW architectures
Store any data in any format
Durable, available, and exabyte scale
Secure, compliant, auditable
Run any type of analytics from DW to Predictive
Data lake: Data Warehousing, Analytics, Machine Learning
4. Any type of analytics on the data lake
Most comprehensive analytics platform
Data lake: Amazon S3 | AWS Glue | Lake Formation
• Amazon Redshift: Data Warehousing
• Amazon EMR: Big Data Processing
• Amazon Athena: Interactive Query
• Amazon Elasticsearch Service: Operational Analytics
• Amazon Kinesis, Amazon MSK: Real time Analytics
• Amazon SageMaker: Predictive Analytics
• Amazon Personalize: Recommendations
• Amazon QuickSight: Visualizations
• AWS Data Exchange: Data Exchange
5. Our portfolio
Broad and deep portfolio, purpose-built for builders
Business Intelligence & Machine Learning
• QuickSight: Visualizations
• SageMaker: ML
• Comprehend: NLP
• Transcribe: Speech-to-text
• Textract: Extract text
• Personalize: Recommendation
• Forecast: Forecasts
• Translate: Translation
• CodeGuru: Code reviews (NEW)
• Kendra: Enterprise search (NEW)
Analytics
• Redshift: Data warehousing
• EMR: Hadoop + Spark
• Kinesis Data Analytics: Real time
• Elasticsearch Service: Operational Analytics
• Athena: Interactive analytics
Databases
• RDS: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware
• Aurora: MySQL, PostgreSQL
• DynamoDB: Key value, Document
• ElastiCache: Redis, Memcached
• Neptune: Graph
• Timestream: Time Series
• QLDB: Ledger Database
• Managed Apache Cassandra Service: Wide column (NEW)
• DocumentDB: Document
Blockchain
• Managed Blockchain
• Blockchain Templates
Data Exchange
• Data Exchange: Data exchange (NEW)
Data Lake
• S3/Glacier
• Glue: ETL & Data Catalog
• Lake Formation: Data Lakes
Data Movement
• Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
NEW: AQUA, EMR on Outposts, UltraWarm, RDS Proxy, RDS on Outposts
6. Amazon S3 is the best place to build data lakes
Business insights into your data
Best security, compliance, and audit capabilities
Most ways to bring data in
Object-level controls
Unmatched durability, availability, and scalability
7. AWS Access Analyzer for S3 (New)
An S3 capability to generate comprehensive findings if your resource policies grant public or cross-account access
Continuously identify resources with overly broad permissions across your entire AWS organization
Resolve findings by updating policies to protect your resources from unintended access before it occurs, or archive findings for intended access
8. Benefits of Access Analyzer for S3
Uses automated reasoning, a form of mathematical logic & inference, to determine all possible access paths allowed by a resource policy
Continuously monitors and automatically analyzes any new or updated resource policy to help you understand potential security implications
Analyzes thousands of policies in seconds for public or cross-account access
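To make "public or cross-account access" concrete, here is a minimal sketch of a far simpler check than the automated reasoning described above. It only inspects the `Principal` of each `Allow` statement in a bucket policy and ignores `Condition` blocks, `NotPrincipal`, ACLs, and access points, so it is an illustration and not a substitute for Access Analyzer. The function names, the sample policy, and the account IDs are hypothetical.

```python
import json
import re

# Matches a bare 12-digit account ID or an IAM ARN that embeds one.
ACCOUNT_RE = re.compile(r"^(?:arn:aws:iam::)?(\d{12})(?::.*)?$")


def principal_accounts(principal):
    """Return ("public", set()) for a wildcard principal, else ("accounts", ids)."""
    if principal == "*":
        return "public", set()
    aws = principal.get("AWS", []) if isinstance(principal, dict) else []
    if isinstance(aws, str):
        aws = [aws]
    if "*" in aws:
        return "public", set()
    ids = set()
    for entry in aws:
        match = ACCOUNT_RE.match(entry)
        if match:
            ids.add(match.group(1))
    return "accounts", ids


def classify_policy(policy_json, owner_account):
    """List (statement_index, finding) pairs for Allow statements that
    name a public or foreign-account principal. Naive by design."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        kind, ids = principal_accounts(stmt.get("Principal", {}))
        if kind == "public":
            findings.append((index, "public"))
        elif ids - {owner_account}:
            findings.append((index, "cross-account"))
    return findings


# Hypothetical policy: one public, one cross-account, one same-account grant.
SAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::111122223333:role/app"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

if __name__ == "__main__":
    print(classify_policy(SAMPLE_POLICY, owner_account="111122223333"))
```

A syntactic scan like this misses grants that are narrowed or widened by conditions, which is the gap the reasoning-based analysis is meant to close.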
12. S3 Replication
Amazon Simple Storage Service
Replicate within the same AWS Region
Replicate to a different AWS Region
Replicate to a bucket with retention controls (in the same or different AWS Region)
Replicate faster to a different AWS Region, backed by an SLA + replication metrics (New at re:Invent: S3 RTC)
New in 2019 (two of the options above)
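As a rough illustration of the last option (replication with S3 RTC and replication metrics), the sketch below builds a replication configuration as a plain Python dictionary. It assumes the rule shape accepted by the S3 PutBucketReplication API; the helper name, rule ID, role ARN, and bucket ARN are hypothetical placeholders, and the 15-minute value is an assumption about the RTC threshold.

```python
import json


def build_replication_config(role_arn, dest_bucket_arn):
    """Build a one-rule replication configuration with RTC and metrics enabled.

    Assumption: this mirrors the ReplicationConfiguration structure of the
    S3 PutBucketReplication API; verify against current documentation.
    """
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-all-with-rtc",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: apply to every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    # Replication Time Control (assumed 15-minute threshold)
                    "ReplicationTime": {
                        "Status": "Enabled",
                        "Time": {"Minutes": 15},
                    },
                    # Replication metrics, paired with RTC
                    "Metrics": {
                        "Status": "Enabled",
                        "EventThreshold": {"Minutes": 15},
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    config = build_replication_config(
        "arn:aws:iam::111122223333:role/example-replication-role",
        "arn:aws:s3:::example-destination-bucket",
    )
    print(json.dumps(config, indent=2))
```

With boto3, a dictionary like this would typically be passed as the `ReplicationConfiguration` argument of the S3 client's `put_bucket_replication` call; both buckets need versioning enabled for replication to work.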
14. Benefits of a Data Lake – Storage vs Compute
Separating your storage and compute allows you to scale each component as required
“How can I scale up with the volume of data being generated?”
15. Amazon EMR
Easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
Low cost
• 50–80% reduction in costs with EC2 Spot and Reserved Instances
• Per-second billing for flexibility
Use S3 storage
• Process data in S3 securely with high performance using the EMRFS connector
Latest versions
• Updated with latest open source frameworks within 30 days
Easy
• Fully managed: no cluster setup, node provisioning, or cluster tuning
16. Performance Improvements in Spark for Amazon EMR
Performance optimized runtime for Apache Spark, 2.6x faster performance at 1/10th the cost
[Chart: Runtime total on 104 queries (seconds, lower is better)]
• Spark with EMR (with runtime): 10,164
• 3rd party Managed Spark (with their runtime): 16,478
• Spark with EMR (without runtime): 26,478
*Based on TPC-DS 3 TB benchmarking running 6-node c4.8xlarge clusters with EMR 5.28 and Spark 2.4
Runtime optimized for Apache Spark performance
100% compliant with Apache Spark APIs
Best performance
2.6x faster than Spark with EMR without runtime
1.6x faster than 3rd party Managed Spark (with their runtime)
Lowest price
1/10th the cost of 3rd party Managed Spark (with their runtime)
NEW
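The speedup figures quoted on this slide can be checked directly against the chart totals. The short sketch below does that arithmetic; the dictionary keys and the `speedup` helper are illustrative names, and the only inputs are the three runtimes shown in the chart.

```python
# Total runtime in seconds over the 104 benchmark queries, as shown in the chart.
RUNTIMES_SECONDS = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}


def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


if __name__ == "__main__":
    fast = RUNTIMES_SECONDS["emr_with_runtime"]
    vs_plain_emr = speedup(RUNTIMES_SECONDS["emr_without_runtime"], fast)
    vs_third_party = speedup(RUNTIMES_SECONDS["third_party_managed_spark"], fast)
    print(f"vs EMR without runtime: {vs_plain_emr:.1f}x")      # 2.6x
    print(f"vs 3rd party managed Spark: {vs_third_party:.1f}x")  # 1.6x
```

The cost claim (1/10th the cost) cannot be reproduced from the chart alone, since the slide gives no pricing inputs.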
17. Amazon EMR on AWS Outposts
Launch EMR in your data centers with AWS Outposts
Integrate with existing on-premises Hadoop deployments
Deploy secure, managed EMR clusters in minutes
Process and analyze data on-premises on AWS Outposts
[Diagram: EMR (Hadoop + Spark) on AWS Outposts alongside on-premises Hadoop/Spark]
GA
NEW
18. Operational Analytics: Amazon Elasticsearch Service
Fully managed, scalable, secure Elasticsearch service
Open source Elasticsearch APIs, Kibana, and Logstash
• Open-source Elasticsearch APIs
• Managed Kibana
• Integration with Logstash
Scalable, secure, and compliant
• Scale clusters up/down via a single API call or a few clicks
• Secured network isolation with VPC, encrypt data at-rest and in-transit
• Compliant: HIPAA, PCI DSS, and ISO
Pay only for what you use
• Cost-optimized workloads
• No upfront fee or usage requirement
• Critical features built-in: encryption, VPC support, 24x7 monitoring
Fully managed
• Deploy Elasticsearch clusters in minutes: simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring
19. Challenges with analyzing high volumes of data in real-time
Storing data is expensive at scale
Limits the amount of data retained for analysis
Miss out on valuable insights
28. ML in Amazon Aurora and Athena
Bringing machine learning to data developers and data analysts
Generate predictions directly from Aurora queries
Models run in SageMaker & Comprehend
Use standard SQL, no ML expertise required
Suitable for low-latency, high-volume use cases
[Diagram: SQL (Select, From, Where) in Aurora (database) and Athena (interactive analytics) calling Amazon SageMaker (ML)]
29. Example Query Using ML with Athena
USING FUNCTION predict_customer_registration(age INTEGER)
    RETURNS DOUBLE TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'xgboost-2019-09-20-04-49-29-303')
SELECT predict_customer_registration(age) AS probability_of_enrolling, customer_id
FROM "sampledb"."ml_test_dataset"
WHERE predict_customer_registration(age) < 0.5;
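A query like this could also be submitted programmatically. The sketch below builds the same statement as a string and hands it to an Athena client through `start_query_execution`, which is how boto3 exposes query submission. The helper names, the results location, and the choice to pass the client in as a parameter are assumptions made for illustration. The `USING FUNCTION` syntax is copied from the slide unchanged and may differ from what current Athena releases expect.

```python
def build_ml_query(endpoint_name, threshold=0.5):
    """Return the slide's example query for a given SageMaker endpoint name."""
    if "'" in endpoint_name:
        # Refuse names that would break out of the SQL string literal.
        raise ValueError("endpoint name must not contain single quotes")
    return (
        "USING FUNCTION predict_customer_registration(age INTEGER)\n"
        "    RETURNS DOUBLE TYPE\n"
        f"    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{endpoint_name}')\n"
        "SELECT predict_customer_registration(age) AS probability_of_enrolling, customer_id\n"
        'FROM "sampledb"."ml_test_dataset"\n'
        f"WHERE predict_customer_registration(age) < {threshold};"
    )


def submit_query(athena_client, sql, database, output_location):
    """Start the query and return its execution ID.

    athena_client is expected to behave like boto3.client("athena");
    output_location is an S3 URI for results (hypothetical in the usage below).
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

In use, this might look like `submit_query(boto3.client("athena"), build_ml_query("xgboost-2019-09-20-04-49-29-303"), "sampledb", "s3://example-results-bucket/athena/")`, followed by polling the execution status before reading results.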
30. Athena support for interface endpoint (PrivateLink)
Submit queries securely
No internet gateway required in your VPC
Secure communication between your VPC and Athena APIs
Set VPC endpoint policies
Example endpoint policy
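An endpoint policy of the kind this slide refers to might look like the following sketch, which limits traffic through the VPC endpoint to a few query actions on a single workgroup. The account ID, Region, workgroup name, helper name, and the selection of actions are all illustrative assumptions, not the policy from the original slide.

```python
import json


def build_endpoint_policy(account_id, region, workgroup):
    """Build a VPC endpoint policy that only allows running and reading
    queries in one Athena workgroup (hypothetical example values)."""
    workgroup_arn = f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}"
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",  # any caller reaching Athena via this endpoint
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": workgroup_arn,
            }
        ]
    }


if __name__ == "__main__":
    policy = build_endpoint_policy("111122223333", "us-east-1", "example-workgroup")
    print(json.dumps(policy, indent=2))
```

An endpoint policy only restricts what can pass through the endpoint; callers still need matching IAM permissions of their own.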
31. Amazon QuickSight
First BI service built for the cloud with pay-per-session pricing & ML insights for everyone
Elastic Scaling
• Auto-scale 10 to 10K+ users in minutes
• Pay-as-you-go
Serverless
• Create dashboards in minutes
• Deploy globally without provisioning a single server
Deeply integrated with AWS services
• Secure, private access to AWS data
• Integrated S3 data lake permissions through AWS IAM
API Support (NEW)
• Programmatically onboard users and manage content
• Easily embed in your apps
32. ML predictions in Amazon QuickSight (preview)
NEW
1 Connect to any data: data lakes, SQL engines, 3rd party applications, and on-premises databases
AWS/on-premises data sources
• Excel
• CSV
• MySQL
• PostgreSQL
• MariaDB
• Presto
• Spark
• SQL Server
• Amazon Redshift
• RDS
• S3
• Athena
• Aurora
• EMR
• Snowflake
• Teradata
• Salesforce
• Square
• Adobe Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
2 Select an ML model: create models with Amazon SageMaker Autopilot, or use existing custom models and packaged models from AWS Marketplace
[Diagram: custom models, Amazon SageMaker Autopilot models, and AWS Marketplace models feeding QuickSight]
3 Visualize and share: analyze results, create visualizations, build dashboards / email reports, and share with business stakeholders
33. Easily embed analytics in your own tools
Powered by QuickSight APIs and flexible customization. Entirely serverless.
Deploy and manage dashboards + data via APIs
Match your application UI with QuickSight Themes
Embed dashboards in apps without servers
• Fast, consistent performance
• Pay-per-session
Automatically scale to 10s of 1000s of users
• No server management
• No scripting
NEW
34. Innovation is limited when you can’t find the data you need, even when there is a data provider that wants to share it with you
Subscriber: No good place to find diverse data (Places? Genomics? Market Data? News?)
Provider: Hard to reach subscribers (Subscribers?)
35. It’s simply too hard for organizations to exchange data today
Subscriber
• Months to license
• Difficult to integrate
• Hard drives, FTP, postal service
Provider
• Inconsistent payments
• Complicated contracts
• Customer acquisition cost
Fewer data products in market
Frustrated subscribers