© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Better Data Lake Solutions Using Newly Launched
AWS Analytics Services
Ivan Cheng (鄭志帆)
Manager, Solutions Architect, AWS
Data silos to
OLTP ERP CRM LOB
DW Silo 1
Business
Intelligence
Devices Web Sensors Social
DW Silo 2
Business
Intelligence Machine
learning
BI +
analytics
Data
warehousing
Data lakes
Open formats
Central catalog
Traditional data warehousing approaches don’t scale
AWS Confidential - Internal Use Only
Customers moving to data lake architectures
Bringing together the best of both worlds
Extends or evolves DW architectures
Store any data in any format
Durable, available, and exabyte scale
Secure, compliant, auditable
Run any type of analytics from DW to Predictive
Data
Warehousing
Analytics Machine
Learning
Data lake
Any type of analytics on the data lake
Most comprehensive analytics platform
Amazon S3 | AWS Glue
Lake Formation
Data lake
Amazon
Redshift
Amazon
EMR
Amazon
Athena
Amazon
Elasticsearch
Service
Amazon
Kinesis
Amazon
MSK
Amazon
SageMaker
Amazon
Personalize
Amazon
QuickSight
AWS Data
Exchange
Data
Warehousing
Big Data
Processing
Interactive
Query
Operational
Analytics
Real time
Analytics
Predictive
Analytics
Recommendations
Visualizations
Data
Exchange
Our portfolio
Broad and deep portfolio, purpose-built for builders
S3/Glacier
Glue
ETL & Data Catalog
Lake Formation
Data Lakes
Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka
Data Movement
Data Lake
Business Intelligence & Machine Learning
Data Exchange
Data exchange
NEW
QuickSight
Visualizations
SageMaker
ML
Comprehend
NLP
Transcribe
Speech-to-text
Textract
Extract text
Personalize
Recommendation
Forecast
Forecasts
Translate
Translation
CodeGuru
Code reviews
Kendra
Enterprise search
NEW NEW
RDS
MySQL, PostgreSQL,
MariaDB, Oracle, SQL Server,
RDS on VMware
Aurora
MySQL, PostgreSQL
DynamoDB
Key value, Document
ElastiCache
Redis, Memcached
Neptune
Graph
Timestream
Time Series
QLDB
Ledger Database
Analytics Databases
Managed
Blockchain
Blockchain
Templates
Blockchain
Managed Apache
Cassandra Service
Wide column
NEW
DocumentDB
Document
Redshift
Data warehousing
EMR
Hadoop + Spark
Kinesis Data Analytics
Real time
Elasticsearch Service
Operational Analytics
Athena
Interactive analytics
NEW
NEW
NEW
NEW
NEW
AQUA EMR on Outposts
UltraWarm
RDS Proxy
RDS on Outposts
Amazon S3 is the best place to build data lakes
Business insights
into your data
Best security,
compliance, and
audit capabilities
Most ways to
bring data in
Object-level
controls
Unmatched
durability,
availability, and
scalability
AWS Access Analyzer for S3—New
An S3 capability to generate comprehensive findings if your
resource policies grant public or cross-account access
Continuously identify resources with overly broad permissions
across your entire AWS organization
Resolve findings by updating policies to protect your
resources from unintended access before it occurs, or archive
findings for intended access
Access Analyzer for S3
Benefits of Access Analyzer for S3
Uses automated reasoning, a form
of mathematical logic & inference,
to determine all possible access
paths allowed by a resource policy
Continuously monitors and
automatically analyzes any new or
updated resource policy to help
you understand potential
security implications
Analyzes thousands of policies in
seconds for public or cross-
account access
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon S3 Access Points
Introducing
Simplify managing data access at scale for applications using shared data
sets on Amazon S3. Easily create hundreds of access points per bucket,
each with a unique name and permissions customized for each application.
Storage
General Availability
S3 Access Points
How Access Points Work
my-bucket.s3.amazonaws.com
finance accounting sales
{ }{ }{ }
https://[access_point_name]-[accountID].s3-accesspoint.[region].amazonaws.com
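The hostname pattern above can be assembled mechanically. The following is a minimal Python sketch, assuming a hypothetical account ID and Region; `access_point_hostname` is an illustrative helper, not an AWS SDK function.

```python
def access_point_hostname(access_point_name: str, account_id: str, region: str) -> str:
    """Build an access point hostname following the pattern shown on the slide."""
    return f"{access_point_name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


# Hypothetical account ID and Region, for illustration only.
for team in ("finance", "accounting", "sales"):
    print("https://" + access_point_hostname(team, "123456789012", "us-west-2"))
```

Each application (finance, accounting, sales) gets its own hostname and its own access point policy, while the underlying bucket stays the same.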
S3 Replication
Amazon Simple Storage
Service
Replicate within the same
AWS Region
Replicate to a
different AWS Region
Replicate to a bucket with
retention controls (in the same
or different AWS Region)
Replicate faster to a different
AWS Region, backed by an SLA
+ replication metrics
New at re:Invent: S3 RTC
New in 2019 New in 2019
Amazon S3 Replication Time Control
Predictable
replication
time
Backed by an Amazon
S3 Service Level
Agreement (SLA)
Amazon S3 Replication with the benefits of...
Monitor replication
using Amazon
CloudWatch metrics
and event notifications
Designed to replicate 99.99% of data in <15 minutes
Storage
General Availability
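As a rough illustration of how Replication Time Control and replication metrics are switched on, the sketch below builds a replication configuration as a plain Python dictionary. The field names are intended to mirror the S3 PutBucketReplication request shape and should be checked against the current API reference. The role ARN, bucket ARN, and rule ID are hypothetical, and `rtc_replication_rule` is an illustrative helper.

```python
import json


def rtc_replication_rule(destination_bucket_arn: str, rule_id: str = "rtc-rule") -> dict:
    """Return one replication rule with Replication Time Control and metrics enabled."""
    return {
        "ID": rule_id,
        "Priority": 1,
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: the rule applies to every object
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": destination_bucket_arn,
            # 15 minutes matches the replication target quoted on the slide.
            "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
            # Metrics feed the CloudWatch monitoring mentioned on the slide.
            "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
        },
    }


config = {
    "Role": "arn:aws:iam::123456789012:role/example-replication-role",  # hypothetical
    "Rules": [rtc_replication_rule("arn:aws:s3:::example-destination-bucket")],
}
print(json.dumps(config, indent=2))
```

The sketch only prints the configuration; applying it to a bucket would require an SDK or CLI call with suitable credentials.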
Benefits of a Data Lake – Storage vs Compute
Separating your storage and compute
allows you to scale each component as
required
“How can I scale up with the
volume of data being generated?”
Amazon EMR
Easily Run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS
Low cost
50–80% reduction in costs with
EC2 Spot and Reserved Instances
Per-second billing for flexibility
Use S3 storage
Process data in S3
securely with high performance
using the EMRFS connector
Latest versions
Updated with latest open source
frameworks within 30 days
Fully managed: no cluster
setup, node provisioning,
or cluster tuning
Easy
Performance Improvements in Spark for Amazon EMR
Performance optimized runtime for Apache Spark, 2.6x faster performance at 1/10th the cost
*Based on TPC-DS 3 TB benchmarking, running on 6-node
c4.8xlarge clusters with EMR 5.28 and Spark 2.4
Runtime total on 104 queries (seconds, lower is better):
Spark with EMR (with runtime): 10,164
3rd party Managed Spark (with their runtime): 16,478
Spark with EMR (without runtime): 26,478
Runtime optimized for Apache Spark performance
100% compliant with Apache Spark APIs
Best performance
2.6x faster than Spark with EMR without runtime
1.6x faster than 3rd party Managed Spark (with their runtime)
Lowest price
1/10th the cost of 3rd party Managed Spark (with their runtime)
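The speedup figures follow directly from the chart totals. The short Python check below assumes the three bar values map to the three configurations in the order they appear on the slide (10,164 s, 16,478 s, and 26,478 s); the variable names are illustrative.

```python
# Total runtime in seconds over 104 TPC-DS queries, as reported on the slide.
runtimes = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}

baseline = runtimes["emr_with_runtime"]
speedup_vs_emr_without_runtime = runtimes["emr_without_runtime"] / baseline
speedup_vs_third_party = runtimes["third_party_managed_spark"] / baseline

print(f"vs Spark with EMR without runtime: {speedup_vs_emr_without_runtime:.1f}x")
print(f"vs 3rd party Managed Spark: {speedup_vs_third_party:.1f}x")
```

Under that mapping the ratios round to the 2.6x and 1.6x quoted on the slide.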
NEW
Amazon EMR on AWS Outposts
Launch EMR in your data centers with AWS Outposts
Integrate with existing on-premises Hadoop deployments
Deploy secure, managed EMR clusters in minutes
Process and analyze data on-premises on AWS Outposts
EMR
Hadoop + Spark
AWS
Outposts
On-premises
Hadoop/Spark
GA
NEW
Operational Analytics: Amazon Elasticsearch Service
Fully managed, scalable, secure, Elasticsearch service
Open source Elasticsearch
APIs, Kibana, and
Logstash
Open-source Elasticsearch APIs
Managed Kibana
Integration with Logstash
Scale clusters up/down via a
single API call or a few clicks
Secured network isolation
with VPC, encrypt data
at-rest and in-transit
Compliant: HIPAA, PCI DSS,
and ISO
Scalable, secure,
and compliant
Pay only for
what you use
Cost-optimized workloads
No upfront fee or
usage requirement
Critical features built-in:
encryption, VPC support,
24x7 monitoring
Fully managed
Deploy Elasticsearch clusters
in minutes: simplified hardware
provisioning, software
installation/patching, failure
recovery, backups, and monitoring
Challenges with analyzing high volumes of data in real-time
Storing data is
expensive at scale
Limits the amount of
data retained for analysis
Miss out on
valuable insights
UltraWarm for Amazon Elasticsearch Service
Introducing
A low cost, scalable warm storage tier for Amazon Elasticsearch Service. Store
up to 10 PB of data in a single cluster at 1/10th the cost of existing storage tiers,
while still providing an interactive experience for analyzing logs.
Analytics
Public Beta
LEARN MORE ANT229: Scalable, secure, and cost-effective log analytics
UltraWarm: Cloud-optimized architecture for log analytics
Amazon S3
Active Backup Backup
UltraWarm UltraWarm UltraWarm
Data
Node
Data
Node
Data
Node
Data
Node
Kibana
Cost reduction of 90%
Interactive and integrated
log analytics
Multi-PB scale
Logging example
UltraWarm Storage: Improved cost, scale, and durability
Data stored on S3
Eliminates Elasticsearch-level replicas and snapshots
Supports 100% utilization
Pay for the storage you use
UltraWarm Nodes: Optimized for performance
Optimized Nitro instances provide high bandwidth S3 access
Performance in numbers:
• Queries that hit cached data run like hot
• Queries that span many days/indexes of uncached data are up to 2x faster
than traditional HDD based warm instances
• Narrow queries on few days/indexes of uncached data finish in seconds
Multi-layered and granular caching, adaptive prefetching, and
query engine optimizations to provide interactive experience
Queries transparently run against locally cached or S3 data
Amazon Athena: Interactive query service
Pay per query
Pay only for queries run
Save 30–90% on per-query costs
through compression
Use S3 storage
ANSI SQL
JDBC/ODBC drivers
Multiple formats,
compression types, and
complex joins and data types
SQL
Serverless: zero infrastructure,
zero administration
Integrated with QuickSight
Easy
Query instantly
Zero setup cost
Point to S3 and start querying
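Because Athena bills by the amount of data a query scans, compressing the data reduces the per-query cost proportionally. The Python sketch below illustrates the arithmetic; the price per terabyte and the 4:1 compression ratio are assumptions chosen for illustration, not figures from the slides, and the function names are hypothetical.

```python
TB = 1024 ** 4  # bytes in one terabyte (binary)


def scan_cost_usd(bytes_scanned: float, price_per_tb_usd: float) -> float:
    """Cost of a query that scans the given number of bytes."""
    return bytes_scanned / TB * price_per_tb_usd


def compression_saving(raw_bytes: float, compressed_bytes: float, price_per_tb_usd: float) -> float:
    """Fraction of per-query cost saved by scanning compressed instead of raw data."""
    raw_cost = scan_cost_usd(raw_bytes, price_per_tb_usd)
    compressed_cost = scan_cost_usd(compressed_bytes, price_per_tb_usd)
    return 1 - compressed_cost / raw_cost


# Assumed inputs: 4 TB of raw data compressing to 1 TB, at an assumed 5 USD per TB scanned.
saving = compression_saving(4 * TB, 1 * TB, price_per_tb_usd=5.0)
print(f"Per-query saving: {saving:.0%}")
```

With these assumed inputs the saving falls inside the 30–90% range quoted on the slide; actual savings depend on the data and the compression format.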
Amazon Athena Federated Query
Run SQL queries on data spanning multiple data stores
Redshift
Data warehousing
ElastiCache
Redis
Aurora
MySQL, PostgreSQL
DynamoDB
Key value, Document
DocumentDB
Document
S3/Glacier
Run connectors in AWS Lambda: no servers to manage
Run SQL queries on relational, non-relational, object,
or custom data sources; in the cloud or on-premises
Open Source Connectors for common data sources
Build connectors to custom data sources
Analytics
Preview
Athena Federated Query Architecture
Generate predictions directly from Aurora queries
Models run in SageMaker & Comprehend
Use standard SQL, no ML expertise required
Suitable for low-latency, high-volume use cases
Amazon
SageMaker
ML
Aurora
Database
Athena
Interactive
analytics
SQL
Select
From
Where
ML in Amazon Aurora and Athena
Bringing machine learning to data developers and data analysts
USING FUNCTION predict_customer_registration(age INTEGER)
RETURNS DOUBLE TYPE
SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'xgboost-2019-09-20-04-49-29-303')
SELECT predict_customer_registration(age) AS probability_of_enrolling, customer_id
FROM "sampledb"."ml_test_dataset"
WHERE predict_customer_registration(age) < 0.5;
Example Query Using ML with Athena
Athena support for interface endpoint (PrivateLink)
Submit queries securely
No internet gateway required in your VPC
Secure communication between your VPC and Athena APIs
Set VPC endpoint policies
Example endpoint policy
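The policy shown on the original slide did not survive extraction. As a stand-in, the Python sketch below builds one plausible VPC endpoint policy that limits callers to query operations on a single workgroup. The account ID, Region, and workgroup name are hypothetical, and the selection of actions is an assumption, not the slide's content.

```python
import json

# Hypothetical identifiers, for illustration only.
ACCOUNT_ID = "123456789012"
REGION = "us-west-2"
WORKGROUP = "example-workgroup"

endpoint_policy = {
    "Statement": [
        {
            "Principal": "*",
            "Effect": "Allow",
            # Allow only the calls needed to run a query and read its results.
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:StopQueryExecution",
            ],
            "Resource": [
                f"arn:aws:athena:{REGION}:{ACCOUNT_ID}:workgroup/{WORKGROUP}"
            ],
        }
    ]
}

print(json.dumps(endpoint_policy, indent=2))
```

Requests through the endpoint that fall outside the listed actions or target another workgroup would not be allowed by this statement.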
Amazon QuickSight
First BI service built for the cloud with pay-per-session pricing & ML insights for everyone
Elastic Scaling
Auto-scale 10 to 10K+
users in minutes
Pay-as-you-go
Serverless
Create dashboards in
minutes
Deploy globally
without provisioning a
single server
Deeply integrated
with AWS services
Secure, Private access to
AWS data
Integrated S3 data lake
permissions through AWS IAM
API Support
Programmatically onboard users
and manage content
Easily embed in your apps
NEW
ML predictions in Amazon QuickSight (preview)
AWS/On-premises data sources
• Excel
• CSV
• MySQL
• PostgreSQL
• MariaDB
• Presto
• Spark
• SQL Server
• Amazon
Redshift
• RDS
• S3
• Athena
• Aurora
• EMR
• Snowflake
• Teradata
• Salesforce
• Square
• Adobe
Analytics
• Jira
• ServiceNow
• Twitter
• GitHub
1 Connect to any data:
Data lakes, SQL engines, 3rd
party applications and on-
premises databases
2 Select an ML model:
Create models with Amazon
SageMaker AutoPilot, existing
custom models and packaged
models from AWS Marketplace.
Custom
Models
QuickSight
Amazon
SageMaker
AutoPilot
Models
AWS
Marketplace
3 Visualize and share:
Analyze results, create
visualizations, build dashboards
/ email reports and share to
business stakeholders
NEW
Easily embed analytics in your own tools
Powered by QuickSight APIs and flexible customization. Entirely serverless.
Deploy and manage dashboards + data via APIs
Match your application UI with QuickSight Themes
Embed dashboards in apps without servers
• Fast, consistent performance
• Pay-per-session
Automatically scale to 10s of 1000s of users
• No server management
• No scripting
NEW
Innovation is limited when you can’t find the data you need,
Subscriber
No good place to
find diverse data
?
Places
?
Genomics
?
Market Data
?
News
Provider
Hard to reach subscribers
?
Subscribers
even when there is a data provider that wants to share it
with you
It’s simply too hard for organizations to exchange data today
Subscriber
!
Months to
license
!
Difficult to
integrate
!
Hard drives, FTP,
Postal service
!
Inconsistent
payments
!
Complicated
contracts
!
Customer
acquisition cost
Provider
Fewer data
products in market
Frustrated
subscribers
Data exchange: AWS Data Exchange
Easily find and subscribe to 3rd-party data in the cloud
Efficiently access
3rd party data
Simplifies access to data: No
need to receive physical media,
manage FTP credentials, or
integrate with different APIs
Minimize legal reviews and
negotiations
Quickly find diverse
data in one place
>1,000 data products
>80 data providers, including
Dow Jones, Change
Healthcare, Foursquare, Dun
& Bradstreet, Thomson
Reuters, Pitney Bowes,
LexisNexis, and Deloitte
Easily analyze data
Download or copy data to S3
Combine, analyze, and model
with existing data
Analyze data with EMR,
Redshift, Athena, and AWS
Glue
Analytics
General Availability
Quickly find diverse data
in one place
Efficiently access
3rd-party data
Easily analyze data
Reach millions of
AWS customers
Easiest way to package and
publish data products
Built-in security and
compliance controls
For
Subscribers
For
Providers
LEARN MORE
ANT238-R: AWS Data Exchange: Easily find & subscribe to third-party
data in the cloud
Data exchange: AWS Data Exchange
Easily find and subscribe to 3rd-party data in the cloud
AWS Data Exchange Data Providers
Thank you!
Ivan Cheng (鄭志帆)

More Related Content

What's hot

Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudAmazon Web Services
 
Migrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSMigrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSAmazon Web Services
 
ENT207-The Future of Enterprise IT.pdf
ENT207-The Future of Enterprise IT.pdfENT207-The Future of Enterprise IT.pdf
ENT207-The Future of Enterprise IT.pdfAmazon Web Services
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Riyadh User Group
 
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018데이터 센터 모던화::임흥선::AWS Summit Seoul 2018
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018Amazon Web Services Korea
 
Machine Learning in azione con Amazon SageMaker
Machine Learning in azione con Amazon SageMakerMachine Learning in azione con Amazon SageMaker
Machine Learning in azione con Amazon SageMakerAmazon Web Services
 
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Run Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage OptionsRun Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage OptionsAmazon Web Services
 
ENT204 The AWS Cloud Value Framework
ENT204 The AWS Cloud Value FrameworkENT204 The AWS Cloud Value Framework
ENT204 The AWS Cloud Value FrameworkAmazon Web Services
 
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018Amazon Web Services
 
AWSome Day Online 2020_Modul 3: Membangun di Cloud
AWSome Day Online 2020_Modul 3: Membangun di CloudAWSome Day Online 2020_Modul 3: Membangun di Cloud
AWSome Day Online 2020_Modul 3: Membangun di CloudAmazon Web Services
 
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Amazon Web Services Korea
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Innovating SAP the Easy Way – Migrate it to AWS
Innovating SAP the Easy Way – Migrate it to AWSInnovating SAP the Easy Way – Migrate it to AWS
Innovating SAP the Easy Way – Migrate it to AWSAmazon Web Services
 
Achieving Business Value with AWS - AWS Online Tech Talks
Achieving Business Value with AWS - AWS Online Tech TalksAchieving Business Value with AWS - AWS Online Tech Talks
Achieving Business Value with AWS - AWS Online Tech TalksAmazon Web Services
 

What's hot (20)

The Future of Enterprise IT
The Future of Enterprise IT The Future of Enterprise IT
The Future of Enterprise IT
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Migrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWSMigrate & Optimize Microsoft Applications on AWS
Migrate & Optimize Microsoft Applications on AWS
 
ENT207-The Future of Enterprise IT.pdf
ENT207-The Future of Enterprise IT.pdfENT207-The Future of Enterprise IT.pdf
ENT207-The Future of Enterprise IT.pdf
 
AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]AWS Technical Day Riyadh Nov 2019 [Migration]
AWS Technical Day Riyadh Nov 2019 [Migration]
 
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018데이터 센터 모던화::임흥선::AWS Summit Seoul 2018
데이터 센터 모던화::임흥선::AWS Summit Seoul 2018
 
Machine Learning in azione con Amazon SageMaker
Machine Learning in azione con Amazon SageMakerMachine Learning in azione con Amazon SageMaker
Machine Learning in azione con Amazon SageMaker
 
Enterprise workloads on AWS
Enterprise workloads on AWSEnterprise workloads on AWS
Enterprise workloads on AWS
 
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA308 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Run Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage OptionsRun Your Workloads in New EC2 and Storage Options
Run Your Workloads in New EC2 and Storage Options
 
aws basics
aws basicsaws basics
aws basics
 
ENT204 The AWS Cloud Value Framework
ENT204 The AWS Cloud Value FrameworkENT204 The AWS Cloud Value Framework
ENT204 The AWS Cloud Value Framework
 
AWS & Cloud Foundations
AWS & Cloud FoundationsAWS & Cloud Foundations
AWS & Cloud Foundations
 
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018
Keep Your Infrastructure Costs Low: AWS Startup Day - New York 2018
 
AWSome Day Online 2020_Modul 3: Membangun di Cloud
AWSome Day Online 2020_Modul 3: Membangun di CloudAWSome Day Online 2020_Modul 3: Membangun di Cloud
AWSome Day Online 2020_Modul 3: Membangun di Cloud
 
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Innovating SAP the Easy Way – Migrate it to AWS
Innovating SAP the Easy Way – Migrate it to AWSInnovating SAP the Easy Way – Migrate it to AWS
Innovating SAP the Easy Way – Migrate it to AWS
 
Journey to the cloud.
Journey to the cloud.Journey to the cloud.
Journey to the cloud.
 
Achieving Business Value with AWS - AWS Online Tech Talks
Achieving Business Value with AWS - AWS Online Tech TalksAchieving Business Value with AWS - AWS Online Tech Talks
Achieving Business Value with AWS - AWS Online Tech Talks
 

Similar to AWS 資料湖服務

Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleAmazon Web Services
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin BriskmanSameer Kenkare
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSLam Le
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
 
AWS Portfolio: highlight delle categorie di prodotti AWS con esempi
AWS Portfolio: highlight delle categorie di prodotti AWS con esempiAWS Portfolio: highlight delle categorie di prodotti AWS con esempi
AWS Portfolio: highlight delle categorie di prodotti AWS con esempiAmazon Web Services
 
Data Analysis - Journey Through the Cloud
Data Analysis - Journey Through the CloudData Analysis - Journey Through the Cloud
Data Analysis - Journey Through the CloudIan Massingham
 
Journey Through the Cloud - Data Analysis
Journey Through the Cloud - Data AnalysisJourney Through the Cloud - Data Analysis
Journey Through the Cloud - Data AnalysisAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 

Similar to AWS 資料湖服務 (20)

Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scale
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 
AWS re:Invent Recap
AWS re:Invent RecapAWS re:Invent Recap
AWS re:Invent Recap
 
Module 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWSModule 1 - CP Datalake on AWS
Module 1 - CP Datalake on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 
AWS Tech Talks - Data Lake Analytics
AWS Tech Talks - Data Lake AnalyticsAWS Tech Talks - Data Lake Analytics
AWS Tech Talks - Data Lake Analytics
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
 
AWS Portfolio: highlight delle categorie di prodotti AWS con esempi
AWS Portfolio: highlight delle categorie di prodotti AWS con esempiAWS Portfolio: highlight delle categorie di prodotti AWS con esempi
AWS Portfolio: highlight delle categorie di prodotti AWS con esempi
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Data Analysis - Journey Through the Cloud
Data Analysis - Journey Through the CloudData Analysis - Journey Through the Cloud
Data Analysis - Journey Through the Cloud
 
Journey Through the Cloud - Data Analysis
Journey Through the Cloud - Data AnalysisJourney Through the Cloud - Data Analysis
Journey Through the Cloud - Data Analysis
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 

AWS Data Lake Services

  • 1. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. Better Data Lakes Solution Using New Launched AWS Analytics Services Ivan Cheng (鄭志帆) Manager, Solutions Architect, AWS
  • 2. Data silos to OLTP ERP CRM LOB DW Silo 1 Business Intelligence Devices Web Sensors Social DW Silo 2 Business Intelligence Machine learning BI + analytics Data warehousing Data lakes Open formats Central catalog Traditional data warehousing approaches don’t scale
  • 3. AWS Confidential - Internal Use Only Customers moving to data lake architectures Bringing together the best of both worlds Extends or evolves DW architectures Store any data in any format Durable, available, and exabyte scale Secure, compliant, auditable Run any type of analytics from DW to Predictive Data Warehousing Analytics Machine Learning Data lake
  • 4. Any type of analytics on the data lake: Most comprehensive analytics platform. Data lake: Amazon S3 | AWS Glue | Lake Formation. Data Warehousing: Amazon Redshift. Big Data Processing: Amazon EMR. Interactive Query: Amazon Athena. Operational Analytics: Amazon Elasticsearch Service. Real time Analytics: Amazon Kinesis, Amazon MSK. Predictive Analytics: Amazon SageMaker. Recommendations: Amazon Personalize. Visualizations: Amazon QuickSight. Data Exchange: AWS Data Exchange.
  • 5. Our portfolio: Broad and deep portfolio, purpose-built for builders. Data Lake: S3/Glacier; Glue (ETL & Data Catalog); Lake Formation (Data Lakes). Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka. Business Intelligence & Machine Learning: Data Exchange (Data exchange, NEW); QuickSight (Visualizations); SageMaker (ML); Comprehend (NLP); Transcribe (Speech-to-text); Textract (Extract text); Personalize (Recommendation); Forecast (Forecasts); Translate (Translation); CodeGuru (Code reviews); Kendra (Enterprise search); two further NEW badges appear in this group. Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware); Aurora (MySQL, PostgreSQL); DynamoDB (Key value, Document); ElastiCache (Redis, Memcached); Neptune (Graph); Timestream (Time Series); QLDB (Ledger Database); Managed Apache Cassandra Service (Wide column, NEW); DocumentDB (Document). Blockchain: Managed Blockchain; Blockchain Templates. Analytics: Redshift (Data warehousing); EMR (Hadoop + Spark); Kinesis Data Analytics (Real time); Elasticsearch Service (Operational Analytics); Athena (Interactive analytics). Also marked NEW: AQUA; EMR on Outposts; UltraWarm; RDS Proxy; RDS on Outposts.
  • 6. Amazon S3 is the best place to build data lakes Business insights into your data Best security, compliance, and audit capabilities Most ways to bring data in Object-level controls Unmatched durability, availability, and scalability
  • 7. AWS Access Analyzer for S3 (New): An S3 capability to generate comprehensive findings if your resource policies grant public or cross-account access. Continuously identify resources with overly broad permissions across your entire AWS organization. Resolve findings by updating policies to protect your resources from unintended access before it occurs, or archive findings for intended access.
  • 8. Benefits of Access Analyzer for S3: Uses automated reasoning, a form of mathematical logic & inference, to determine all possible access paths allowed by a resource policy. Continuously monitors and automatically analyzes any new or updated resource policy to help you understand potential security implications. Analyzes thousands of policies in seconds for public or cross-account access.
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Introducing Amazon S3 Access Points (Storage, General Availability): Simplify managing data access at scale for applications using shared data sets on Amazon S3. Easily create hundreds of access points per bucket, each with a unique name and permissions customized for each application.
  • 10. S3 Access Points: How Access Points Work. The bucket my-bucket.s3.amazonaws.com is shared through the access points finance, accounting, and sales, each with its own policy. Access point URL format: https://[access_point_name]-[accountID].s3-accesspoint.[region].amazonaws.com
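The URL pattern on slide 10 can be sketched as a small helper. This is only an illustration of the naming scheme quoted on the slide; the function name `access_point_hostname`, the account ID, and the Region below are hypothetical placeholders, not values from the deck.

```python
def access_point_hostname(name: str, account_id: str, region: str) -> str:
    """Build an access point hostname following the pattern quoted on the slide:
    [access_point_name]-[accountID].s3-accesspoint.[region].amazonaws.com
    """
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


if __name__ == "__main__":
    # Hypothetical account ID and Region, for illustration only.
    for team in ("finance", "accounting", "sales"):
        print("https://" + access_point_hostname(team, "123456789012", "us-east-1"))
```

Each application then talks to its own hostname, so permissions can be scoped per access point rather than in one large bucket policy.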
  • 12. S3 Replication Amazon Simple Storage Service Replicate within the same AWS Region Replicate to a different AWS Region Replicate to a bucket with retention controls (in the same or different AWS Region) Replicate faster to a different AWS Region, backed by an SLA + replication metrics New at re:Invent: S3 RTC New in 2019 New in 2019
  • 13. Amazon S3 Replication Time Control (Storage, General Availability): Amazon S3 Replication with the benefits of predictable replication time, backed by an Amazon S3 Service Level Agreement (SLA). Designed to replicate 99.99% of data in <15 minutes. Monitor replication using Amazon CloudWatch metrics and event notifications.
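As a rough sketch of what enabling Replication Time Control looks like in practice, the snippet below builds a replication configuration as a plain Python dictionary. The shape is assumed to follow the `ReplicationConfiguration` structure accepted by the S3 `PutBucketReplication` API; the role ARN, bucket ARN, and rule ID are hypothetical, and the structure should be checked against the current API reference before use.

```python
import json


def rtc_replication_rule(dest_bucket_arn: str, rule_id: str = "rtc-rule") -> dict:
    """Return one replication rule with Replication Time Control and metrics enabled."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},  # empty filter: the rule applies to all objects
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": dest_bucket_arn,
            # 15 minutes matches the replication target quoted on the slide.
            "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
            # Replication metrics feed the CloudWatch monitoring mentioned above.
            "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
        },
    }


# Hypothetical ARNs, for illustration only.
config = {
    "Role": "arn:aws:iam::123456789012:role/example-replication-role",
    "Rules": [rtc_replication_rule("arn:aws:s3:::example-destination-bucket")],
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

The dictionary is built without calling AWS, so it can be reviewed or version-controlled before being passed to an SDK call.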
  • 14. Benefits of a Data Lake – Storage vs Compute Separating your storage and compute allows you to scale each component as required “How can I scale up with the volume of data being generated?”
  • 15. Amazon EMR: Easily run Spark, Hadoop, Hive, Presto, HBase, and more big data apps on AWS. Low cost: 50–80% reduction in costs with EC2 Spot and Reserved Instances; per-second billing for flexibility. Use S3 storage: process data in S3 securely with high performance using the EMRFS connector. Latest versions: updated with latest open source frameworks within 30 days. Easy: fully managed, with no cluster setup, node provisioning, or cluster tuning.
  • 16. Performance Improvements in Spark for Amazon EMR (NEW): Performance-optimized runtime for Apache Spark, 2.6x faster performance at 1/10th the cost. Runtime optimized for Apache Spark performance; 100% compliant with Apache Spark APIs. Best performance: 2.6x faster than Spark with EMR without runtime; 1.6x faster than 3rd party Managed Spark (with their runtime). Lowest price: 1/10th the cost of 3rd party Managed Spark (with their runtime). Chart, runtime total on 104 queries (seconds, lower is better): Spark with EMR (with runtime) 10,164; 3rd party Managed Spark (with their runtime) 16,478; Spark with EMR (without runtime) 26,478. *Based on TPC-DS 3 TB benchmarking running 6-node c4.8xlarge clusters and EMR 5.28, Spark 2.4.
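The speedup figures on slide 16 follow directly from the chart's runtime totals. The short calculation below reproduces them; the dictionary keys are labels chosen here for illustration, while the numbers are the ones shown on the slide.

```python
# Runtime totals on 104 TPC-DS queries, in seconds, as shown on the slide.
runtimes = {
    "emr_with_runtime": 10_164,
    "third_party_managed_spark": 16_478,
    "emr_without_runtime": 26_478,
}

baseline = runtimes["emr_with_runtime"]

# Speedup of the optimized EMR runtime relative to each configuration.
speedups = {name: round(seconds / baseline, 1) for name, seconds in runtimes.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: {factor}x")
```

The ratios 26,478 / 10,164 and 16,478 / 10,164 round to the 2.6x and 1.6x quoted on the slide.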
  • 17. Amazon EMR on AWS Outposts (GA, NEW): Launch EMR in your data centers with AWS Outposts. Integrate with existing on-premises Hadoop deployments. Deploy secure, managed EMR clusters in minutes. Process and analyze data on-premises on AWS Outposts. Diagram: EMR (Hadoop + Spark) on AWS Outposts alongside on-premises Hadoop/Spark.
  • 18. Operational Analytics: Amazon Elasticsearch Service. Fully managed, scalable, secure Elasticsearch service. Open source: open-source Elasticsearch APIs, managed Kibana, integration with Logstash. Scalable, secure, and compliant: scale clusters up/down via a single API call or a few clicks; secured network isolation with VPC, encrypt data at rest and in transit; compliant with HIPAA, PCI DSS, and ISO. Cost-optimized workloads: pay only for what you use; no upfront fee or usage requirement; critical features built in (encryption, VPC support, 24x7 monitoring). Fully managed: deploy Elasticsearch clusters in minutes, with simplified hardware provisioning, software installation/patching, failure recovery, backups, and monitoring.
  • 19. Challenges with analyzing high volumes of data in real time: Storing data is expensive at scale. This limits the amount of data retained for analysis, so you miss out on valuable insights.
  • 20. Introducing UltraWarm for Amazon Elasticsearch Service (Analytics, Public Beta): A low cost, scalable warm storage tier for Amazon Elasticsearch Service. Store up to 10 PB of data in a single cluster at 1/10th the cost of existing storage tiers, while still providing an interactive experience for analyzing logs. Learn more: ANT229: Scalable, secure, and cost-effective log analytics
  • 21. UltraWarm: Cloud-optimized architecture for log analytics. Cost reduction of 90%. Interactive and integrated log analytics. Multi-PB scale. Diagram labels: Kibana, data nodes, UltraWarm nodes, Amazon S3, active, backup.
  • 22. Logging example
  • 23. UltraWarm Storage: Improved cost, scale, and durability. Data stored on S3. Eliminates Elasticsearch-level replicas and snapshots. Supports 100% utilization. Pay for the storage you use.
  • 24. UltraWarm Nodes: Optimized for performance. Optimized Nitro instances provide high bandwidth S3 access. Multi-layered and granular caching, adaptive prefetching, and query engine optimizations provide an interactive experience. Queries transparently run against locally cached or S3 data. Performance in numbers: • Queries that hit cached data run like hot • Queries that span many days/indexes of uncached data are up to 2x faster than traditional HDD based warm instances • Narrow queries on few days/indexes of uncached data finish in seconds
  • 25. Amazon Athena: Interactive query service. Query instantly: zero setup cost; point to S3 and start querying. Pay per query: pay only for queries run; save 30–90% on per-query costs through compression. Use S3 storage. SQL: ANSI SQL; JDBC/ODBC drivers; multiple formats, compression types, and complex joins and data types. Easy: serverless, with zero infrastructure and zero administration; integrated with QuickSight.
  • 26. Amazon Athena Federated Query (Analytics, Preview): Run SQL queries on data spanning multiple data stores. Data stores shown: Redshift (data warehousing), ElastiCache (Redis), Aurora (MySQL, PostgreSQL), DynamoDB (key value, document), DocumentDB (document), S3/Glacier. Run SQL queries on relational, non-relational, object, or custom data sources, in the cloud or on-premises. Run connectors in AWS Lambda: no servers to manage. Open source connectors for common data sources. Build connectors to custom data sources.
  • 27. Athena Federated Query Architecture
  • 28. ML in Amazon Aurora and Athena: Bringing machine learning to data developers and data analysts. Generate predictions directly from Aurora queries. Models run in SageMaker & Comprehend. Use standard SQL, no ML expertise required. Suitable for low-latency, high-volume use cases. Diagram: SQL (Select, From, Where) issued from Aurora (database) and Athena (interactive analytics) to Amazon SageMaker (ML).
  • 29. Example Query Using ML with Athena: USING FUNCTION predict_customer_registration(age INTEGER) RETURNS DOUBLE TYPE SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'xgboost-2019-09-20-04-49-29-303') SELECT predict_customer_registration(age) AS probability_of_enrolling, customer_id FROM "sampledb"."ml_test_dataset" WHERE predict_customer_registration(age) < 0.5;
  • 30. Athena support for interface endpoint (PrivateLink): Submit queries securely. No internet gateway required in your VPC. Secure communication between your VPC and Athena APIs. Set VPC endpoint policies (the slide shows an example endpoint policy).
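The example endpoint policy from slide 30 did not survive in the transcript. As a stand-in, here is a minimal sketch of what a VPC endpoint policy restricting Athena access might look like, written as a Python dictionary and serialized to JSON. It is not the policy from the slide; the account ID, Region, and workgroup name are hypothetical.

```python
import json

# Hypothetical policy: allow only query submission and result retrieval
# against a single workgroup through the interface endpoint.
endpoint_policy = {
    "Statement": [
        {
            "Principal": "*",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:StopQueryExecution",
            ],
            "Resource": [
                # Placeholder ARN, for illustration only.
                "arn:aws:athena:us-east-1:123456789012:workgroup/example-workgroup"
            ],
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(endpoint_policy, indent=2))
```

Any Athena call not listed under `Action` would be denied when made through an endpoint carrying this policy.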
  • 31. Amazon QuickSight (NEW): First BI service built for the cloud with pay-per-session pricing & ML insights for everyone. Elastic scaling: auto-scale 10 to 10K+ users in minutes; pay-as-you-go. Serverless: create dashboards in minutes; deploy globally without provisioning a single server. Deeply integrated with AWS services: secure, private access to AWS data; integrated S3 data lake permissions through AWS IAM. API support: programmatically onboard users and manage content; easily embed in your apps.
  • 32. ML predictions in Amazon QuickSight (preview, NEW). 1 Connect to any data: data lakes, SQL engines, 3rd party applications, and on-premises databases. AWS/on-premises data sources: Excel • CSV • MySQL • PostgreSQL • MariaDB • Presto • Spark • SQL Server • Amazon Redshift • RDS • S3 • Athena • Aurora • EMR • Snowflake • Teradata • Salesforce • Square • Adobe Analytics • Jira • ServiceNow • Twitter • GitHub. 2 Select an ML model: create models with Amazon SageMaker AutoPilot, or use existing custom models and packaged models from AWS Marketplace. 3 Visualize and share: analyze results, create visualizations, build dashboards / email reports, and share with business stakeholders.
  • 33. Easily embed analytics in your own tools (NEW): Powered by QuickSight APIs and flexible customization. Entirely serverless. Deploy and manage dashboards + data via APIs. Match your application UI with QuickSight Themes. Embed dashboards in apps without servers: • Fast, consistent performance • Pay-per-session. Automatically scale to 10s of 1000s of users: • No server management • No scripting
  • 34. Innovation is limited when you can’t find the data you need, even when there is a data provider that wants to share it with you. Subscriber: no good place to find diverse data (places? genomics? market data? news?). Provider: hard to reach subscribers.
  • 35. It’s simply too hard for organizations to exchange data today. Pain points between subscriber and provider: months to license; difficult to integrate; hard drives, FTP, postal service; inconsistent payments; complicated contracts; customer acquisition cost. Results: fewer data products in market; frustrated subscribers.
  • 36. Data exchange: AWS Data Exchange (Analytics, General Availability). Easily find and subscribe to 3rd-party data in the cloud. Quickly find diverse data in one place: >1,000 data products; >80 data providers, including Dow Jones, Change Healthcare, Foursquare, Dun & Bradstreet, Thomson Reuters, Pitney Bowes, LexisNexis, and Deloitte. Efficiently access 3rd-party data: simplifies access to data, with no need to receive physical media, manage FTP credentials, or integrate with different APIs; minimize legal reviews and negotiations. Easily analyze data: download or copy data to S3; combine, analyze, and model with existing data; analyze data with EMR, Redshift, Athena, and AWS Glue.
  • 37. Data exchange: AWS Data Exchange. Easily find and subscribe to 3rd-party data in the cloud. For subscribers: quickly find diverse data in one place; efficiently access 3rd-party data; easily analyze data. For providers: reach millions of AWS customers; easiest way to package and publish data products; built-in security and compliance controls. Learn more: ANT238-R: AWS Data Exchange: Easily find & subscribe to third-party data in the cloud
  • 38. AWS Data Exchange Data Providers
  • 39. Our portfolio: Broad and deep portfolio, purpose-built for builders. Data Lake: S3/Glacier; Glue (ETL & Data Catalog); Lake Formation (Data Lakes). Data Movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Managed Streaming for Kafka. Business Intelligence & Machine Learning: Data Exchange (Data exchange, NEW); QuickSight (Visualizations); SageMaker (ML); Comprehend (NLP); Transcribe (Speech-to-text); Textract (Extract text); Personalize (Recommendation); Forecast (Forecasts); Translate (Translation); CodeGuru (Code reviews); Kendra (Enterprise search); two further NEW badges appear in this group. Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, RDS on VMware); Aurora (MySQL, PostgreSQL); DynamoDB (Key value, Document); ElastiCache (Redis, Memcached); Neptune (Graph); Timestream (Time Series); QLDB (Ledger Database); Managed Apache Cassandra Service (Wide column, NEW); DocumentDB (Document). Blockchain: Managed Blockchain; Blockchain Templates. Analytics: Redshift (Data warehousing); EMR (Hadoop + Spark); Kinesis Data Analytics (Real time); Elasticsearch Service (Operational Analytics); Athena (Interactive analytics). Also marked NEW: AQUA; EMR on Outposts; UltraWarm; RDS Proxy; RDS on Outposts.
  • 40. Thank you! Ivan Cheng (鄭志帆)