© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Roy Ben-Alta, Head Of WW Data Analytics Practice at AWS
November 2019
Modern Data Platforms
Thinking Data Flywheel on the Cloud
Amazon Flywheel
The Data Flywheel
The Cloud Was Built for Data Analytics and ML
• Agility: Try more, fail fast, go big or start small, and process data at any scale
• Scalability: Run jobs any time, without guessing capacity or limiting functionality
• Get to Insights Faster: Focus on data science, not the undifferentiated heavy lifting of managing raw data
• Broadest and Deepest Capabilities: Numerous serverless and managed Big Data services to address any workload
• Low Cost: Pay only for what you use, when you use it
• Data Migrations Made Easy: Move exabyte-scale data to the cloud quickly and cost-effectively
Built on top of open source
Data problem: high-level solution approach
Problem statement: Data sits undiscovered in on-premises historians
AWS solution approach: Build a consistent and open datastore for all your sites using flexible ingestion tools
Problem statement: Customers want to use AI/ML, but are usually early in their data journey
AWS solution approach: Deploy hosted and managed future-proof tools as you move from descriptive to preventive to predictive analytics
Problem statement: Data has gravity but is often untapped
AWS solution approach: Democratize access to the data for actionable operational insights
You need a Data Lake
Diagram labels: Data from Sources; Customer and Insights
1. Collect and store any data at any scale and at low cost
2. One home for all data – no silos
3. Data security is #1 priority
4. Right tool for the job
5. Democratize access to the data as per security entitlements
6. Connect to internal and external apps
Key ‘Components’ of a Data Lake on AWS
Diagram labels: Athena, Query Service, Batch, Glue, IoT, Lambda, SageMaker, Glue Catalog
Storing is not enough, data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Gartner
Source: https://www.gartner.com/it-glossary/dark-data
Diagram labels:
• Traditional enterprise data: CRM, ERP, Data warehouse, Mainframe data
• Dark data, also known as Big data: Web, Social, Log files, Machine data (semi-structured, unstructured)
There’s a service for that: AWS Lake Formation
Diagram labels: Data Lake Storage, Data Catalog, Access Control, Data import, Crawlers, ML-based data prep, Amazon S3
Use Amazon S3 as the storage layer for Lake Formation
• Register existing S3 buckets that contain your data
• Ask Lake Formation to create required S3 buckets and import data into them
• Data is stored in your account: You have direct access. No lock-in.
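As a minimal sketch of the "register existing S3 buckets" step, the snippet below builds the request for the boto3 Lake Formation client's `register_resource` call. The bucket name, the helper names, and the choice of the service-linked role are illustrative assumptions, not part of the slides, and the ARN format assumes the standard `aws` partition.

```python
from typing import Any, Dict


def s3_location_arn(bucket: str, prefix: str = "") -> str:
    """Build the ARN for a bucket, optionally narrowed to a key prefix."""
    arn = f"arn:aws:s3:::{bucket}"
    cleaned = prefix.strip("/")
    return f"{arn}/{cleaned}" if cleaned else arn


def build_register_request(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Keyword arguments for register_resource (hypothetical helper)."""
    return {
        "ResourceArn": s3_location_arn(bucket, prefix),
        # Assumption: let Lake Formation use its service-linked role.
        "UseServiceLinkedRole": True,
    }


def register_location(bucket: str, prefix: str = "") -> None:
    """Register an existing S3 location with Lake Formation."""
    # Imported lazily so the helpers above work without the AWS SDK installed.
    import boto3

    client = boto3.client("lakeformation")
    client.register_resource(**build_register_request(bucket, prefix))
```

Calling `register_location("example-data-lake", "raw")` would need AWS credentials with Lake Formation administrator permissions. The data itself stays in the bucket in your account.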
How Does It Work
EMR: Big Data Toolbox
Amazon EMR provides a managed framework for Hadoop, Spark, and more that makes it easy, fast, and cost-effective to process vast amounts of data, and to train models on it, using dynamically scalable instance fleets.
Easy & Cost Effective
Spin up clusters in minutes
Easy
• Launch Hadoop & Spark clusters in minutes
• No need to install or maintain Hadoop
• Cluster tuning and configuration done for you
• Latest Hadoop versions within 30 days of release
Cost Effective
• EMR provides 57% lower costs than on premises in Year 1
• 342% ROI over 5 years
• New design patterns for storage, transient clusters, spot instances
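To make the two cost figures above concrete, here is a small worked calculation. The 57% and 342% values come from the slide. The dollar amounts in the usage note are hypothetical, and the ROI formula is the common definition (net benefit divided by cost), which the study behind the slide may or may not use.

```python
def year_one_cost(on_prem_cost: float, reduction: float = 0.57) -> float:
    """Cost after applying a fractional reduction to an on-premises baseline."""
    return on_prem_cost * (1.0 - reduction)


def roi_percent(total_benefit: float, total_cost: float) -> float:
    """Common ROI definition: net benefit divided by cost, as a percentage."""
    return (total_benefit - total_cost) / total_cost * 100.0


def benefit_for_roi(total_cost: float, roi_pct: float = 342.0) -> float:
    """Total benefit required to reach a given ROI percentage."""
    return total_cost * (1.0 + roi_pct / 100.0)
```

Under these assumptions, a hypothetical on-premises Year 1 cost of 1,000,000 would drop to about 430,000, and a five-year investment of 100,000 would need to return roughly 442,000 in total benefits to reach a 342% ROI.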
Amazon SageMaker:
Build, train, and deploy ML models at scale
• Collect and prepare training data
• Choose and optimize your ML algorithm
• Set up and manage environments for training
• Train and tune ML models
• Deploy models in production
• Scale and manage the production environment
Classic
• Traditional databases, data warehousing
• Historical data, logs, archives
• Big data
Industry
• Production facilities
• Control devices
• IoT sensors
Web/App
• Websites, web apps, mobile apps
• Enterprise applications (BI, CRM, etc.)
• External data (partners, weather, traffic, etc.)
Architecture diagram, built up step by step across several slides:
• Data sources: Web/App, Classic, Industry
• Streaming, Import/Batch
• Data Lake: Encryption, Data catalog classification, Access control
• ETL, Pre-processing, Machine learning
• Monitoring, Control, Applications, API
Building a data lake
Data Lake (Streaming, Import/Batch) and the AWS services around it:
• On-premises Data Movement: AWS Direct Connect, AWS Snowball, AWS Snowmobile, AWS Database Migration Service
• Real-time Data Movement: Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams
• Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis Data Analytics, Amazon QuickSight
• Machine Learning: Amazon SageMaker, Amazon Rekognition, Amazon Comprehend, Amazon Translate, Amazon Transcribe, etc.
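The layered picture above can be written down as a simple lookup table, which is handy when tagging workloads by layer. This is only an illustrative sketch: the layer and service names are taken from the slide, while the `LAYERS` and `layer_of` names are hypothetical.

```python
from typing import Dict, List, Optional

# Layer names and services as grouped on the slide.
LAYERS: Dict[str, List[str]] = {
    "On-premises Data Movement": [
        "AWS Direct Connect",
        "AWS Snowball",
        "AWS Snowmobile",
        "AWS Database Migration Service",
    ],
    "Real-time Data Movement": [
        # Official service name of the managed Kafka offering.
        "Amazon Managed Streaming for Apache Kafka",
        "Amazon Kinesis Data Firehose",
        "Amazon Kinesis Data Streams",
        "Amazon Kinesis Video Streams",
    ],
    "Analytics": [
        "Amazon Athena",
        "Amazon EMR",
        "Amazon Redshift",
        "Amazon Elasticsearch Service",
        "Amazon Kinesis Data Analytics",
        "Amazon QuickSight",
    ],
    "Machine Learning": [
        "Amazon SageMaker",
        "Amazon Rekognition",
        "Amazon Comprehend",
        "Amazon Translate",
        "Amazon Transcribe",
    ],
}


def layer_of(service: str) -> Optional[str]:
    """Return the layer a service belongs to, or None if it is not listed."""
    for layer, services in LAYERS.items():
        if service in services:
            return layer
    return None
```

For example, `layer_of("Amazon Athena")` returns `"Analytics"`, and a service that is not on the slide returns `None`.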
Customer Case Study
Customers running data analytics and ML on AWS
Hybrid Cloud Storage
Diagram labels: On-Premises Data Center; AWS Storage Gateway, AWS DataSync, AWS Transfer for SFTP; Amazon S3; Archival, Processing, Analytics
Thank You
benaltar@amazon.com
AWS data and analytic services
Any analytic workload, any scale, at the lowest possible cost
• Insights: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Rekognition
• Analytics: Amazon Redshift (Data Warehouse), Amazon EMR (data processing), Amazon Athena (Interactive), Amazon Elasticsearch Service, Amazon Kinesis Data Analytics (Real-time)
• Data Lake: AWS Glue ETL; Amazon S3/Amazon Glacier (Storage); AWS Lake Formation / AWS Glue Data Catalog (Metadata & Governance)
• Data Movement: AWS Database Migration Service | AWS Snowball | AWS Snowmobile | Amazon Kinesis Data Firehose | Amazon Kinesis Data Streams
