© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Introduction to Data Lake on AWS
Tuan Vo
Solutions Architect
mintuan@amazon.com
“Organizations that successfully generate
business value from their data will outperform
their peers.”
To Become a Leader, Data is Your Differentiator
For Data to Be a Differentiator, Customers Need
to Be Able to…
• Capture and store new non-relational
data at PB-EB scale in real time
• Run new types of analytics that go beyond
batch reporting to incorporate real-time,
predictive, voice, and image recognition
• Democratize access to data in a secure
and governed way
[Figure: new types of data feeding new types of analytics: dashboards, predictive, image recognition, voice, real-time]
Traditionally, Analytics Used to Look Like This
[Diagram: OLTP, ERP, CRM, and LOB sources feed a Data Warehouse, which feeds Business Intelligence]
• Relational data
• TBs–PBs scale
• Schema defined prior to data load
• Operational reporting and ad hoc queries
• Large initial CAPEX + $10K–$50K/TB/Year
Data Lakes Extend the Traditional Approach
[Diagram: OLTP, ERP, CRM, and LOB sources feed the Data Warehouse and Business Intelligence; devices, web, sensors, and social sources feed the Data Lake for Big Data processing, real-time, and Machine Learning]
• Relational and non-relational data
• TBs–EBs scale
• Diverse analytical engines
• Low-cost storage & analytics
Data Lakes and Analytics from AWS
Cost-effective
Scalable and durable
Secure
Open and comprehensive
Analytics
Machine Learning
Real-time Data
Movement
On-premises
Data Movement
Data Lake
on AWS
Amazon S3
Amazon Glacier
AWS Glue
Store Data in the Format You Want
Open and comprehensive
• Store data in the format you want:
• Text files like CSV
• Columnar like Apache Parquet, and Apache ORC
• Log formats parsed with Grok patterns (as used by Logstash)
• JSON (simple, nested), AVRO
• And more…
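To make the difference between row-oriented and columnar formats in the list above concrete, here is a minimal sketch using only the Python standard library. The sample records and the helper names (`to_csv`, `to_json_lines`, `to_columnar`) are assumptions made for illustration. Real columnar formats such as Parquet and ORC add encoding, compression, and statistics that this toy layout does not attempt.

```python
import csv
import io
import json

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"device": "sensor-1", "temp_c": 21.5},
    {"device": "sensor-2", "temp_c": 19.0},
]


def to_csv(rows):
    """Row-oriented text: a header line, then one line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Row-oriented JSON: one object per line."""
    return "\n".join(json.dumps(r) for r in rows)


def to_columnar(rows):
    """Column-oriented layout: one list of values per field."""
    return {name: [r[name] for r in rows] for name in rows[0]}


columns = to_columnar(records)
# A query that needs only temp_c can read that single column
# and skip every other field.
average_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
```

The columnar layout is why analytics engines often scan less data on Parquet or ORC than on CSV or JSON: a query touching one field does not have to read the others.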
Analyze with the Broadest Set of Analytic Tools
Open and comprehensive
• Analyze data with the broadest selection
of analytics tools
• Data warehousing
• Interactive SQL queries
• Big Data processing
• Real-time analytics
• Dashboards & Visualizations
• Machine Learning
• Query in place without moving to a
separate analytics system
• Up to 400% faster with S3 Select and
Glacier Select
• Largest ISV ecosystem with built-in
integration
• Ensures you can meet existing and future
use cases, minimizing risks
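The "query in place" bullet can be sketched as a request for S3 Select, which evaluates a SQL expression against a single object and returns only the matching rows. The snippet below only builds the keyword arguments for boto3's `select_object_content` call, so it runs without AWS credentials. The bucket name, object key, column names, and the helper `build_s3_select_request` are hypothetical.

```python
def build_s3_select_request(bucket, key, sql):
    """Return keyword arguments for boto3's S3 select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # The input is a CSV object whose first line holds the column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


request = build_s3_select_request(
    bucket="example-data-lake",            # hypothetical bucket
    key="raw/orders/2020/01/orders.csv",   # hypothetical object key
    sql="SELECT s.order_id, s.total FROM S3Object s WHERE s.country = 'VN'",
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   response = boto3.client("s3").select_object_content(**request)
```

Only the selected columns of matching rows leave S3, which is where the speed-up claimed on the slide would come from.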
Machine Learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly
Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
Amazon S3, Amazon Glacier, AWS Glue
AWS Provides Highest Levels of Security
Secure
Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail
Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, VPC
Encryption: AWS Certificate Manager, AWS Key Management Service, encryption at rest, encryption in transit, bring your own keys, HSM support
Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations
Customers need multiple levels of security, identity and access management,
encryption, and compliance to secure their data lake
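As one possible illustration of encryption in transit and at rest, the sketch below builds an S3 bucket policy that denies requests made without TLS and denies uploads that do not ask for SSE-KMS. The bucket name and the helper `build_data_lake_bucket_policy` are hypothetical. The condition keys (`aws:SecureTransport`, `s3:x-amz-server-side-encryption`) are standard ones, but whether this exact policy suits a given data lake is an assumption.

```python
import json


def build_data_lake_bucket_policy(bucket):
    """Policy that rejects plaintext transport and non-KMS uploads."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Encryption in transit: refuse any request not made over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Encryption at rest: refuse uploads that do not request SSE-KMS.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
        ],
    }


policy_json = json.dumps(
    build_data_lake_bucket_policy("example-data-lake"), indent=2
)
```

Note that the second statement also rejects uploads that rely on bucket default encryption without sending the encryption header, so it is a deliberately strict choice.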
Any Scale
Scalable and durable
• S3 has trillions of objects and exabytes of data
• Built to store any amount of data
• Run analytic engines at largest scale by spinning
up any amount of compute resources in minutes
• Runs on the world’s largest global
cloud infrastructure
Unmatched Durability and Availability
Scalable and durable
• Designed to deliver 99.999999999% durability
• Geographic redundancy & automatic replication
• Store data in multiple data centers across 3 AZs
in a single region
• Can replicate data seamlessly between any AWS Regions you choose
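The eleven-nines durability figure is easier to grasp as an expected loss rate. The sketch below treats 99.999999999% as an average annual per-object durability, which is a simplifying assumption about how to read the design target, and the function names are hypothetical. Exact fractions are used to avoid floating-point noise at this scale.

```python
from fractions import Fraction

# Annual per-object loss rate implied by 99.999999999% durability:
# 1 - 0.99999999999 = 1e-11.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_objects_lost_per_year(object_count):
    """Average number of objects expected to be lost in one year."""
    return object_count * ANNUAL_LOSS_RATE


def expected_years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count)


years = expected_years_per_lost_object(10_000_000)
```

Under this reading, storing ten million objects implies losing one object on average once every 10,000 years.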
Tiered Storage to Optimize Price/Performance
Lowest Cost
• Tiered storage to optimize price/performance
• S3 Standard
• S3 Standard—Infrequent Access
• S3 One Zone—Infrequent Access
• Amazon Glacier
• Migrate between tiers based on lifecycle policies
• Store data at $0.023/GB/month with S3
• Store data at $0.004/GB/month with Glacier
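A quick calculation shows why tiering matters. The sketch below uses the two per-GB prices quoted in the bullets above. The data volumes, the 1,024 GB per TB convention, and the helper `monthly_storage_cost` are assumptions, and actual prices vary by Region and over time.

```python
from decimal import Decimal

# Per-GB monthly prices (USD) as quoted on the slide.
PRICE_PER_GB_MONTH = {
    "S3 Standard": Decimal("0.023"),
    "Amazon Glacier": Decimal("0.004"),
}

GB_PER_TB = 1024  # binary convention, assumed for illustration


def monthly_storage_cost(gb_by_tier):
    """Sum the monthly storage cost for a {tier: gigabytes} mapping."""
    return sum(PRICE_PER_GB_MONTH[tier] * gb for tier, gb in gb_by_tier.items())


# Hypothetical 100 TB data lake, first kept entirely in S3 Standard ...
all_standard = monthly_storage_cost({"S3 Standard": 100 * GB_PER_TB})
# ... then with 90 TB of cold data moved to Glacier.
tiered = monthly_storage_cost(
    {"S3 Standard": 10 * GB_PER_TB, "Amazon Glacier": 90 * GB_PER_TB}
)
```

Under these assumptions the monthly storage bill falls from about $2,355 to about $604. Retrieval and request charges are ignored here.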
Pay Only for the Resources You Use as You Scale
Lowest Cost
• Pay-as-you-go for the resources you consume
• About $5 per TB scanned with Athena (roughly $0.005/GB)
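Because Athena bills by the amount of data scanned, the cost of a query depends on how much of the data it has to read. The sketch below takes the price per TB as a parameter rather than hard-coding it. The value 5 and the scan sizes are assumptions for illustration, and `scan_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

BYTES_PER_TB = 1024 ** 4


def scan_cost_usd(bytes_scanned, usd_per_tb):
    """Cost of one query under pay-per-data-scanned pricing."""
    return Decimal(bytes_scanned) / BYTES_PER_TB * Decimal(str(usd_per_tb))


# Scanning a full 1 TB dataset ...
full_scan = scan_cost_usd(1 * BYTES_PER_TB, usd_per_tb=5)
# ... versus one sixteenth of it, for example after partition pruning
# or after converting to a columnar format and reading only some columns.
pruned_scan = scan_cost_usd(BYTES_PER_TB // 16, usd_per_tb=5)
```

Whatever the price, the cost ratio between the two queries equals the ratio of bytes scanned, which is why partitioning and columnar formats are the usual levers for reducing query cost.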
[Figure: capacity versus demand. Traditional (rigid): fixed server capacity leads to unmet demand (upset players, missed revenue) or excess capacity (wasted spend). AWS (elastic): capacity tracks demand, so you pay for the capacity you use.]
Lowest Total Cost of Ownership (TCO)
Cost-effective
• Less admin time to
manage and support
• No up-front costs—
hardware acquisition,
installation
• Save on operating
costs—data center space,
power, cooling
• Business value: cost of
delays, risk premium,
competitive abilities,
governance, etc.
On-premises: Licensing Fees, Support Costs
AWS: Subscription Fee, Support Costs
• Server costs: hardware (server, rack, chassis, PDUs, ToR switches, plus maintenance); software (OS, virtualization licenses, plus maintenance)
• Network costs: network hardware (LAN switches, load balancer, bandwidth costs); software (network monitoring)
• IT labor costs: server admin, virtualization admin, storage admin, network admin, support team
• Extras: project planning, advisors, legal, contractors, managed services, training, cost of capital
More Data Lakes & Analytics on AWS than Anywhere Else
Reference architecture: Data lake on AWS
• Central storage: Amazon S3
• Data ingestion: Amazon Kinesis, AWS Direct Connect, AWS Snowball, AWS DMS, AWS Data Exchange
• Catalog and search: AWS Glue, Amazon ES, Amazon DynamoDB
• Processing and analytics: machine learning, Amazon QuickSight, Amazon EMR, Amazon Redshift, Amazon Athena
• Access and user interface: Amazon API Gateway, IAM, Amazon Cognito
• Protect and secure: IAM, Amazon CloudWatch, AWS STS, AWS CloudTrail, AWS KMS
Serverless data lakes and analytics
[Diagram: Amazon RDS, web app data, other databases, on-premises data, and streaming data land in Amazon S3; an AWS Glue crawler populates the AWS Glue Data Catalog; Amazon Athena queries the data; Amazon QuickSight visualizes the results]
Amazon S3
Amazon S3—Object Storage
Security and
Compliance
Three different forms of
encryption; encrypts data
in transit when
replicating across regions;
log and monitor with
CloudTrail, use ML to
discover and protect
sensitive data with Macie
Flexible Management
Classify, report, and
visualize data usage
trends; objects can be
tagged to see storage
consumption, cost, and
security; build lifecycle
policies to automate
tiering, and retention
Durability, Availability
& Scalability
Built for eleven nines of
durability; data
distributed across 3
physical facilities in an
AWS region; can be
replicated automatically
to any other AWS region
Query in Place
Run analytics & ML on
data lake without data
movement; S3 Select can
retrieve a subset of data,
improving analytics
performance by up to 400%
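The lifecycle policies mentioned under Flexible Management can be sketched as a configuration document. The function below returns a dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, prefix, day counts, bucket name, and the helper `build_lifecycle_configuration` are illustrative assumptions.

```python
def build_lifecycle_configuration(rule_id, prefix, ia_after_days,
                                  glacier_after_days, expire_after_days):
    """Return a lifecycle configuration that tiers and then expires objects."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Tiering: move objects to cheaper classes as they age.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Retention: delete objects after the retention period.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("tier-raw-data", "raw/", 30, 90, 365)

# With credentials configured, the rule could be applied like this:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=config)
```

Scoping the rule to a prefix such as `raw/` lets different zones of the data lake age at different rates.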
Amazon Glacier—Backup and Archive
Durability, Availability
& Scalability
Built for eleven nines of
durability; data
distributed across 3
physical facilities in an
AWS region; can be
replicated automatically
to any other AWS region
Secure
Log and monitor with
CloudTrail, Vault Lock
enables WORM storage
capabilities, helping
satisfy compliance
requirements
Retrieves data in
minutes
Three retrieval options to
fit your use case;
expedited retrievals with
Glacier Select can return
data in minutes
Inexpensive
Lowest cost AWS object
storage class, allowing
you to archive large
amounts of data at a very
low cost
AWS Glue
Storing is Not Enough, Data Needs to Be Discoverable
“Dark data are the information
assets organizations collect,
process, and store during
regular business activities,
but generally fail to use for
other purposes (for example,
analytics, business relationships
and direct monetizing).”
Gartner IT Glossary, 2018
https://www.gartner.com/it-glossary/dark-data
[Figure: data sources: CRM, ERP, data warehouse, mainframe data, web, social, log files, machine data, semi-structured, unstructured]
AWS Glue—Data Catalog
Make data discoverable
• Automatically discovers data and stores schema
• Catalog makes data searchable, and available for ETL
• Catalog contains table and job definitions
• Computes statistics to make queries efficient
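Automatic discovery is typically driven by a crawler pointed at a data store. As a sketch, the function below builds keyword arguments for boto3's Glue `create_crawler` call without contacting AWS. The crawler name, IAM role ARN, database name, S3 path, schedule, and the helper `build_crawler_request` are all hypothetical.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Return keyword arguments for boto3's Glue create_crawler call."""
    request = {
        "Name": name,
        "Role": role_arn,
        # Tables discovered by the crawler are written to this catalog database.
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        request["Schedule"] = schedule  # cron expression
    return request


request = build_crawler_request(
    name="orders-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_lake",
    s3_path="s3://example-data-lake/raw/orders/",
    schedule="cron(0 2 * * ? *)",  # every day at 02:00 UTC
)

# With credentials configured, the crawler could be created and started:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
```

Leaving `schedule` unset gives a crawler that runs only on demand.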
Data Catalog: Crawlers
Automatically discover new data and extract schema definitions
• Detect schema changes and version tables
• Detect Hive-style partitions on Amazon S3
Built-in classifiers for popular types; custom classifiers using Grok expressions
Run ad hoc or on a schedule; serverless, so you pay only when the crawler runs
Crawlers automatically build your Data Catalog and keep it in sync
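Hive-style partitioning encodes partition columns as `name=value` segments in the object key. The sketch below shows the idea with a small parser. It is not the crawler's actual implementation, and the sample key and the helper `parse_hive_partitions` are hypothetical.

```python
def parse_hive_partitions(s3_key):
    """Extract name=value path segments from an object key, in path order."""
    partitions = {}
    # The last segment is the file name, so it is never a partition.
    for segment in s3_key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


key = "raw/orders/year=2020/month=11/day=05/part-0000.parquet"
partitions = parse_hive_partitions(key)
```

For this key the parser yields `year`, `month`, and `day` as partition columns. A query filtering on those columns only needs to read objects under the matching prefixes.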
AWS Glue Data Catalog
Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized
list that is searchable
Data Catalog: Table details
Table schema
Table properties
Data statistics
Nested fields
Data Catalog: Version control
List of table versions
Compare schema versions
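Comparing two schema versions amounts to a diff over column names and types. The sketch below works on plain `{column: type}` dictionaries. In practice those might be read from the catalog (for example through the Glue GetTableVersions API), but the sample schemas and the helper `diff_schemas` are assumptions.

```python
def diff_schemas(old, new):
    """Compare two {column: type} mappings and report what changed."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    retyped = {
        c: (old[c], new[c])
        for c in old.keys() & new.keys()
        if old[c] != new[c]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical table versions.
v1 = {"order_id": "bigint", "total": "double", "coupon": "string"}
v2 = {"order_id": "bigint", "total": "decimal(10,2)", "country": "string"}
changes = diff_schemas(v1, v2)
```

Here `country` is reported as added, `coupon` as removed, and `total` as retyped from `double` to `decimal(10,2)`.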
AWS Glue—ETL Service
Make ETL scripting and deployment easy
• Automatically generates ETL code
• Code is customizable with Python
and Spark
• Endpoints provided to edit, debug,
test code
• Jobs are scheduled or event-based
• Serverless
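Generated Glue ETL scripts are PySpark code that needs the Glue runtime, so none is reproduced here. As a stand-in, the plain-Python sketch below shows the kind of rename-and-cast step such a script commonly performs. The four-part tuples (source name, source type, target name, target type) follow the style of Glue's ApplyMapping transform, but `apply_mapping`, `CASTS`, and the sample record are hypothetical and not part of any Glue API.

```python
# Supported target types for this sketch only.
CASTS = {"string": str, "int": int, "double": float}


def apply_mapping(record, mappings):
    """Rename and cast fields; fields without a mapping are dropped."""
    out = {}
    for source, _source_type, target, target_type in mappings:
        if source in record:
            out[target] = CASTS[target_type](record[source])
    return out


mappings = [
    ("id", "string", "order_id", "int"),
    ("amt", "string", "total", "double"),
]
row = apply_mapping({"id": "42", "amt": "19.90", "debug": "x"}, mappings)
```

The unmapped `debug` field is dropped, and the string values become typed columns, which is the usual first step before writing data in a columnar format.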
AWS Glue DataBrew
NEW
Clean and normalize data with a visual interface
250+ built-in transformations without writing code
Profile data to understand data patterns and anomalies
Work on large datasets at scale
Amazon Athena
Amazon Athena
Example Query
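The query shown on the original slide did not survive extraction, so the following is an illustrative substitute rather than the presenter's example. It pairs a standard SQL aggregation with the keyword arguments for boto3's Athena `start_query_execution` call. The table, columns, database, results location, and the helper `build_athena_query_request` are hypothetical.

```python
def build_athena_query_request(sql, database, output_location):
    """Return keyword arguments for boto3's Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


EXAMPLE_QUERY = """
SELECT country, COUNT(*) AS orders, SUM(total) AS revenue
FROM orders
WHERE year = '2020' AND month = '11'
GROUP BY country
ORDER BY revenue DESC
LIMIT 10
""".strip()

request = build_athena_query_request(
    EXAMPLE_QUERY,
    database="example_lake",
    output_location="s3://example-athena-results/queries/",
)

# With credentials configured, the query could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```

Filtering on the partition columns `year` and `month` limits the data scanned, and therefore the cost of the query.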
Amazon Athena: ETL & Query Use Case
[Diagram: AWS service logs, application logs, and data sourced from external vendors land in S3 as raw data; Athena updates table partitions in the Glue Data Catalog and queries the data; Athena CTAS and INSERT INTO statements perform ETL, writing transformed data to S3]
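The two Athena steps in this use case, registering a new partition and converting raw data with CTAS, can be sketched as SQL statement builders. The table names, column names, and S3 locations are hypothetical, and the helpers interpolate identifiers without escaping, so they are suitable only for trusted input.

```python
def add_partition_sql(table, partition_column, value, location):
    """Statement that registers one partition of raw data in the catalog."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_column} = '{value}') LOCATION '{location}'"
    )


def ctas_parquet_sql(target, source, columns, partition_column, external_location):
    """CTAS statement that rewrites a table as partitioned Parquet."""
    # Athena expects partition columns last in the SELECT list.
    select_list = ", ".join([*columns, partition_column])
    return (
        f"CREATE TABLE {target} "
        f"WITH (format = 'PARQUET', "
        f"external_location = '{external_location}', "
        f"partitioned_by = ARRAY['{partition_column}']) "
        f"AS SELECT {select_list} FROM {source}"
    )


register = add_partition_sql(
    "orders_raw", "dt", "2020-11-05",
    "s3://example-data-lake/raw/orders/dt=2020-11-05/",
)
transform = ctas_parquet_sql(
    "orders_parquet", "orders_raw", ["order_id", "total"], "dt",
    "s3://example-data-lake/transformed/orders/",
)
```

After the initial CTAS, later batches could be appended with `INSERT INTO orders_parquet SELECT ...`, matching the CTAS and INSERT INTO step in the diagram.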
Amazon QuickSight
Create Beautiful, Interactive
Dashboards
• Add rich interactivity like filters, drill downs,
zooming, and more
• Blazing fast navigation
• Accessible on any device
• Data Refresh
• Publish to everyone with a click
ML (Machine Learning) Insights
Cutting edge ML tools that automatically discover powerful insights for your users.
• Anomaly Detection
• Forecasting
• Bring your own model from
Amazon SageMaker
• Auto-generated natural language
narratives
*currently in preview
THANK YOU

More Related Content

What's hot

Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
Amazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
Amazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
Amazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
Amazon Web Services
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
Capgemini
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
Amazon Web Services
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
Amazon Web Services
 
How Amazon uses AWS Analytics
How Amazon uses AWS AnalyticsHow Amazon uses AWS Analytics
How Amazon uses AWS Analytics
Amazon Web Services
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
Amazon Web Services
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
Holden Ackerman
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
Amazon Web Services
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
Amazon Web Services
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Amazon Web Services
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Amazon Web Services
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
Amazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
Lam Le
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with Databricks
Amazon Web Services
 

What's hot (20)

Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
 
How Amazon uses AWS Analytics
How Amazon uses AWS AnalyticsHow Amazon uses AWS Analytics
How Amazon uses AWS Analytics
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
What's New with Amazon Redshift ft. Dow Jones (ANT350-R) - AWS re:Invent 2018
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
Building Data Lakes That Cost Less and Deliver Results Faster - AWS Online Te...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
McGraw-Hill Optimizes Analytics Workloads with Databricks
 McGraw-Hill Optimizes Analytics Workloads with Databricks McGraw-Hill Optimizes Analytics Workloads with Databricks
McGraw-Hill Optimizes Analytics Workloads with Databricks
 

Similar to Module 1 - CP Datalake on AWS

Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Amazon Web Services
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
Amazon Web Services
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
Cloudera, Inc.
 
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Amazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
AWS Riyadh User Group
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
Amazon Web Services LATAM
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
Amazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Amazon Web Services
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Amazon Web Services
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...
Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
Amazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scale
Amazon Web Services
 
Migrating your IT - AWS Summit Cape Town 2018
Migrating your IT - AWS Summit Cape Town 2018Migrating your IT - AWS Summit Cape Town 2018
Migrating your IT - AWS Summit Cape Town 2018
Amazon Web Services
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
Cobus Bernard
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
Amazon Web Services
 

Similar to Module 1 - CP Datalake on AWS (20)

Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Building a Modern Data Platform in the Cloud
Building a Modern Data Platform in the CloudBuilding a Modern Data Platform in the Cloud
Building a Modern Data Platform in the Cloud
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
Drive Customer Value with Data-Driven Decisions (GPSBUS206) - AWS re:Invent 2018
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Construindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWSConstruindo data lakes e analytics com AWS
Construindo data lakes e analytics com AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
 
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
Building Serverless Analytics Solutions with Amazon QuickSight (ANT391) - AWS...
 
Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...Everything You Need to Know About Big Data: From Architectural Principles to ...
Everything You Need to Know About Big Data: From Architectural Principles to ...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Building Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scaleBuilding Hybrid Cloud Storage Architectures with AWS @scale
Building Hybrid Cloud Storage Architectures with AWS @scale
 
AWS 資料湖服務
AWS 資料湖服務AWS 資料湖服務
AWS 資料湖服務
 
Migrating your IT - AWS Summit Cape Town 2018
Migrating your IT - AWS Summit Cape Town 2018Migrating your IT - AWS Summit Cape Town 2018
Migrating your IT - AWS Summit Cape Town 2018
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
 
Data_Analytics_and_AI_ML
Data_Analytics_and_AI_MLData_Analytics_and_AI_ML
Data_Analytics_and_AI_ML
 

More from Lam Le

[Provided Data - Brazil] Hung Nguyen
[Provided Data - Brazil] Hung Nguyen[Provided Data - Brazil] Hung Nguyen
Module 1 - CP Datalake on AWS

  • 1. Introduction to Data Lake on AWS. Tuan Vo, Solutions Architect, mintuan@amazon.com
  • 2. To Become a Leader, Data is Your Differentiator: “Organizations that successfully generate business value from their data will outperform their peers.”
  • 3. For Data to Be a Differentiator, Customers Need to Be Able to… • Capture and store new non-relational data at PB–EB scale in real time • Use new types of analytics that go beyond batch reporting to incorporate real-time, predictive, voice, and image recognition • Democratize access to data in a secure and governed way [Diagram: New types of analytics (Dashboards, Predictive, Image Recognition, Voice, Real-time); New types of data]
  • 4. Traditionally, Analytics Used to Look Like This • Relational data • TBs–PBs scale • Schema defined prior to data load • Operational reporting and ad hoc • Large initial CAPEX + $10K–$50K/TB/Year [Diagram: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence]
  • 5. Data Lakes Extend the Traditional Approach • Relational and non-relational data • TBs–EBs scale • Diverse analytical engines • Low-cost storage & analytics [Diagram: OLTP, ERP, CRM, LOB; Data Warehouse; Business Intelligence; Devices, Web, Sensors, Social; Data Lake; Big Data processing, real-time, Machine Learning]
  • 6. Data Lakes and Analytics from AWS: Cost-effective; Scalable and durable; Secure; Open and comprehensive [Diagram: Data Lake on AWS; Analytics; Machine Learning; Real-time Data Movement; On-premises Data Movement]
  • 7. Store Data in the Format You Want (Open and comprehensive) • Store data in the format you want: • Text files like CSV • Columnar like Apache Parquet and Apache ORC • Logstash like Grok • JSON (simple, nested), Avro • And more… [Diagram: Amazon S3, Amazon Glacier, AWS Glue; CSV, ORC, Grok, Avro, Parquet, JSON]
  • 8. Analyze with the Broadest Set of Analytic Tools (Open and comprehensive) • Analyze data with the broadest selection of analytics tools • Data warehousing • Interactive SQL queries • Big Data processing • Real-time analytics • Dashboards & Visualizations • Machine Learning • Query in place without moving to a separate analytics system • Up to 400% faster with S3 Select and Glacier Select • Largest ISV ecosystem with built-in integration • Ensures you can meet existing and future use cases, minimizing risks [Diagram: Machine Learning (Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly); Analytics (Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight); Amazon S3, Amazon Glacier, AWS Glue]
  • 9. Data Lakes from AWS: Cost-effective; Scalable and durable; Secure; Open and comprehensive [Diagram: Data Lake on AWS; Analytics; Machine Learning; Real-time Data Movement; On-premises Data Movement]
  • 10. AWS Provides Highest Levels of Security (Secure). Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake. Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail. Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, VPC. Encryption: AWS Certificate Manager, AWS Key Management Service, encryption at rest, encryption in transit, bring your own keys, HSM support. Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations.
  • 11. Data Lakes from AWS: Cost-effective; Scalable and durable; Secure; Open and comprehensive [Diagram: Data Lake on AWS; Analytics; Machine Learning; Real-time Data Movement; On-premises Data Movement]
  • 12. Any Scale (Scalable and durable) • S3 has trillions of objects and exabytes of data • Built to store any amount of data • Run analytic engines at largest scale by spinning up any amount of compute resources in minutes • Runs on the world’s largest global cloud infrastructure
  • 13. Unmatched Durability and Availability (Scalable and durable) • Designed to deliver 99.999999999% durability • Geographic redundancy & automatic replication • Store data in multiple data centers across 3 AZs in a single region • Seamlessly replicates data between any regions
  • 14. Data Lakes from AWS: Lowest cost; Scalable and durable; Secure; Open and comprehensive [Diagram: Data Lake on AWS; Analytics; Machine Learning; Real-time Data Movement; On-premises Data Movement]
  • 15. Tiered Storage to Optimize Price/Performance (Lowest Cost) • Tiered storage to optimize price/performance • S3 Standard • S3 Standard-Infrequent Access • S3 One Zone-Infrequent Access • Amazon Glacier • Migrate between tiers based on lifecycle policies • Store data at $0.023/GB/month with S3 • Store data at $0.004/GB/month with Glacier [Diagram: S3 Standard; S3 Standard-Infrequent Access; S3 One Zone-IA; Glacier]
  • 16. Pay Only for the Resources You Use as You Scale (Lowest Cost) • Pay-as-you-go for the resources you consume • As low as $0.05/GB scanned with Athena [Diagram: capacity (servers) plotted against demand. Traditional: rigid. The traditional approach leads to wasted capacity: unmet demand (upset players, missed revenue) and excess capacity (wasted $$$). AWS: elastic. The AWS approach: pay for the capacity you use.]
  • 17. Lowest Total Cost of Ownership (TCO) (Cost-effective) • Less admin time to manage and support • No up-front costs (hardware acquisition, installation) • Save on operating costs (data center space, power, cooling) • Business value: cost of delays, risk premium, competitive abilities, governance, etc. [Diagram: On-premises (Licensing Fees, Support Costs) vs. AWS (Subscription Fee, Support Costs)] Server Costs: Hardware (Server, Rack, Chassis, PDUs, ToR Switches, plus maintenance); Software (OS, Virtualization Licenses, plus maintenance). Network Costs: Network Hardware (LAN Switches, Load Balancer); Bandwidth costs; Software (Network Monitoring). IT Labor Costs: Server admin, virtualization admin, storage admin, network admin, support team. Extras: Project planning, advisors, legal, contractors, managed services, training, cost of capital.
  • 18. More Data Lakes & Analytics on AWS than Anywhere Else
  • 19. Reference architecture: Data lake on AWS. Central storage: Amazon S3. Data ingestion: Amazon Kinesis, AWS Direct Connect, AWS Snowball, AWS DMS, AWS Data Exchange. Catalog and search: AWS Glue, Amazon ES, Amazon DynamoDB. Processing and analytics: Machine learning, Amazon QuickSight, Amazon EMR, Amazon Redshift, Amazon Athena. Access and user interface: Amazon API Gateway, IAM, Amazon Cognito. Protect and secure: IAM, Amazon CloudWatch, AWS STS, AWS CloudTrail, AWS KMS.
  • 20. Serverless data lakes and analytics [Diagram: Amazon RDS, Web app data, Other databases, On-premises data, Streaming data; Amazon S3; AWS Glue crawler; AWS Glue Data Catalog; Amazon Athena; Amazon QuickSight]
  • 21. Amazon S3
  • 22. Amazon S3: Object Storage. Security and Compliance: Three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail; use ML to discover and protect sensitive data with Macie. Flexible Management: Classify, report, and visualize data usage trends; objects can be tagged to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention. Durability, Availability & Scalability: Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; automatically replicated to any other AWS region. Query in Place: Run analytics & ML on the data lake without data movement; S3 Select can retrieve a subset of data, improving analytics performance by 400%.
  • 23. Amazon Glacier: Backup and Archive. Durability, Availability & Scalability: Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; automatically replicated to any other AWS region. Secure: Log and monitor with CloudTrail; Vault Lock enables WORM storage capabilities, helping satisfy compliance requirements. Retrieves data in minutes: Three retrieval options to fit your use case; expedited retrievals with Glacier Select can return data in minutes. Inexpensive: Lowest cost AWS object storage class, allowing you to archive large amounts of data at a very low cost.
  • 24. AWS Glue
  • 25. Storing is Not Enough, Data Needs to Be Discoverable. “Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Gartner IT Glossary, 2018 https://www.gartner.com/it-glossary/dark-data [Diagram: CRM, ERP, Data warehouse, Mainframe data, Web, Social, Log files, Machine data, Semi-structured, Unstructured]
  • 26. AWS Glue: Data Catalog. Make data discoverable • Automatically discovers data and stores schema • Catalog makes data searchable and available for ETL • Catalog contains table and job definitions • Computes statistics to make queries efficient [Diagram: Glue Data Catalog; Discover data and extract schema; Compliance]
  • 27. Data Catalog: Crawlers. Crawlers automatically build your Data Catalog and keep it in sync. Automatically discover new data and extract schema definitions • Detect schema changes and version tables • Detect Hive-style partitions on Amazon S3. Built-in classifiers for popular types; custom classifiers using Grok expressions. Run ad hoc or on a schedule; serverless, so you only pay when the crawler runs.
  • 28. AWS Glue Data Catalog. Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized list that is searchable
  • 29. Data Catalog: Table details. Table schema; Table properties; Data statistics; Nested fields
  • 30. Data Catalog: Version control. List of table versions; Compare schema versions
  • 31. AWS Glue: ETL Service. Make ETL scripting and deployment easy • Automatically generates ETL code • Code is customizable with Python and Spark • Endpoints provided to edit, debug, test code • Jobs are scheduled or event-based • Serverless
  • 32. AWS Glue DataBrew (NEW). Clean and normalize data with a visual interface; 250+ built-in transformations without writing code; profile data to understand data patterns and anomalies; work on large datasets at scale
  • 33. Amazon Athena
  • 34. Amazon Athena: Example Query
  • 35. Amazon Athena: ETL & Query Use Case [Diagram: AWS service logs, Application logs, Data sourced from external vendors; S3 (Raw Data); Athena (Update table partition, Query data); Athena (CTAS and INSERT INTO to ETL); S3 (Transformed data); Glue Data Catalog]
  • 36. Amazon QuickSight
  • 37. Create Beautiful, Interactive Dashboards • Add rich interactivity like filters, drill downs, zooming, and more • Blazing fast navigation • Accessible on any device • Data Refresh • Publish to everyone with a click
  • 38. ML (Machine Learning) Insights. Cutting-edge ML tools that automatically discover powerful insights for your users. • Anomaly Detection • Forecasting • Bring your own model from Amazon SageMaker • Auto-generated natural language narratives *currently in preview
  • 39. THANK YOU
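
As a rough illustration of the tiered-storage pricing quoted on slide 15 ($0.023/GB/month with S3, $0.004/GB/month with Glacier), the sketch below estimates a monthly storage bill for data split across tiers. The function name, the tier keys, and the 10 TB example split are hypothetical, chosen only for illustration; actual AWS prices vary by region and over time, and request, retrieval, and transfer charges are not modeled.

```python
# Per-GB monthly prices as quoted on slide 15 (US dollars).
# Assumption: these are illustrative only; real prices vary by region and date.
PRICE_PER_GB_MONTH = {
    "s3_standard": 0.023,
    "glacier": 0.004,
}


def monthly_storage_cost(gb_by_tier):
    """Return the estimated monthly storage cost for a {tier: gigabytes} mapping."""
    total = 0.0
    for tier, gigabytes in gb_by_tier.items():
        if tier not in PRICE_PER_GB_MONTH:
            raise KeyError(f"no price known for tier {tier!r}")
        if gigabytes < 0:
            raise ValueError("gigabytes must be non-negative")
        total += gigabytes * PRICE_PER_GB_MONTH[tier]
    return total


if __name__ == "__main__":
    # Hypothetical 10 TB data lake: everything hot vs. 2 TB hot and 8 TB archived.
    all_hot = monthly_storage_cost({"s3_standard": 10_000})
    tiered = monthly_storage_cost({"s3_standard": 2_000, "glacier": 8_000})
    print(f"all in S3 Standard: ${all_hot:,.2f}/month")
    print(f"tiered:             ${tiered:,.2f}/month")
```

Comparing the two printed figures shows why the deck pairs tiering with lifecycle policies: moving rarely read data to the archive tier changes only the per-GB rate applied to it, not the amount stored.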