SlideShare a Scribd company logo
1 of 32
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Johan Broman
Manager, Solutions Architecture – AWS Nordics
Theo Hultberg
Director of Technology - Burt
Big Data and Data Lakes
Building Blocks
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Drives Better Decision Making
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Data lake leaders who were highly efficient
in capturing a diversity of data and making it
accessible to their organization in a timely
fashion outperformed their peers by 9% in
organic revenue growth.*
24%
15%
Organic revenue
growth
Leaders Followers
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Traditionally, Analytics Looked Like This
OLTP ERP CRM LOB
Data warehouse
Business intelligence Relational data
TBs-PBs scale
Schema defined before data load
Operational reporting and on demand
Large initial capex + $10K–$50K / TB / Year
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Isolated data silos
Hadoop
Cluster
SQL
Database
Data
Warehouse
Appliance
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lake is a new and increasingly
popular architecture to store and
analyze massive volumes and
heterogeneous types of data.
Enter Data Lake Architectures
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lakes on AWS
Unmatched durability and availability at exabyte scale
Comprehensive security, compliance, and audit capabilities
Object-level controls
Usage and cost analysis insight into your data
Most ways to bring data in
Twice as many partner integrations
Data lake
A m a z o n S 3
A m a z o n G l a c i e r
A W S G l u e
Machine Learning
Analytics
Internet of Things
Snowball
Snowmobile Kinesis
Data Firehose
Kinesis
Data Streams
Kinesis
Video Streams
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Designed for 11 9s
of durability
Designed for
99.99% availability
Durable Available High performance
§ Multipart upload
§ Range GET
§ Store as much as you need
§ Scale storage and compute
independently
§ No minimum usage
commitments
Scalable
§ Amazon Redshift / Spectrum
§ Amazon EMR
§ Amazon Athena
§ Amazon DynamoDB
Integrated
§ Simple REST API
§ AWS SDKs
§ Read-after-create consistency
§ Event notification
§ Lifecycle policies
Easy to use
Why Amazon S3 for the Data Lake?
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lakes on AWS
Data lake
A m a z o n S 3
A m a z o n G l a c i e r
A W S G l u e
A m a z o n S a g e M a k e r
A W S D e e p L e a r n i n g A M I s
A m a z o n R e k o g n i t i o n
A m a z o n L e x
A W S D e e p L e n s
A m a z o n C o m p r e h e n d
A m a z o n T r a n s l a t e
A m a z o n T r a n s c r i b e
A m a z o n P o l l y
Machine Learning Analytics Internet of Things (IoT)
A W S I o T C o r e
A W S G r e e n g r a s s
A W S I o T A n a l y t i c s
A m a z o n F r e e R T O S
A W S I o T 1 - C l i c k
A W S I o T B u t t o n
A W S I o T D e v i c e M a n a g e m e n t
A W S I o T D e v i c e D e f e n d e r
A m a z o n A t h e n a
A m a z o n E M R
A m a z o n R e d s h i f t
A m a z o n E l a s t i c s e a r c h S e r v i c e
A m a z o n K i n e s i s
A m a z o n Q u i c k S i g h t
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lakes Extend the Traditional Approach
Relational and non-relational data
TBs-EBs scale
Schema defined during analysis
Diverse analytical engines to gain insights
Designed for low-cost storage and analytics
OLTP ERP CRM LOB
Data warehouse
Business
intelligence
Data lake
100110000100101011100
101010111001010100001
011111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
learning
DW
queries
Big data
processing
Interactive Real-time
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Storing is not enough. Data needs to be discoverable.
Dark data are the information
assets organizations collect,
process, and store during
regular business activities,
but generally fail to use for
other purposes (for example,
analytics, business relationships
and direct monetizing).
Gartner
CRM ERP Data warehouse Mainframe
data
Web Social Log
files
Machine
data
Semi-
structured
Unstructured
“
”
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Glue: Data Catalog
Make data discoverable
Automatically discovers data and stores schema
Catalog makes data searchable and available for ETL
Catalog contains table and job definitions
Computes statistics to make queries efficient
Compliance
AWS Glue
Data Catalog
Discover data and
extract schema
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data preparation accounts for ~80% of the work.
Building training sets
Cleaning and organizing data
Collecting data sets
Mining data for patterns
Refining algorithms
Other
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
AWS Glue: ETL Service
Make ETL scripting and deployment easy
Automatically generates ETL code
Code is customizable with Python and Spark
Endpoints provided to edit, debug, & test code
Jobs are scheduled or event-based
Serverless
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Athena: Interactive Analysis
$ SQL
Query Instantly
Zero setup cost;
just point to
Amazon S3 and
start querying.
Pay per query
Pay only for queries run;
save 30–90% on per-
query costs through
compression.
Open
ANSI SQL interface,
JDBC/ODBC drivers, multiple
formats, compression types,
and complex joins and data
types.
Easy
Serverless: zero
infrastructure, zero
administration
Integrated with Amazon
QuickSight.
Interactive query service to analyze data in Amazon S3 using standard SQL
No infrastructure to set up or manage and no data to load
Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon Athena
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon EMR: Big Data Processing
$
Latest versions
Updated with the latest
open source frameworks
within 30 days of release.
Low cost
Flexible billing with per-
second billing, EC2 spot,
reserved instances and
auto-scaling to reduce
costs 50–80%.
Use S3 storage
Process data directly in
the Amazon S3 data lake
securely with high
performance using the
EMRFS connector.
Easy
Launch fully managed
Hadoop & Spark in
minutes; no cluster setup,
node provisioning, &
cluster tuning.
Data Lake
10011000010010101110
01010101110010101000
00111100101100101
010001100001
Analytics and ML at scale
Nineteen open-source projects: Apache Hadoop, Spark, HBase, Presto, and more
Enterprise-grade security
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Amazon EMR
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Lakes Extend the Traditional Approach
Relational and non-relational data
TBs-EBs scale
Schema defined during analysis
Diverse analytical engines to gain insights
Designed for low-cost storage and analytics
OLTP ERP CRM LOB
Data warehouse
Business
intelligence
Data lake
100110000100101011100
101010111001010100001
011111011010
0011110010110010110
0100011000010
Devices Web Sensors Social
Catalog
Machine
learning
DW
queries
Big data
processing
Interactive Real-time
Theo Hultberg
director of technology
@iconara
volume variety
S3
ETL
S3
?
S3
S3ETL APIAthena
APISELECT "app", "account",
SUM("logged_in_users")
FROM "usage"
WHERE "date" BETWEEN
DATE '2018-05-01'
AND DATE '2018-05-16'
GROUP BY 1, 2
Athena
S3ETL APIAthena
S3 Glue
Aurora
APIETL
Redshift
Athena
thank you!
@iconara
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Thank you!

More Related Content

What's hot

How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?Albert Hoitingh
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata ManagementDATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake ArchitectureDATAVERSITY
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata ManagementDATAVERSITY
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data GovernanceJohn Bao Vuu
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 

What's hot (20)

How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
ExpertsLive NL 2022 - Microsoft Purview - What's in it for my organization?
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Microsoft Purview
Microsoft PurviewMicrosoft Purview
Microsoft Purview
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Data Modeling & Metadata Management
Data Modeling & Metadata ManagementData Modeling & Metadata Management
Data Modeling & Metadata Management
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data Governance
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 

Similar to Building a Data Lake on AWS

Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsAmazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Amazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureGet to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAmazon Web Services
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Amazon Web Services
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeAmazon Web Services
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Amazon Web Services
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summits
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsAmazon Web Services
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Amazon Web Services
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAmazon Web Services
 

Similar to Building a Data Lake on AWS (20)

Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven Decisions
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
Modern Cloud Data Warehousing ft. Equinox Fitness Clubs: Optimize Analytics P...
 
Data Lake na área da saúde- AWS
Data Lake na área da saúde- AWSData Lake na área da saúde- AWS
Data Lake na área da saúde- AWS
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data ArchitectureGet to Know Your Customers - Build and Innovate with a Modern Data Architecture
Get to Know Your Customers - Build and Innovate with a Modern Data Architecture
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics PlatformsAutomate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
Automate Business Insights on AWS - Simple, Fast, and Secure Analytics Platforms
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_Singapore
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
 
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
AWS Summit Singapore 2019 | Big Data Analytics Architectural Patterns and Bes...
 
Big Data@Scale
 Big Data@Scale Big Data@Scale
Big Data@Scale
 
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building a Data Lake on AWS

  • 1. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Johan Broman Manager, Solutions Architecture – AWS Nordics Theo Hultberg Director of Technology - Burt Big Data and Data Lakes Building Blocks
  • 2. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Drives Better Decision Making *Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence Data lake leaders who were highly efficient in capturing a diversity of data and making it accessible to their organization in a timely fashion outperformed their peers by 9% in organic revenue growth.* 24% 15% Organic revenue growth Leaders Followers
  • 3. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Traditionally, Analytics Looked Like This OLTP ERP CRM LOB Data warehouse Business intelligence Relational data TBs-PBs scale Schema defined before data load Operational reporting and on demand Large initial capex + $10K–$50K / TB / Year
  • 4. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Isolated data silos Hadoop Cluster SQL Database Data Warehouse Appliance
  • 5. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lake is a new and increasingly popular architecture to store and analyze massive volumes and heterogeneous types of data. Enter Data Lake Architectures
  • 6. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lakes on AWS Unmatched durability and availability at exabyte scale Comprehensive security, compliance, and audit capabilities Object-level controls Usage and cost analysis insight into your data Most ways to bring data in Twice as many partner integrations Data lake A m a z o n S 3 A m a z o n G l a c i e r A W S G l u e Machine Learning Analytics Internet of Things Snowball Snowmobile Kinesis Data Firehose Kinesis Data Streams Kinesis Video Streams
  • 7. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Designed for 11 9s of durability Designed for 99.99% availability Durable Available High performance § Multipart upload § Range GET § Store as much as you need § Scale storage and compute independently § No minimum usage commitments Scalable § Amazon Redshift / Spectrum § Amazon EMR § Amazon Athena § Amazon DynamoDB Integrated § Simple REST API § AWS SDKs § Read-after-create consistency § Event notification § Lifecycle policies Easy to use Why Amazon S3 for the Data Lake?
  • 8. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lakes on AWS Data lake A m a z o n S 3 A m a z o n G l a c i e r A W S G l u e A m a z o n S a g e M a k e r A W S D e e p L e a r n i n g A M I s A m a z o n R e k o g n i t i o n A m a z o n L e x A W S D e e p L e n s A m a z o n C o m p r e h e n d A m a z o n T r a n s l a t e A m a z o n T r a n s c r i b e A m a z o n P o l l y Machine Learning Analytics Internet of Things (IoT) A W S I o T C o r e A W S G r e e n g r a s s A W S I o T A n a l y t i c s A m a z o n F r e e R T O S A W S I o T 1 - C l i c k A W S I o T B u t t o n A W S I o T D e v i c e M a n a g e m e n t A W S I o T D e v i c e D e f e n d e r A m a z o n A t h e n a A m a z o n E M R A m a z o n R e d s h i f t A m a z o n E l a s t i c s e a r c h S e r v i c e A m a z o n K i n e s i s A m a z o n Q u i c k S i g h t
  • 9. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lakes Extend the Traditional Approach Relational and non-relational data TBs-EBs scale Schema defined during analysis Diverse analytical engines to gain insights Designed for low-cost storage and analytics OLTP ERP CRM LOB Data warehouse Business intelligence Data lake 100110000100101011100 101010111001010100001 011111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine learning DW queries Big data processing Interactive Real-time
  • 10. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Storing is not enough. Data needs to be discoverable. Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Gartner CRM ERP Data warehouse Mainframe data Web Social Log files Machine data Semi- structured Unstructured “ ”
  • 11. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. AWS Glue: Data Catalog Make data discoverable Automatically discovers data and stores schema Catalog makes data searchable and available for ETL Catalog contains table and job definitions Computes statistics to make queries efficient Compliance AWS Glue Data Catalog Discover data and extract schema
  • 12. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data preparation accounts for ~80% of the work. Building training sets Cleaning and organizing data Collecting data sets Mining data for patterns Refining algorithms Other
  • 13. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. AWS Glue: ETL Service Make ETL scripting and deployment easy Automatically generates ETL code Code is customizable with Python and Spark Endpoints provided to edit, debug, & test code Jobs are scheduled or event-based Serverless
  • 14. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon Athena: Interactive Analysis $ SQL Query Instantly Zero setup cost; just point to Amazon S3 and start querying. Pay per query Pay only for queries run; save 30–90% on per- query costs through compression. Open ANSI SQL interface, JDBC/ODBC drivers, multiple formats, compression types, and complex joins and data types. Easy Serverless: zero infrastructure, zero administration Integrated with Amazon QuickSight. Interactive query service to analyze data in Amazon S3 using standard SQL No infrastructure to set up or manage and no data to load Ability to run SQL queries on data archived in Amazon Glacier (coming soon)
  • 15. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon Athena
  • 16. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon EMR: Big Data Processing $ Latest versions Updated with the latest open source frameworks within 30 days of release. Low cost Flexible billing with per- second billing, EC2 spot, reserved instances and auto-scaling to reduce costs 50–80%. Use S3 storage Process data directly in the Amazon S3 data lake securely with high performance using the EMRFS connector. Easy Launch fully managed Hadoop & Spark in minutes; no cluster setup, node provisioning, & cluster tuning. Data Lake 10011000010010101110 01010101110010101000 00111100101100101 010001100001 Analytics and ML at scale Nineteen open-source projects: Apache Hadoop, Spark, HBase, Presto, and more Enterprise-grade security
  • 17. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Amazon EMR
  • 18. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Lakes Extend the Traditional Approach Relational and non-relational data TBs-EBs scale Schema defined during analysis Diverse analytical engines to gain insights Designed for low-cost storage and analytics OLTP ERP CRM LOB Data warehouse Business intelligence Data lake 100110000100101011100 101010111001010100001 011111011010 0011110010110010110 0100011000010 Devices Web Sensors Social Catalog Machine learning DW queries Big data processing Interactive Real-time
  • 19. Theo Hultberg director of technology @iconara
  • 20.
  • 23. S3
  • 24. ? S3
  • 25.
  • 27. APISELECT "app", "account", SUM("logged_in_users") FROM "usage" WHERE "date" BETWEEN DATE '2018-05-01' AND DATE '2018-05-16' GROUP BY 1, 2 Athena
  • 30.
  • 32. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Thank you!