www.matillion.com
© 2017 Matillion. All rights reserved.
Presented by:
Copyright © 2017. All rights reserved. Matillion, trademarks, registered trademarks or
service marks are property of their respective owners. 10/10/2017
Dive into Data Lakes
James Johnson and Paul Johnson
Wednesday, April 25th, 11am EDT
Who we are
• AWS Advanced Technology Partner and Big Data Competency Holder
• Google Cloud Platform Partner
• Matillion Products
- Matillion ETL for Amazon Redshift
- Matillion ETL for Snowflake
- Matillion ETL for BigQuery
• Over 60 5* reviews on the AWS Marketplace
Agenda
• What is a Data Lake?
• Data Lakes vs. Data Warehouses
• Data Lake Best Practice & Reference Architecture
• Matillion Demo
- Amazon Redshift Spectrum (ELT using S3)
- BigQuery (ELT using Google Cloud Storage)
- Snowflake (ELT with JSON files)
• How to Engage
What is a Data Lake?
[Figure: Data Mart and Data Lake]
Characteristics of a Data Lake
• Data is stored in native format
• Stored forever
• Flexible access
• Schema-on-Read
• A Data Lake is not an S3/Cloud Storage bucket
• It’s not a dumping zone
• It’s not free-for-all access
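To make "Schema-on-Read" concrete, here is a minimal sketch in Python. It is not taken from the presentation: the sample events, field names, and the `read_with_schema` helper are hypothetical. The raw records stay in their native JSON format, and each consumer applies its own schema only when reading.

```python
import json

# Raw events kept in their native format (JSON lines), exactly as they arrived.
# The records and field names are made up for illustration.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2018-04-25T11:00:00"}',
    '{"device": "sensor-2", "temp_c": "19.0"}',
    '{"device": "sensor-1", "temp_c": "22.1", "firmware": "1.2"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields and cast them.

    Fields missing from a record come back as None; the stored data
    is never modified.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two consumers read the same raw data with different schemas.
temperature_schema = {"device": str, "temp_c": float}
firmware_schema = {"device": str, "firmware": str}

temperatures = list(read_with_schema(RAW_EVENTS, temperature_schema))
firmware = list(read_with_schema(RAW_EVENTS, firmware_schema))
```

With Schema-on-Write, by contrast, the casts and field selection would happen once, before the data is stored, and every consumer would see only that one shape.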
Data Lake vs. Data Warehouse
Complementary or Contradictory?
Data Lake
• Data is stored in native format
• Store forever
• Flexible access to raw data
• Schema-on-Read
• Separate storage & compute
Traditional On-Premise Data Warehouse
• Data requires transformation
• Expensive to store large volumes
• Transformed before loading (ETL)
• Schema-on-Write
• Tightly coupled storage & compute
ETL Pipeline
Data Lake
• Data is stored in native format
• Store forever
• Flexible access to raw data
• Schema-on-Read
• Separate storage & compute
Modern Cloud Data Warehouse
• Support for semi-structured data
• Store forever
• Transformed after loading (ELT)
• Schema-on-Write & Schema-on-Read
• Separate storage & compute
ELT Pipeline
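As a rough illustration of the ELT pattern (load first, transform afterwards inside the warehouse), the sketch below uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse. The table names, columns, and sample rows are assumptions made for the example; they do not come from the presentation or from any Matillion product.

```python
import sqlite3

# Raw rows as they might arrive from a source system: every value is text.
# The data is invented for illustration.
RAW_ORDERS = [
    ("1001", "2018-04-24", "19.99"),
    ("1002", "2018-04-24", "5.00"),
    ("1003", "2018-04-25", "12.50"),
]

# In-memory SQLite database standing in for the warehouse.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the data untransformed in a staging table.
conn.execute(
    "CREATE TABLE stg_orders (order_id TEXT, order_date TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", RAW_ORDERS)

# Transform: performed after loading, in SQL, using the warehouse's own compute.
conn.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT order_date,
           COUNT(*) AS orders,
           ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
    FROM stg_orders
    GROUP BY order_date
    """
)

daily_revenue = conn.execute(
    "SELECT order_date, orders, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
```

In an ETL pipeline the casting and aggregation would instead run in a separate engine before anything reaches the warehouse; here the staging table keeps the raw rows available for later, different transformations.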
[Diagram: data lake reference architecture. Labels recovered from the slide:]
• Data Sources: ERP, CRM, Logs, Devices
• Ingest Layer: Files, Streams, Records
• Data Lake (Process Automation): Landing Zone; Trusted Zone; Consumption Zone (refined for specific use: aggregated, denormalized, etc.); Sandbox (exploratory analytics)
• Governance Tools: Metadata & Lineage, Data Quality, Data Catalog, Security
• Presentation Layer: Data Scientist, Data Analysts, Business Users, Applications
• Other labels: Transactions, Master Data, Reference Data, Immutable log
Data Lake Architecture Best Practices
Decouple storage & compute
• Data -> Store -> Process -> Store -> Analyse -> Answers
• Allows huge volumes of data to be stored without paying for compute
• Allows replacement of processing technology (e.g. Spark)
Use the right tool for the job
• Data structure, latency, throughput, access patterns
Log-centric design patterns
• Immutable logs, materialized views
• Delete nothing.
• Rebuild your data at a point in time
Leverage serverless or managed services
• Scalable/elastic, available, reliable, secure, no/low admin
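The log-centric pattern above (immutable logs, materialized views, delete nothing, rebuild at a point in time) can be sketched in a few lines of Python. This is an illustrative toy, not the presenters' implementation: the `Event` and `ImmutableLog` names, the integer timestamps, and the sample customer data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    ts: int                # logical timestamp (an assumption of this sketch)
    key: str
    value: Optional[str]   # None marks a logical delete; the event itself stays


class ImmutableLog:
    """Append-only log: events are added, never updated or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def materialize(self, as_of: Optional[int] = None) -> Dict[str, str]:
        """Rebuild the current view, or the view as of a given timestamp."""
        view: Dict[str, str] = {}
        for event in sorted(self._events, key=lambda e: e.ts):
            if as_of is not None and event.ts > as_of:
                continue
            if event.value is None:
                view.pop(event.key, None)   # delete in the view, not in the log
            else:
                view[event.key] = event.value
        return view


log = ImmutableLog()
log.append(Event(1, "cust-1", "bronze"))
log.append(Event(2, "cust-2", "silver"))
log.append(Event(3, "cust-1", "gold"))
log.append(Event(4, "cust-2", None))

current_view = log.materialize()
view_at_2 = log.materialize(as_of=2)
```

Because the log is never rewritten, the view is disposable: it can be dropped and rebuilt by a different processing technology, which is the same decoupling of storage and compute the first bullet group describes.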
Demonstration
How to Engage
• Available only on AWS Marketplace/Cloud Launcher
• 14-day free trial and test drives
• SA team to support PoCs, demos and training
• Customer reference videos on YouTube
Presented by:
Copyright © 2017. All rights reserved. Matillion, trademarks, registered trademarks
or service marks are property of their respective owners. 7/23/2018
Thank You
James Johnson and Paul Johnson

More Related Content

What's hot

PSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL ServerPSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL Server
Mark Kromer
 
DataStax: Datastax Enterprise - The Multi-Model Platform
DataStax: Datastax Enterprise - The Multi-Model PlatformDataStax: Datastax Enterprise - The Multi-Model Platform
DataStax: Datastax Enterprise - The Multi-Model Platform
DataStax Academy
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
Altis Webinar: Use Cases For The Modern Data Platform
Altis Webinar: Use Cases For The Modern Data PlatformAltis Webinar: Use Cases For The Modern Data Platform
Altis Webinar: Use Cases For The Modern Data Platform
Altis Consulting
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Michael Rainey
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
Angel Abundez
 
SLC Snowflake User Group - Mar 12, 2020
SLC Snowflake User Group - Mar 12, 2020SLC Snowflake User Group - Mar 12, 2020
SLC Snowflake User Group - Mar 12, 2020
Nathan Skousen
 
Data Sharing with Snowflake
Data Sharing with SnowflakeData Sharing with Snowflake
Data Sharing with Snowflake
Snowflake Computing
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
Snowflake Computing
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
Snowflake Computing
 
Unleashing the Power of your Data
Unleashing the Power of your DataUnleashing the Power of your Data
Unleashing the Power of your Data
Itai Yaffe
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020
Harald Erb
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
Michael Rainey
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summits
 
Optimize the performance, cost, and value of databases.pptx
Optimize the performance, cost, and value of databases.pptxOptimize the performance, cost, and value of databases.pptx
Optimize the performance, cost, and value of databases.pptx
IDERA Software
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Databricks
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake Practice
SamanthaSwain7
 

What's hot (19)

PSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL ServerPSSUG Nov 2012: Big Data with SQL Server
PSSUG Nov 2012: Big Data with SQL Server
 
DataStax: Datastax Enterprise - The Multi-Model Platform
DataStax: Datastax Enterprise - The Multi-Model PlatformDataStax: Datastax Enterprise - The Multi-Model Platform
DataStax: Datastax Enterprise - The Multi-Model Platform
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Altis Webinar: Use Cases For The Modern Data Platform
Altis Webinar: Use Cases For The Modern Data PlatformAltis Webinar: Use Cases For The Modern Data Platform
Altis Webinar: Use Cases For The Modern Data Platform
 
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGateContinuous Data Replication into Cloud Storage with Oracle GoldenGate
Continuous Data Replication into Cloud Storage with Oracle GoldenGate
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
 
SLC Snowflake User Group - Mar 12, 2020
SLC Snowflake User Group - Mar 12, 2020SLC Snowflake User Group - Mar 12, 2020
SLC Snowflake User Group - Mar 12, 2020
 
Data Sharing with Snowflake
Data Sharing with SnowflakeData Sharing with Snowflake
Data Sharing with Snowflake
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Unleashing the Power of your Data
Unleashing the Power of your DataUnleashing the Power of your Data
Unleashing the Power of your Data
 
Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020Dataiku & Snowflake Meetup Berlin 2020
Dataiku & Snowflake Meetup Berlin 2020
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
 
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No LimitsAWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
AWS Summit Singapore 2019 | Snowflake: Your Data. No Limits
 
Optimize the performance, cost, and value of databases.pptx
Optimize the performance, cost, and value of databases.pptxOptimize the performance, cost, and value of databases.pptx
Optimize the performance, cost, and value of databases.pptx
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake Practice
 

Similar to Dive Into Data Lakes

Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
Amazon Web Services
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
Amazon Web Services
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
Jeffrey T. Pollock
 
Lets Talk Google BigQuery
Lets Talk Google BigQueryLets Talk Google BigQuery
Lets Talk Google BigQuery
Matillion
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
Unlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLUnlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQL
Ricky Setyawan
 
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS SummitApplying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Amazon Web Services
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
Amazon Web Services
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
StampedeCon
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Amazon Web Services
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Amazon Web Services
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
Jeffrey T. Pollock
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
javier ramirez
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
Amazon Web Services
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your Applications
Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
Amazon Web Services
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
Adaryl "Bob" Wakefield, MBA
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 

Similar to Dive Into Data Lakes (20)

Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 
AWS Storage State of the Union
AWS Storage State of the UnionAWS Storage State of the Union
AWS Storage State of the Union
 
2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration2017 OpenWorld Keynote for Data Integration
2017 OpenWorld Keynote for Data Integration
 
Lets Talk Google BigQuery
Lets Talk Google BigQueryLets Talk Google BigQuery
Lets Talk Google BigQuery
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS
 
Unlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLUnlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQL
 
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS SummitApplying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
Applying AWS Purpose-Built Database Strategy - SRV307 - Toronto AWS Summit
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdfBuilding-a-Modern-Data-Platform-in-the-Cloud.pdf
Building-a-Modern-Data-Platform-in-the-Cloud.pdf
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
Building a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay NordicsBuilding a modern data platform in the cloud. AWS DevDay Nordics
Building a modern data platform in the cloud. AWS DevDay Nordics
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Using AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your ApplicationsUsing AWS Purpose-Built Databases to Modernize your Applications
Using AWS Purpose-Built Databases to Modernize your Applications
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 

More from Matillion

ELT is Better. Here's Why.
ELT is Better. Here's Why. ELT is Better. Here's Why.
ELT is Better. Here's Why.
Matillion
 
Pick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data WarehousePick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data Warehouse
Matillion
 
Using ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 MinutesUsing ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 Minutes
Matillion
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
Matillion
 
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Matillion
 
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift SpectrumWebinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Matillion
 
Webinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift SpectrumWebinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift Spectrum
Matillion
 
Getting Started With Amazon Redshift
Getting Started With Amazon Redshift Getting Started With Amazon Redshift
Getting Started With Amazon Redshift
Matillion
 

More from Matillion (8)

ELT is Better. Here's Why.
ELT is Better. Here's Why. ELT is Better. Here's Why.
ELT is Better. Here's Why.
 
Pick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data WarehousePick a Winner: How to Choose a Data Warehouse
Pick a Winner: How to Choose a Data Warehouse
 
Using ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 MinutesUsing ELT to load 1 Billion Rows of Data in 15 Minutes
Using ELT to load 1 Billion Rows of Data in 15 Minutes
 
How to Choose a Data Warehouse
How to Choose a Data WarehouseHow to Choose a Data Warehouse
How to Choose a Data Warehouse
 
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
Using Google Cloud for Marketing Analytics: How the7stars, the UK’s largest i...
 
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift SpectrumWebinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
Webinar | Accessing Your Data Lake Assets from Amazon Redshift Spectrum
 
Webinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift SpectrumWebinar | Getting Started With Amazon Redshift Spectrum
Webinar | Getting Started With Amazon Redshift Spectrum
 
Getting Started With Amazon Redshift
Getting Started With Amazon Redshift Getting Started With Amazon Redshift
Getting Started With Amazon Redshift
 

Recently uploaded

原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 

Recently uploaded (20)

原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
Oracle 23c New Features For DBAs and Developers.pptx

Dive Into Data Lakes

  • 1. Dive into Data Lakes. Presented by: James Johnson and Paul Johnson. Wednesday, April 25th, 11am EDT. www.matillion.com © 2017 Matillion. All rights reserved. Matillion, trademarks, registered trademarks or service marks are property of their respective owners. 10/10/2017
  • 2. Who we are • AWS Advanced Technology Partner and Big Data Competency Holder • Google Cloud Platform Partner • Matillion Products - Matillion ETL for Amazon Redshift - Matillion ETL for Snowflake - Matillion ETL for BigQuery • Over 60 5* reviews on the AWS Marketplace
  • 3. Agenda • What is a Data Lake? • Data Lakes vs. Data Warehouses • Data Lake Best Practice & Reference Architecture • Matillion Demo - Amazon Redshift Spectrum (ELT using S3) - BigQuery (ELT using Google Cloud Storage) - Snowflake (ELT with JSON files) • How to Engage
  • 4. What is a Data Lake? Data Mart / Data Lake
  • 5. Characteristics of a Data Lake • Data is stored in native format • Stored forever • Flexible access • Schema-on-Read • Data Lake is not an S3/Cloud Storage bucket • It’s not a dumping zone • It’s not free-for-all access
  • 6. Data Lake vs. Data Warehouse: Complementary or Contradictory?
  • 7. Data Lake • Data is stored in native format • Store forever • Flexible access to raw data • Schema-on-Read • Separate storage & compute. Traditional On-Premise Data Warehouse • Data requires transformation • Expensive to store large volumes • Transformed before loading (ETL) • Schema-on-Write • Tightly coupled storage & compute
  • 8. ETL Pipeline
  • 9. Data Lake • Data is stored in native format • Store forever • Flexible access to raw data • Schema-on-Read • Separate storage & compute. Modern Cloud Data Warehouse • Support for semi-structured data • Store forever • Transformed after loading (ELT) • Schema-on-Write & Schema-on-Read • Separate storage & compute
  • 10. ELT Pipeline
  • 11. (Diagram labels) Data Sources: ERP, CRM, Logs, Devices | Ingest: Files, Streams, Records | Presentation Layer: Data Scientist, Data Analysts, Business Users | Applications, Transactions | Data Lake: Process Automation, Landing Zone, Trusted Zone, Consumption Zone (refined for specific use: aggregated, denormalized, etc.), Sandbox (exploratory analytics) | Governance Tools | Master Data, Reference Data, Immutable log | Metadata & Lineage, Data Quality, Data Catalog, Security
  • 12. Data Lake Architecture Best Practices. Decouple storage & compute: • Data -> Store -> Process -> Store -> Analyse -> Answers • Allows huge volumes of data to be stored without paying for compute • Allows replacement of processing technology (e.g. Spark). Use the right tool for the job: • Data structure, latency, throughput, access patterns. Log-centric design patterns: • Immutable logs, materialized views • Delete nothing • Rebuild your data at a point in time. Leverage serverless or managed services: • Scalable/elastic, available, reliable, secure, no/low admin
  • 13. Demonstration
  • 14. How to Engage • Available only on AWS Marketplace/Cloud Launcher • 14-day free trial and test drives • SA team to support PoCs, demos and training • Customer reference videos on YouTube
  • 16. Thank You. Presented by: James Johnson and Paul Johnson. 7/23/2018
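
The log-centric pattern from slide 12 (immutable logs, delete nothing, rebuild your data at a point in time) combined with the schema-on-read idea from slide 5 can be sketched roughly as follows. This is only an illustration and is not taken from the presentation or from any Matillion product: the RAW_LOG records, the field names and the helper functions read_events and plans_as_of are all hypothetical, and a real data lake would keep the raw records in object storage such as S3 or Google Cloud Storage rather than in a Python list.

```python
import json
from datetime import datetime, timezone

# Hypothetical append-only event log. Each entry is kept as raw JSON text
# (its "native format") and is never updated or deleted.
RAW_LOG = [
    '{"ts": "2018-04-01T09:00:00+00:00", "customer": "c1", "plan": "trial"}',
    '{"ts": "2018-04-10T12:30:00+00:00", "customer": "c2", "plan": "trial"}',
    '{"ts": "2018-04-15T08:00:00+00:00", "customer": "c1", "plan": "paid"}',
]


def read_events(raw_lines):
    """Schema-on-read: parse and type the raw records only when queried."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            "ts": datetime.fromisoformat(record["ts"]),
            "customer": str(record["customer"]),
            "plan": str(record["plan"]),
        }


def plans_as_of(raw_lines, as_of):
    """Rebuild a view of each customer's plan as it stood at a point in time."""
    view = {}
    # Replay events in time order; later events overwrite earlier ones.
    for event in sorted(read_events(raw_lines), key=lambda e: e["ts"]):
        if event["ts"] <= as_of:
            view[event["customer"]] = event["plan"]
    return view


if __name__ == "__main__":
    april_12 = datetime(2018, 4, 12, tzinfo=timezone.utc)
    print(plans_as_of(RAW_LOG, april_12))
```

Because the log itself is never modified, calling plans_as_of with a different as_of value rebuilds the view for that moment without any change to the stored data, which is the property the "delete nothing" bullet relies on.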