Cloud Data Warehousing
What - Why - How & Compare
Rogier Werschkull, RogerData
@rwerschkull
nl.linkedin.com/in/rogierwerschkull
The story begins: before…
▪ The system was reaching its storage limit
▪ And extending it was very expensive
▪ All kinds of interesting performance challenges started appearing…
  ▪ Loading windows, projections…
▪ New use cases started to appear:
  ▪ “We need near real-time for LiveOps”
  ▪ Move to mobile gaming: potentially unplannable growth
▪ The system became a bit unstable…
  ▪ Poorly distributed data
  ▪ Out of memory / failing nodes
Until…
Not a Vertica problem!…
“How to fix this and prepare for the future?”
Choices…
▪ Upgrade Vertica
  ▪ Huge initial investment
  • But what to choose when your data growth is unpredictable?
  • And what about the changing use cases?
▪ Store less data
  ▪ Try selling that to the business…
  ▪ And what about the data growth being unpredictable?
▪ Switch technology...
You guessed it!
Architecture diagram: Cloud Pub/Sub (streaming ingest / message log), Cloud Dataflow (streaming pipeline), Cloud Storage (batch ingest, file storage), BigQuery (data warehousing) and Analytics.
The takeaway: move your DWH to the cloud…
1. When you want a system that is simpler to set up, configure, maintain and operate
  ▪ Where a lot of the DBA / maintenance work you are still doing now ‘comes for free’
2. When your current DWH is maxed out and difficult or expensive to expand
  ▪ When you need potentially endless storage and compute scaling (more real-time being one possibility)
  ▪ When you require better workload separation
3. When you want your costs to scale linearly with usage
The takeaway: when maybe not?
1. When none of the points on the previous slide apply!
2. When you have a lot of data (10+ terabytes) that is not in the cloud already (ingress)
3. When your data volumes are small (<1 TB) or in the multi-petabyte range (effect on costs)
4. When you have ultra-sensitive data and don’t trust the measures cloud providers take to prevent things ‘from going wrong’
So which of these 14 to pick from then?
(Forrester Wave for Cloud Data Warehouses, Q4 2018)
The takeaway: start with either…
OR
The takeaway: consider:
Azure SQL Data Warehouse
But WHY data warehousing?
But we’ll just do this by building a big data lake, right?
Photo credit: Lake Public Domain, http://www.writeups.org/star-trek-brent-spiner-data/
Becoming truly data-driven is HARD...
Failure…
Photo credit: https://highfiveexports.wordpress.com/2010/06/25/3000-pieces-lego-mix-specialty-pieces-rare-pieces-bricks-blocks-parts-more-ultimate-lot-of-lego-parts-pieces-lego-for-sale-lego-batman-lego-starwars-lego-technic-lego-minifigur/
80-90%
“Every single company I've worked at and talked to has the
same problem without a single exception so far:
poor data quality...
Either there's incomplete data, missing data,
duplicative data.”
Ruslan Belkin, former VP of Engineering @ Twitter and Salesforce
“If you can’t build a data warehouse, you shouldn’t do AI”
I don’t believe in cloud data warehousing!
(as the answer to all your data warehousing woes)
“The primary purpose of a data warehouse is to transform data from an application state into an integrated corporate state”
Bill Inmon, the father of data warehousing
This is what we still SHOULD want to build:
a Subject-Oriented, Integrated, Time-Variant, Non-Volatile DWH
Build a Data warehouse!
But different...
Photo credit: Public Domain
By splitting the work...
From x(ETL) to EL+x(Tl)
So…
Subject-Oriented, Integrated, Time-Variant, Non-Volatile DWH
Modern DWH:
Time-Variant & Non-Volatile
Subject-Oriented & Integrated
EL x(Tl): ‘Virtual by design’?
• Focus on the
transformation logic
• Not on storing / updating /
deleting data structures
• Simplifies backfilling /
changing the DWH
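A minimal sketch of the ‘virtual by design’ idea, using Python’s built-in sqlite3 as a stand-in for a cloud DWH: the raw data is loaded once (EL) and the transformation lives only in a view (x(Tl)), so changing a business rule means redefining the view rather than updating stored tables. The table, view and helper names (raw_orders, orders_clean, define_transform) are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a cloud DWH here; all table/view names are hypothetical.
conn = sqlite3.connect(":memory:")

# EL: land the source data as-is, without transforming it.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 990, "cancelled"), (3, 500, "PAID")],
)


def define_transform(conn, select_sql):
    """(Re)define the virtual transformation layer; no stored data is updated."""
    conn.execute("DROP VIEW IF EXISTS orders_clean")
    conn.execute("CREATE VIEW orders_clean AS " + select_sql)


# Version 1 of the business rule: only status = 'paid' counts as revenue.
define_transform(
    conn,
    "SELECT order_id, amount_cents / 100.0 AS amount FROM raw_orders WHERE status = 'paid'",
)
revenue_v1 = conn.execute("SELECT SUM(amount) FROM orders_clean").fetchone()[0]

# Version 2 fixes a data quality issue (inconsistent casing of the status).
# "Backfilling" is just a new view definition over the same raw data.
define_transform(
    conn,
    "SELECT order_id, amount_cents / 100.0 AS amount FROM raw_orders WHERE LOWER(status) = 'paid'",
)
revenue_v2 = conn.execute("SELECT SUM(amount) FROM orders_clean").fetchone()[0]
```

In BigQuery or Snowflake the same pattern would use CREATE OR REPLACE VIEW; whether a purely virtual layer performs well enough depends on data volumes and query patterns.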
Photo credit: Public Domain
Modern DWH:
Data & History
Tagging & Search
Integrate data into meaningful and useful stuff
This is what we did in BigQuery!
Address basic DQ issues
Address complex DQ issues
I do believe in cloud data warehouse technology!
I do believe in cloud-based analytical databases
What tool do we choose?
Photo credit: Public Domain
Forrester Wave for Cloud Data Warehouses, Q4 2018
BigQuery:
▪ Built on Google’s Dremel execution engine (which inspired the open-source Apache Drill)
▪ Available since October 2011
▪ Cloud native: born in the cloud
▪ Key unique feature:
  ▪ The only full-on DWaaS: no nodes, no CPU, no RAM, nothing to configure
Redshift:
▪ Based on PostgreSQL 8.0.2, rebuilt as a cloud-based MPP database
▪ Available since October 2012
▪ Based on legacy (not cloud-native) DWH technology
▪ Key unique features:
  ▪ Most implementations
  ▪ Best SQL support
Snowflake:
▪ New kid on the block, started by ex-Oracle employees
▪ Cloud native: born in the cloud
▪ Available since October 2014
▪ Key unique features:
  ▪ The only cloud-agnostic DWH: AWS, Azure and Google (early 2020)
  ▪ No-downtime auto-scaling
  ▪ Metadata-based data cloning (clone your production to DTA with no extra storage!)
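To illustrate why metadata-based cloning needs no extra storage, here is a simplified model, an assumption for illustration rather than Snowflake’s actual implementation: table data lives in immutable partitions, a table is only a list of references to them, a clone copies just that list, and new writes go to new partitions. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """An immutable block of rows; never modified after it is written."""
    rows: tuple


@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def clone(self):
        # Zero-copy clone: copy the list of references, not the partitions.
        return Table(list(self.partitions))

    def insert(self, rows):
        # Writes never touch shared partitions; they add a new one.
        self.partitions.append(Partition(tuple(rows)))

    def row_count(self):
        return sum(len(p.rows) for p in self.partitions)


def storage_used(*tables):
    """Count each physical partition once, however many tables reference it."""
    unique = {id(p): p for t in tables for p in t.partitions}
    return sum(len(p.rows) for p in unique.values())


prod = Table()
prod.insert([("a", 1), ("b", 2)])
prod.insert([("c", 3)])

dev = prod.clone()                  # clone production to DTA
before = storage_used(prod, dev)    # nothing has been duplicated yet
dev.insert([("test", 99)])          # only the new partition costs extra storage
after = storage_used(prod, dev)
```

The clone starts out costing nothing; storage grows only with the changes made in the clone.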
Azure SQL Data Warehouse:
▪ Based on SQL Server Parallel Data Warehouse (PDW)
▪ Based on legacy (not cloud-native) DWH technology
▪ Available since 2015 (Gen1) and May 2018 (Gen2)
▪ Key unique features:
  ▪ Getting stronger quickly since the Gen2 release
  ▪ Vast supporting ecosystem of GUIs and ETL tools
So why then, and what is different?
Comparing the top 3 benefits…
Photo credit: Public Domain
3- Costs: low entry point / pay-for-use
Photo by Joel Filipe on Unsplash
Comparing cost features
Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
Fixed start / licence costs? | NO | NO | NO | NO
Separation of storage and compute | NO | YES | YES | YES
Costs easy to calculate? | Bit of work | Bit of work | YES | Bit of work
Good predictability? | YES | YES | NO (on demand), YES (flat rate) | Depends on auto-scaling
Storage costs per TB per month (USD) | 374* | 149 | 20 active / 10 long-term (uncompressed) | 23
CPU / usage costs | Depends | Depends | 5 USD per TB read (uncompressed) | Depends
* For a current-generation dense storage ds2.8xlarge cluster running all year
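The BigQuery column is the easiest to turn into arithmetic. A rough sketch, assuming the late-2018 list prices from the table (20 USD per TB per month for active storage, 10 for long-term storage, 5 USD per TB read on demand) and ignoring free tiers, per-query minimums and flat-rate pricing; the function name is hypothetical.

```python
# Late-2018 BigQuery list prices from the comparison table (USD); check current pricing.
ACTIVE_STORAGE_PER_TB = 20.0      # per TB per month
LONG_TERM_STORAGE_PER_TB = 10.0   # per TB per month
ON_DEMAND_PER_TB_READ = 5.0       # per TB read (uncompressed)


def bigquery_on_demand_monthly_cost(active_tb, long_term_tb, tb_read_per_month):
    """Rough monthly estimate: storage plus on-demand query charges."""
    storage = active_tb * ACTIVE_STORAGE_PER_TB + long_term_tb * LONG_TERM_STORAGE_PER_TB
    queries = tb_read_per_month * ON_DEMAND_PER_TB_READ
    return storage + queries


# Example: 2 TB active, 8 TB long-term, queries reading 30 TB per month.
estimate = bigquery_on_demand_monthly_cost(2, 8, 30)
print(estimate)
```

Because both terms are linear, doubling the data read doubles the query part of the bill: this is the ‘costs scale linearly with usage’ point, and also why unbounded ad-hoc querying can get expensive.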
Costs: strong / weak points
▪ Strong:
  ▪ All start at zero cost: no fixed licence fee
  ▪ All employ a pay-for-use model
  ▪ Snowflake has the cheapest storage (its price applies to compressed data, and metadata-based cloning saves even more)
▪ Weak:
  ▪ Redshift doesn’t separate storage and compute
  ▪ BigQuery’s DWaaS model charges compute costs based on the amount of data queried: this can get out of control when you don’t set limits! (Applies to on-demand pricing only.)
  ▪ BigQuery’s limited end-user data caching can lead to higher costs, depending on the usage pattern (a solution is in development). (Applies to on-demand pricing only.)
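One way to ‘set limits’ is to check a query’s estimated scan size against a budget before running it. A minimal sketch: the estimate is simply passed in as a number (in practice it could come from a BigQuery dry run; the google-cloud-bigquery client’s QueryJobConfig supports dry_run and maximum_bytes_billed). The 5 USD per TB rate is the on-demand price from the cost table, decimal terabytes are a simplifying assumption, and the function and exception names are hypothetical.

```python
BYTES_PER_TB = 10**12      # decimal terabyte: a simplifying assumption
PRICE_PER_TB_READ = 5.0    # on-demand USD per TB read, late-2018 list price


class QueryBudgetExceeded(Exception):
    """Raised when a query's estimated cost is above the allowed budget."""


def check_query_budget(estimated_bytes, max_usd):
    """Return the estimated on-demand cost in USD, or raise if it exceeds max_usd."""
    cost = estimated_bytes / BYTES_PER_TB * PRICE_PER_TB_READ
    if cost > max_usd:
        raise QueryBudgetExceeded(
            f"estimated {cost:.2f} USD exceeds the limit of {max_usd:.2f} USD"
        )
    return cost


# A 200 GB scan costs about 1 USD and passes a 5 USD budget...
print(check_query_budget(200 * 10**9, max_usd=5.0))
# ...while a 3 TB scan (about 15 USD) is rejected.
try:
    check_query_budget(3 * 10**12, max_usd=5.0)
except QueryBudgetExceeded as err:
    print("blocked:", err)
```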
Costs: don’t forget…
• A cloud DWH is not always cheaper than on-prem!
• Costs change from CAPEX to OPEX
• This requires a different operating model
• Costs can be unpredictable, which can be seen as a problem
• And remember the TCO!
2- (Almost) Infinite scaling
Photo by Joel Filipe on Unsplash
Comparing scale and speed features
Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
Type | Based on legacy | Based on legacy | Cloud native | Cloud native
Max storage size | 2 PB | Gen1: 240 TB; Gen2: 240 TB rowstore, unlimited columnstore | Unlimited | Unlimited
Storage resizing | COMPLEX | Doable | N.A. | N.A.
Dynamic node resizing | Doable | Doable | N.A. | EASY
Concurrency resizing | COMPLEX | Doable | Default 50, then contact Google | EASY
No-downtime auto-scaling | NO | NO | N.A. | YES
Hibernate compute | NO | YES | N.A. | YES
Data caching | Hot-data SSD cache + exact-query result cache | Hot-data SSD cache + exact-query result cache | Only exact-query result cache (more in development) | Hot-data SSD cache + exact-query result cache
Scaling: strong / weak points
▪ Strong:
  ▪ BigQuery and Snowflake have unlimited storage
  ▪ BigQuery on-demand is always very powerful, with up to 2,000 slots: scaling is not relevant here!
  ▪ Snowflake has the best cluster and concurrency scaling options
▪ Weak:
  ▪ Redshift is complex to resize and scale, and cannot hibernate
  ▪ BigQuery’s DWaaS nature and limited caching options almost always incur 2-3 seconds of query startup time
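The idea behind concurrency scaling can be sketched as a simple rule: add compute clusters as the number of concurrent queries grows, within a configured minimum and maximum (Snowflake’s multi-cluster warehouses are configured with such minimum and maximum cluster counts). The per-cluster concurrency of 8 and the rule itself are assumptions for illustration, not Snowflake’s actual scaling algorithm; the function names are hypothetical.

```python
import math


def clusters_needed(concurrent_queries, queries_per_cluster, min_clusters, max_clusters):
    """Enough clusters to run every query without queueing, clamped to [min, max]."""
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(max_clusters, wanted))


def queued_queries(concurrent_queries, queries_per_cluster, clusters):
    """Queries that still have to wait once all clusters are busy."""
    return max(0, concurrent_queries - clusters * queries_per_cluster)


# Hypothetical setup: at most 8 concurrent queries per cluster, 1 to 4 clusters.
for load in (3, 20, 100):
    n = clusters_needed(load, queries_per_cluster=8, min_clusters=1, max_clusters=4)
    print(load, "queries ->", n, "clusters,", queued_queries(load, 8, n), "queued")
```

The max_clusters cap is where ‘infinite scale’ meets ‘infinite costs’: it bounds spend at the price of queueing.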
Scaling: don’t forget…
• This could be ‘your last DWH migration’: choose wisely!
• The power of this technology is an enabler for the modern data warehousing methodology: virtualize!
• Infinite scale also increases the risk of:
  • Infinite costs
  • An infinite data mess
So take your data management (even more!) seriously!
Fivetran benchmark, 10 Sept 2018
• 99 TPC-DS queries
• Each run only once
• Calculated with the system being idle 82% of the time
• A factor 10 difference in cluster size and dataset size
• No use of:
  • Partitioning
  • Sort keys
  • Clustering
SOURCE: https://fivetran.com/blog/warehouse-benchmark
Figure: Histogram of costs for 99 TPC-DS queries, with geometric mean (source: https://fivetran.com/blog/warehouse-benchmark)
Figure: Histogram of performance for 99 TPC-DS queries, with geometric mean (source: https://fivetran.com/blog/warehouse-benchmark)
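Both histograms summarize the 99 queries with a geometric mean rather than an arithmetic mean. A small Python sketch with made-up runtimes shows why: one slow outlier dominates the arithmetic mean, while the geometric mean (the n-th root of the product) is far less sensitive to it, and a system that is twice as slow on every query gets twice the geometric mean. The runtimes are hypothetical, not Fivetran’s data.

```python
import statistics

# Hypothetical per-query runtimes in seconds: four fast queries, one slow outlier.
runtimes = [1.0, 1.0, 1.0, 1.0, 96.0]

arithmetic = statistics.fmean(runtimes)          # pulled up by the outlier
geometric = statistics.geometric_mean(runtimes)  # 5th root of the product

# If system B is twice as slow on every query, its geometric mean doubles.
slower = [2 * t for t in runtimes]
ratio = statistics.geometric_mean(slower) / geometric

print(arithmetic, geometric, ratio)
```

statistics.fmean and statistics.geometric_mean require Python 3.8 or later.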
1- Ease of deployment, development and maintenance
Photo credit: Public Domain
Comparing deployment, development and maintenance
Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
Setup process | COMPLEX | AVERAGE | N.A. | EASY
Managing data on cluster | COMPLEX | AVERAGE | N.A. | EASY
Separation of storage and compute | NO | YES | YES | YES
Time travel (auto-backup) | 8 hours + configurable | 8 hours + user-defined restore points | YES, 7 days (2 after delete) | YES, 1 to 90 days + fail-safe
Metadata-only data cloning | NO | NO | NO | YES
SQL DDL support | EXCELLENT | GOOD | LIMITED | GOOD
SQL DML support | EXCELLENT | OK | GOOD | OK*
Stored procedure support | GOOD | GOOD | NO | GOOD
UDF support | GOOD | GOOD | AVERAGE: not centrally | GOOD
Materialized view support | NO | YES (preview) | NO | YES (limited)
PK/FK support | As metadata | NO | NO | As metadata
Quality GUI / SQL interface | GOOD | GOOD, but no web UI | OK, web | GOOD, web
JSON parsing capabilities | AVERAGE | In preview | OK | GOOD
ETL dev / scheduling | GOOD: AWS Glue, coding | GOOD: Data Factory, SSIS, coding | GOOD: Cloud Data Fusion, scheduled queries, Cloud Composer | OK: coding or external ETL tool in AWS / Azure
* Analytical functions are not fully mature yet: https://medium.com/@jthandy/how-compatible-are-redshift-and-snowflakes-sql-syntaxes-c2103a43ae84c
Using: strong points
▪ All have integrated replication and backup
▪ BigQuery has no config / maintenance work at all
▪ Snowflake has just enough simple configurability
▪ BigQuery and Snowflake support time travel
▪ Snowflake has metadata-based database cloning
▪ Redshift has the best SQL support
▪ SQL Data Warehouse has the best supporting ecosystem of GUIs and ETL tools
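Time travel lets you query a table as it was at an earlier point in time, within a retention window (BigQuery exposes this in SQL as FOR SYSTEM_TIME AS OF, Snowflake as AT(...)). A toy Python model of the idea, assuming every commit stores a full snapshot, which real engines avoid; the class name, the integer ‘day’ timestamps and the retention value are hypothetical.

```python
import bisect


class VersionedTable:
    """Toy time-travel table: one full snapshot per commit, queried by timestamp."""

    def __init__(self, retention_days):
        self.retention_days = retention_days
        self._commit_days = []   # commit timestamps, in increasing order
        self._snapshots = []     # table contents after each commit

    def commit(self, day, rows):
        if self._commit_days and day < self._commit_days[-1]:
            raise ValueError("commits must be in time order")
        self._commit_days.append(day)
        self._snapshots.append(tuple(rows))

    def as_of(self, day, today):
        """Return the table contents as they were at `day`."""
        if today - day > self.retention_days:
            raise ValueError("outside the time-travel retention window")
        i = bisect.bisect_right(self._commit_days, day) - 1
        return self._snapshots[i] if i >= 0 else ()


orders = VersionedTable(retention_days=7)
orders.commit(1, ["order-1"])
orders.commit(5, ["order-1", "order-2"])
orders.commit(9, [])                       # an accidental DELETE
print(orders.as_of(6, today=10))           # state before the delete
```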
Using: weak points
▪ Redshift and SQL Data Warehouse require you to choose distribution keys and create / update statistics
▪ Redshift requires DBA work to reclaim space when deleting data
▪ BigQuery’s SQL DDL support is limited
▪ BigQuery has no stored procedures or materialized views, and limited UDF support
▪ No one has proper primary or foreign key support!
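Choosing a distribution key matters because rows are spread over compute slices by hashing that key (Redshift’s KEY distribution style and Azure SQL Data Warehouse’s hash-distributed tables both work this way). A sketch that measures skew as the busiest slice’s row count divided by the average, assuming SHA-256 as the hash (not what either product uses internally) and hypothetical function names. A low-cardinality key piles rows onto one slice, which is the poorly distributed data problem mentioned at the start of this story.

```python
import hashlib
from collections import Counter


def slice_for(key, n_slices):
    """Map a distribution-key value to a slice with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slices


def skew(keys, n_slices):
    """Busiest slice's row count divided by the average (1.0 = perfectly even)."""
    keys = list(keys)
    counts = Counter(slice_for(k, n_slices) for k in keys)
    return max(counts.values()) / (len(keys) / n_slices)


# Hypothetical 8-slice cluster.
print(skew(range(10_000), 8))                 # unique customer ids: close to 1.0
print(skew(["NL"] * 900 + ["BE"] * 100, 8))   # country code: heavily skewed
```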
Using: don’t forget…
• The data-storage work that remains: thinking about your partitioning and clustering (data sorting) strategy
• You still need to use a good data warehousing methodology!
• The basic skills and competences needed don’t change!
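Why the partitioning strategy still matters, especially with BigQuery’s pay-per-data-read model: a query that filters on the partition column only has to read the matching partitions. A sketch with a hypothetical daily-partitioned table of 50 GB per day; the sizes and names are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily-partitioned table: partition date -> size in GB.
partitions = {date(2019, 1, 1) + timedelta(days=i): 50.0 for i in range(365)}


def gb_scanned(partitions, start, end, partitioned=True):
    """GB read by a query filtering on start <= day <= end.

    With partitioning only the matching partitions are read (pruning);
    without it, the whole table is scanned.
    """
    if not partitioned:
        return sum(partitions.values())
    return sum(size for day, size in partitions.items() if start <= day <= end)


week = (date(2019, 3, 1), date(2019, 3, 7))
full = gb_scanned(partitions, *week, partitioned=False)
pruned = gb_scanned(partitions, *week)
print(full, pruned)
```

The same one-week query reads roughly 50 times less data when the table is partitioned on the filter column, and on-demand pricing charges accordingly.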
So, what about... security?
Photo: my own
Comparing privacy / security / regulatory features
Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
Data in EU | YES | YES | YES | YES
Encryption at rest | Optional | Optional | YES, always | YES, always
Customer-managed key | YES | YES | YES | YES
MFA | YES | YES | YES | YES
Row-level security | NO | YES | YES, authorized views | YES, authorized views
ISO 27001 (Information Security Management) | YES | YES | YES | YES
ISO 27017 (Cloud Security) | YES | YES | YES | YES
ISO 27018 (Cloud Privacy) | YES | YES | YES | YES
SOC 1, 2 & 3 | YES | YES | YES | YES
EU Model Contract Clauses (Data Protection Directive) | YES | YES | YES | YES
Privacy Shield | YES | YES | YES | YES
The takeaway: start with either…
OR
The takeaway: consider:
Azure SQL Data Warehouse
Thank you!
About me...
▪ Rogier Werschkull
▪ Independent DWH/BI consultant since September 2018: RogerData
▪ Data architecture, data modeling, data engineering
▪ Blogger
▪ BI/DWH training
▪ Graph enthusiast
▪ Contact details:
  ▪ nl.linkedin.com/in/rogierwerschkull
  ▪ rogier@rogerdata.nl
  ▪ cloudanalyticsnow.nl
  ▪ @rwerschkull
Resources-1
▪ PDF: Sonra - A comparison of cloud data warehouse platforms
▪ PDF: Snowflake - The Forrester Wave - Cloud DWH Q4 2018
▪ PDF: GigaOm-sector-roadmap-cloud-analytic-databases-2017
▪ gigaom.com/report/data-warehouse-in-the-cloud-benchmark/
▪ tech.marksblogg.com/benchmarks.html
▪ dzone.com/articles/choosing-between-modern-data-warehouses
▪ www.periscopedata.com/blog/interactive-analytics-redshift-bigquery-snowflake
▪ fivetran.com/blog/warehouse-benchmark
▪ medium.com/@jthandy/how-compatible-are-redshift-and-snowflakes-sql-syntaxes-c2103a43ae84
Resources-2
▪ aws.amazon.com/redshift/faqs/
▪ azure.microsoft.com/en-us/blog/azure-sql-data-warehouse-releases-new-capabilities-for-performance-and-security/
▪ cloud.google.com/security/compliance/#/regions=Europe
▪ docs.microsoft.com/en-us/sql/tools/overview-sql-tools?view=sql-server-2017
Implementing error-proof, business-critical Machine Learning, presentation by...Implementing error-proof, business-critical Machine Learning, presentation by...
Implementing error-proof, business-critical Machine Learning, presentation by...
Patrick Van Renterghem
 
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
Patrick Van Renterghem
 
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
Patrick Van Renterghem
 
Responsible AI: An Example AI Development Process with Focus on Risks and Con...
Responsible AI: An Example AI Development Process with Focus on Risks and Con...Responsible AI: An Example AI Development Process with Focus on Risks and Con...
Responsible AI: An Example AI Development Process with Focus on Risks and Con...
Patrick Van Renterghem
 
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
Patrick Van Renterghem
 
How obedient digital twins and intelligent beings contribute to ethics and ex...
How obedient digital twins and intelligent beings contribute to ethics and ex...How obedient digital twins and intelligent beings contribute to ethics and ex...
How obedient digital twins and intelligent beings contribute to ethics and ex...
Patrick Van Renterghem
 
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
Patrick Van Renterghem
 
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
Patrick Van Renterghem
 
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
Patrick Van Renterghem
 
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
Patrick Van Renterghem
 
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
Patrick Van Renterghem
 
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
Patrick Van Renterghem
 
Engie's Digital Workplace and "Connecting the company" business case, present...
Engie's Digital Workplace and "Connecting the company" business case, present...Engie's Digital Workplace and "Connecting the company" business case, present...
Engie's Digital Workplace and "Connecting the company" business case, present...
Patrick Van Renterghem
 
Face your communication challenges when implementing a digital workplace, bas...
Face your communication challenges when implementing a digital workplace, bas...Face your communication challenges when implementing a digital workplace, bas...
Face your communication challenges when implementing a digital workplace, bas...
Patrick Van Renterghem
 
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
Patrick Van Renterghem
 
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
Patrick Van Renterghem
 
Tim scottkoenverheyenpresentation
Tim scottkoenverheyenpresentationTim scottkoenverheyenpresentation
Tim scottkoenverheyenpresentation
Patrick Van Renterghem
 
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
Patrick Van Renterghem
 
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
Patrick Van Renterghem
 

More from Patrick Van Renterghem (20)

Ethical AI at VDAB, presented by Vincent Buekenhout (Ethical AI Lead, VDAB) a...
Ethical AI at VDAB, presented by Vincent Buekenhout (Ethical AI Lead, VDAB) a...Ethical AI at VDAB, presented by Vincent Buekenhout (Ethical AI Lead, VDAB) a...
Ethical AI at VDAB, presented by Vincent Buekenhout (Ethical AI Lead, VDAB) a...
 
Implementing error-proof, business-critical Machine Learning, presentation by...
Implementing error-proof, business-critical Machine Learning, presentation by...Implementing error-proof, business-critical Machine Learning, presentation by...
Implementing error-proof, business-critical Machine Learning, presentation by...
 
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
Building Trust and Explainability into Chatbots: the Partena Ziekenfonds Busi...
 
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
AI & Ethics: The Belgian Industry Vision & Initiatives, presentation by Jelle...
 
Responsible AI: An Example AI Development Process with Focus on Risks and Con...
Responsible AI: An Example AI Development Process with Focus on Risks and Con...Responsible AI: An Example AI Development Process with Focus on Risks and Con...
Responsible AI: An Example AI Development Process with Focus on Risks and Con...
 
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
Fairness and Transparency: Algorithmic Explainability, some Legal and Ethical...
 
How obedient digital twins and intelligent beings contribute to ethics and ex...
How obedient digital twins and intelligent beings contribute to ethics and ex...How obedient digital twins and intelligent beings contribute to ethics and ex...
How obedient digital twins and intelligent beings contribute to ethics and ex...
 
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processin...
 
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
Introduction to Bias in Machine Learning, presented by Matthias Feys, CTO @ M...
 
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
Business Case: Ozitem Groupe, where 80% of the company is working remotely. R...
 
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
Digital Workplace Case Study: How the Municipality of Duffel successfully swi...
 
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
Unleashing the Full Potential of People, Teams and SOLVAY, presented by Bruce...
 
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
The Building Blocks of a Digital Workplace, presented by Sam Marshall at the ...
 
Engie's Digital Workplace and "Connecting the company" business case, present...
Engie's Digital Workplace and "Connecting the company" business case, present...Engie's Digital Workplace and "Connecting the company" business case, present...
Engie's Digital Workplace and "Connecting the company" business case, present...
 
Face your communication challenges when implementing a digital workplace, bas...
Face your communication challenges when implementing a digital workplace, bas...Face your communication challenges when implementing a digital workplace, bas...
Face your communication challenges when implementing a digital workplace, bas...
 
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
The first steps in Recticel's Digital Workplace program by Kenneth Meuleman (...
 
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
Presentation by Dave Geentjens at the "Successful Digital Workplace Adoption"...
 
Tim scottkoenverheyenpresentation
Tim scottkoenverheyenpresentationTim scottkoenverheyenpresentation
Tim scottkoenverheyenpresentation
 
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling an...
 
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
Presentation by Luc Delanglez (DataLumen) at the Data Vault Modelling and Dat...
 

Recently uploaded

Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Cloud Data Warehousing presentation by Rogier Werschkull, including tips, best practices and a BigQuery, Redshift, Snowflake and Azure SQL DWH comparison, delivered at #BIDASUMMIT

  • 1. Cloud Data Warehousing What - Why - How & Compare Rogier Werschkull, RogerData @rwerschkull nl.linkedin.com/in/rogierwerschkull
  • 2. The story begins: before…
  • 3. ▪ The system was reaching its storage limit ▪ And extending it was very expensive ▪ All kinds of interesting performance challenges started appearing… ▪ Loading windows, projections… ▪ New use cases started to appear: ▪ “We need near real-time for LiveOps” ▪ Move to mobile gaming: potentially unpredictable growth ▪ The system became a bit unstable… ▪ Poorly distributed data ▪ Out of memory / failing nodes Until…
  • 4. Not a Vertica problem!…
  • 5. “How to fix this and prepare for the future?”
  • 6. Choices… ▪ Upgrade Vertica ▪ Huge initial investment. • But what to choose when your data growth is unpredictable? • And what about the changing use cases? ▪ Store less data ▪ Try selling that to the business… ▪ And what about the data growth being unpredictable? ▪ Switch technology…
  • 7. You guessed it! Architecture diagram: streaming ingest / message log (Cloud Pub/Sub), streaming pipeline (Cloud Dataflow), data warehousing (BigQuery), analytics, and batch ingest / file storage (Cloud Storage).
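The following minimal Python sketch makes the diagram concrete by modelling the same flow with in-process stand-ins. A `queue.Queue` plays the Pub/Sub message log. A plain function plays the Dataflow streaming job. A dict of lists plays the BigQuery table. All names (`publish`, `run_pipeline`, `batch_load`, `warehouse`) are hypothetical, and none of this uses the actual Google Cloud client libraries.

```python
import json
import queue
from datetime import datetime, timezone

# Hypothetical in-process stand-ins, NOT Google Cloud APIs:
#   message_log  ~ a Cloud Pub/Sub topic
#   run_pipeline ~ a Cloud Dataflow streaming job
#   warehouse    ~ a BigQuery dataset (table name -> rows)
message_log: "queue.Queue[str]" = queue.Queue()
warehouse: dict = {"events": []}


def publish(event: dict) -> None:
    """Streaming ingest: producers push JSON messages onto the log."""
    message_log.put(json.dumps(event))


def run_pipeline(max_messages: int) -> int:
    """Drain up to max_messages, parse and enrich them, append them to the warehouse."""
    processed = 0
    while processed < max_messages:
        try:
            raw = message_log.get_nowait()
        except queue.Empty:
            break
        row = json.loads(raw)
        row["ingested_at"] = datetime.now(timezone.utc).isoformat()
        warehouse["events"].append(row)
        processed += 1
    return processed


def batch_load(rows: list) -> None:
    """Batch ingest: rows from files in object storage land in the same table."""
    warehouse["events"].extend(rows)


publish({"player": "p1", "action": "login"})
publish({"player": "p2", "action": "purchase"})
print(run_pipeline(max_messages=10))  # 2
batch_load([{"player": "p3", "action": "login"}])
print(len(warehouse["events"]))  # 3
```

The point of the sketch is the overall shape. Streaming ingest and batch ingest both land in the same warehouse table. The pipeline step is the only place where messages are parsed and enriched.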
  • 8. The takeaway: move your DWH to the Cloud… 1. When you want a system that is simpler to set up, configure, maintain and operate ▪ Where a lot of the DBA / maintenance work you are still doing now ‘comes for free’ 2. When your current DWH is maxed out and difficult or expensive to expand ▪ When you need potentially endless storage and compute scaling (more real-time being one possibility) ▪ When you require better workload separation 3. When you want your costs to scale linearly with usage
  • 9. The takeaway: when maybe not? 1. When everything mentioned on the previous slide does not apply! 2. When you have a lot of data (10+ terabytes) and it is not in the cloud already (ingress: getting it there) 3. When your data volumes are small (<1 TB) or in the multi-petabyte range (effect on costs) 4. When you have ultra-sensitive data and don’t trust the measures cloud providers take to prevent this ‘from going wrong’
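The ingress concern is mostly about time and bandwidth. Here is a back-of-the-envelope sketch. It assumes a dedicated uplink that delivers about 80% of its nominal speed; both the link speeds and the 80% figure are assumptions, not figures from the slide.

```python
def transfer_days(terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days of continuous transfer to move `terabytes` (decimal TB) over a `link_gbps` link.

    `efficiency` is an assumed fraction of the nominal bandwidth actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


for tb in (1, 10, 100):
    print(f"{tb:>4} TB over 1 Gbps: {transfer_days(tb, 1.0):.1f} days")
```

Under those assumptions, 10 TB over 1 Gbps takes about 1.2 days of uninterrupted transfer. 100 TB takes more than 11 days, before any retries or throttling.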
  • 10. So which of these 14 should you pick? (Forrester Wave for Cloud Data Warehouses, Q4 2018)
  • 11. The takeaway: start with either… OR
  • 12. The takeaway: consider Azure SQL Data Warehouse
  • 14. But we’ll just do this by building a BiG data lake, right? Photo credit: Lake Public Domain, http://www.writeups.org/star-trek-brent-spiner-data/
  • 18. “Every single company I've worked at and talked to has the same problem without a single exception so far: poor data quality... Either there's incomplete data, missing data, duplicative data.” Ruslan Belkin, former VP of Engineering @ Twitter and Salesforce
  • 20. “if you can’t build a data warehouse you shouldn’t do AI”
  • 22. I don’t believe in Cloud Data warehousing!
  • 23. (as the answer to all your Data warehousing woes)
  • 24. “The primary purpose of a data warehouse is to transform data from an application state into an integrated corporate state” Bill Inmon, the father of data warehousing
  • 25. This is what we still SHOULD want to build: a DWH that is subject-oriented, integrated, time-variant and non-volatile
  • 26. Build a Data warehouse!
  • 27. But Different... Photo credit: Public Domain
  • 28. By splitting the work… From x(ETL) to EL+x(Tl)
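Here is a minimal sketch of the EL + T split. It uses Python's built-in `sqlite3` purely as a stand-in for a cloud warehouse; that choice, the table names and the data are all hypothetical illustrations. The raw data is loaded untouched, and the cleaning happens afterwards, in SQL, inside the database.

```python
import sqlite3

# Hypothetical example: an in-memory SQLite database stands in for the cloud DWH.
con = sqlite3.connect(":memory:")

# E + L: land the source records as-is (everything as text, no cleaning yet).
con.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "12.50", "nl"), ("2", " 7.00", "BE"), ("3", "3.25", "nl ")],
)

# T: transform later, inside the warehouse, with SQL.
con.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)  AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(TRIM(country))       AS country
    FROM raw_orders
    """
)
print(con.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())  # [('BE', 7.0), ('NL', 15.75)]
```

Because `raw_orders` keeps the data exactly as it arrived, the transformation can be changed and rerun later without going back to the source system.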
  • 31. ‘Virtual by design’? • Focus on the transformation logic • Not on storing / updating / deleting data structures • Simplifies backfilling / changing the DWH. Photo credit: Public Domain
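One way to read 'virtual by design' is sketched below under the same assumptions (`sqlite3` as a stand-in, hypothetical tables and helper). Only the raw history is physically stored, and the business logic lives in a view. Changing a rule means redefining the view. Nothing was materialized, so all history reflects the new rule immediately, without a backfill job.

```python
import sqlite3

# Hypothetical example: an in-memory SQLite database stands in for the cloud DWH.
con = sqlite3.connect(":memory:")

# Persisted layer: raw, append-only history (the only data physically stored).
con.execute("CREATE TABLE raw_customer (customer_id INTEGER, name TEXT, loaded_at TEXT)")
con.executemany(
    "INSERT INTO raw_customer VALUES (?, ?, ?)",
    [(1, "acme", "2019-01-01"), (1, "ACME Corp", "2019-03-01"), (2, "globex", "2019-02-01")],
)


def deploy_dim_customer(name_expr: str) -> None:
    """Hypothetical helper: (re)define the dimension as a view over the raw history.

    `name_expr` is trusted SQL supplied by the developer, not user input.
    """
    con.execute("DROP VIEW IF EXISTS dim_customer")
    con.execute(
        f"""
        CREATE VIEW dim_customer AS
        SELECT customer_id, {name_expr} AS name
        FROM raw_customer AS r
        WHERE loaded_at = (SELECT MAX(loaded_at) FROM raw_customer
                           WHERE customer_id = r.customer_id)
        """
    )


deploy_dim_customer("name")
print(con.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
# [(1, 'ACME Corp'), (2, 'globex')]

deploy_dim_customer("UPPER(name)")  # changed rule: no reload, no backfill job
print(con.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
# [(1, 'ACME CORP'), (2, 'GLOBEX')]
```

The trade-off is that compute is paid on every read instead of on update jobs and extra storage. In practice you would materialize only where query cost or latency demands it.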
  • 32. Modern DWH: ▪ Data & History ▪ Tagging & Search ▪ Integrate data into meaningful and useful stuff ▪ Address basic DQ issues ▪ Address complex DQ issues. This is what we did in BigQuery!
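The 'address basic DQ issues' step can be as simple as the hypothetical Python pass below; the function name, fields and rules are illustrative assumptions. It does three things:

- drops exact duplicates;
- sets aside rows with missing required fields;
- reports rows that share a key but differ in content, without resolving them, since that is a 'complex' DQ issue for the integration layer.

```python
from collections import Counter


def basic_dq(rows: list, key: str, required: tuple) -> dict:
    """Hypothetical basic DQ pass over raw rows (dict values must be hashable)."""
    seen: set = set()
    clean: list = []
    incomplete: list = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # exact duplicate: safe to drop
        seen.add(fingerprint)
        if any(row.get(col) in (None, "") for col in required):
            incomplete.append(row)  # missing data: set aside for review
        else:
            clean.append(row)
    # Same key, different content: only report it; resolving it is "complex" DQ.
    key_counts = Counter(row.get(key) for row in clean)
    conflicts = sorted(k for k, n in key_counts.items() if n > 1)
    return {"clean": clean, "incomplete": incomplete, "key_conflicts": conflicts}


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},  # exact duplicate
    {"id": 2, "email": ""},               # missing email
    {"id": 3, "email": "c@example.com"},
    {"id": 3, "email": "c@example.org"},  # same key, different content
]
result = basic_dq(rows, key="id", required=("id", "email"))
print(len(result["clean"]), len(result["incomplete"]), result["key_conflicts"])  # 3 1 [3]
```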
  • 33. I do believe in Cloud Data warehouse technology!
  • 34. I do believe in Cloud based analytical databases
  • 36. Forrester Wave for Cloud Data warehouses Q4-2018
  • 38. BigQuery: ▪ Built on Google’s Dremel execution engine (which inspired the open-source Apache Drill) ▪ Available since October 2011 ▪ Cloud native: born in the cloud ▪ Key unique feature: ▪ The only full-on DWaaS: no nodes, no CPU, no RAM, nothing to configure
  • 39. Redshift: ▪ Based on PostgreSQL 8.0.2, rebuilt as a cloud-based MPP ▪ Available since October 2012 ▪ Based on a legacy DWH (not born in the cloud) ▪ Key unique features: ▪ Most implementations ▪ Best SQL support
  • 40. Snowflake: ▪ New kid on the block, started by ex-Oracle employees ▪ Cloud native: born in the cloud ▪ Available since October 2014 ▪ Key unique features: ▪ The only cloud-agnostic DWH: AWS, Azure and Google (early 2020) ▪ No-downtime auto-scaling ▪ Metadata-based data cloning (clone your production to DTA, i.e. development, test and acceptance, with no extra storage!)
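The idea behind metadata-based cloning can be illustrated with a toy copy-on-write model in Python. This is a simplified, hypothetical model, not Snowflake's actual implementation. Tables hold references to immutable storage blocks, so a clone copies only the list of references, and only new writes consume extra storage.

```python
from dataclasses import dataclass, field

# Toy copy-on-write model (hypothetical, NOT Snowflake's internals):
# tables point at immutable blocks; cloning copies pointers, not data.


@dataclass
class Storage:
    blocks: dict = field(default_factory=dict)
    next_id: int = 0

    def write(self, rows: list) -> int:
        """Store rows as a new immutable block and return its id."""
        self.blocks[self.next_id] = list(rows)
        self.next_id += 1
        return self.next_id - 1


@dataclass
class Table:
    storage: Storage
    block_ids: list = field(default_factory=list)

    def insert(self, rows: list) -> None:
        self.block_ids.append(self.storage.write(rows))  # new block, old ones untouched

    def clone(self) -> "Table":
        return Table(self.storage, list(self.block_ids))  # metadata copy only

    def rows(self) -> list:
        return [r for b in self.block_ids for r in self.storage.blocks[b]]


store = Storage()
prod = Table(store)
prod.insert(["order 1", "order 2"])
dev = prod.clone()            # e.g. clone production to a development environment
print(len(store.blocks))      # 1: the clone added no storage
dev.insert(["test order"])    # only the clone's own changes need new storage
print(len(store.blocks), prod.rows(), dev.rows())
```

After the clone's insert, the store holds 2 blocks. Production still sees only its own two rows.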
  • 41. Azure SQL Data Warehouse: ▪ Based on SQL Server Parallel Data Warehouse (PDW) ▪ Based on legacy (non-cloud-native) DWH technology ▪ Available since 2015 (Gen1) and May 2018 (Gen2) ▪ Key unique features: ▪ Getting stronger quickly since the Gen2 release ▪ Vast supporting ecosystem of GUIs and ETL tools
  • 42. So why then, and what is different? Comparing the top 3 benefits… Photo credit: Public Domain
  • 43. 3- Costs: Low entry point / Pay-for-use. Photo by Joel Filipe on Unsplash
  • 44. Comparing cost features
    Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
    Fixed start/licence costs? | NO | NO | NO | NO
    Separation of storage and compute | NO | YES | YES | YES
    Costs easy to calculate? | Bit of work | Bit of work | YES | Bit of work
    Good predictability? | YES | YES | NO (on demand), YES (flat rate) | Depends on auto-scaling
    Storage costs per TB per month (USD) | 374* | 149 | 20/10, uncompressed | 23
    CPU / usage costs | Depends | Depends | 5 per TB read, uncompressed | Depends
    * For a current-gen dense storage ds2.8xlarge cluster running all year
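You can turn the storage row into a rough monthly bill. This sketch takes only the per-TB prices from the cost table. Everything else is a hypothetical input: the data volume, the compression ratio used to convert to the uncompressed bytes BigQuery bills on, and the share of data at BigQuery's long-term rate. Treat the output as a comparison aid, not a quote.

```python
# Storage prices in USD per TB per month, copied from the slide's table (2019).
# Redshift's figure assumes a ds2.8xlarge cluster running all year.
STORAGE_USD_PER_TB_MONTH = {"Redshift": 374.0, "Azure SQL DWH": 149.0, "Snowflake": 23.0}
BIGQUERY_ACTIVE_USD = 20.0     # recently modified data
BIGQUERY_LONG_TERM_USD = 10.0  # data not modified for 90 days


def monthly_storage_usd(stored_tb, compression_ratio=1.0, long_term_share=0.0):
    """Rough monthly storage bill per platform.

    stored_tb: data volume as stored (compressed) on the other platforms.
    compression_ratio: hypothetical ratio used to convert to the uncompressed
        volume that BigQuery bills on.
    long_term_share: hypothetical fraction of BigQuery data at long-term rates.
    """
    costs = {name: stored_tb * rate for name, rate in STORAGE_USD_PER_TB_MONTH.items()}
    uncompressed_tb = stored_tb * compression_ratio
    blended_rate = (
        (1 - long_term_share) * BIGQUERY_ACTIVE_USD
        + long_term_share * BIGQUERY_LONG_TERM_USD
    )
    costs["BigQuery"] = uncompressed_tb * blended_rate
    return costs


# Hypothetical inputs: 10 TB stored, 3:1 compression, half at the long-term rate.
print(monthly_storage_usd(10, compression_ratio=3, long_term_share=0.5))
# {'Redshift': 3740.0, 'Azure SQL DWH': 1490.0, 'Snowflake': 230.0, 'BigQuery': 450.0}
```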
  • 45. Costs: solution strong / weak points ▪ Strong: ▪ All start at zero cost, with no fixed licence fee ▪ All employ a pay-for-use model ▪ Snowflake has the cheapest storage (even more so with metadata-based cloning) ▪ Weak: ▪ Redshift doesn’t separate storage and compute ▪ BigQuery’s DWaaS model charges compute costs based on the data queried: this can get out of control when you don’t set limits! (on-demand pricing only) ▪ BigQuery’s limited end-user data caching can increase costs, depending on the usage pattern (a solution is in development; on-demand pricing only)
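The ‘set limits’ advice can be sketched as a guard that estimates a query's on-demand cost before it runs and refuses anything above a cap. In BigQuery, the byte estimate would come from a dry run and the cap from the maximum-bytes-billed setting on the query job. The function names below are hypothetical and the $5 price is the one on the slide. Counting 1 TB as 2^40 bytes is an assumption you should check against the current pricing page.

```python
# Minimal sketch of a per-query budget guard for on-demand pricing.
PRICE_USD_PER_TB = 5.0  # on-demand price quoted on the slide (2019)
BYTES_PER_TB = 2**40    # assumption: the pricing counts 1 TB as 2^40 bytes


class QueryTooExpensive(Exception):
    """Raised when a query would scan more bytes than the configured cap."""


def on_demand_cost_usd(bytes_scanned):
    """Cost of one query, billed on the (uncompressed) bytes it reads."""
    return bytes_scanned / BYTES_PER_TB * PRICE_USD_PER_TB


def run_if_affordable(bytes_scanned, max_bytes_billed):
    """Refuse the query up front instead of paying for a runaway scan."""
    if bytes_scanned > max_bytes_billed:
        raise QueryTooExpensive(
            f"{bytes_scanned} bytes exceeds the cap of {max_bytes_billed} bytes"
        )
    return on_demand_cost_usd(bytes_scanned)


print(run_if_affordable(2**39, max_bytes_billed=2**40))  # 2.5
try:
    run_if_affordable(4 * 2**40, max_bytes_billed=2**40)  # a $20 full scan
except QueryTooExpensive as err:
    print("blocked:", err)
```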
  • 46. Costs: don’t forget… • A cloud DWH will not always be cheaper than on-prem! • Costs shift from CAPEX to OPEX • This requires a different operating model • Costs can be unpredictable, which can be seen as a problem • And remember the TCO!
  • 47. 2- (Almost) Infinite scaling. Photo by Joel Filipe on Unsplash
  • 48. Comparing scale and speed features
    Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
    Type | Based on legacy | Based on legacy | Cloud native | Cloud native
    Max storage size | 2 PB | Gen1: 240 TB; Gen2: 240 TB rowstore, unlimited columnstore | Unlimited | Unlimited
    Storage resizing | COMPLEX | Doable | N.A. | N.A.
    Dynamic node resizing | Doable | Doable | N.A. | EASY
    Concurrency resizing | COMPLEX | Doable | Default 50, then contact Google | EASY
    No-downtime auto-scaling | NO | NO | N.A. | YES
    Hibernate compute | NO | YES | N.A. | YES
    Data caching | Hot data SSD cache + exact query | Hot data SSD cache + exact query | Only exact query + in development | Hot data SSD cache + exact query
  • 49. Scaling: strong / weak points ▪ Strong: ▪ BigQuery and Snowflake have unlimited storage ▪ BigQuery on-demand is always very powerful, with 2000 slots: scaling is not relevant here! ▪ Snowflake has the best cluster and concurrency scaling options ▪ Weak: ▪ Redshift is complex to resize and scale, and cannot hibernate ▪ BigQuery’s DWaaS nature and limited caching options almost always incur 2-3 seconds of query startup time
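Concurrency scaling and compute hibernation can be pictured as a simple rule that sizes the number of compute clusters to the current query load. This is a hypothetical policy for illustration only, not Snowflake's actual scaling algorithm. `queries_per_cluster` and the min/max bounds are assumed parameters, though Snowflake's multi-cluster warehouses do expose comparable minimum and maximum cluster counts.

```python
import math


def clusters_needed(concurrent_queries, queries_per_cluster, min_clusters=1, max_clusters=10):
    """Hypothetical scale-out rule: enough clusters for the current concurrency,
    clamped to [min_clusters, max_clusters]. With min_clusters=0 an idle
    warehouse scales to zero, i.e. it hibernates."""
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(max_clusters, wanted))


for load in (0, 5, 17, 500):
    # prints 0 0, 5 1, 17 3, 500 4 (one pair per line)
    print(load, clusters_needed(load, queries_per_cluster=8, min_clusters=0, max_clusters=4))
```

The upper bound is what keeps ‘infinite scale’ from turning into infinite costs.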
  • 50. Scaling: don’t forget… • This could be ‘your last DWH migration’: choose wisely! • The power of this technology enables the modern data warehousing methodology: virtualize! • Infinite scale also increases the risk of: • Infinite costs • An infinite data mess. So take your data management (even more!) seriously!
  • 51. Fivetran benchmark, 10 Sept 2018 • 99 TPC-DS queries • Each run only once • Calculated with the system being idle 82% of the time • Factor 10 difference in size of cluster and dataset • No use of: • Partitioning • Sort keys • Clustering. SOURCE: https://fivetran.com/blog/warehouse-benchmark
  • 52. Histogram of costs for 99 TPC-DS queries, with geometric mean. SOURCE: https://fivetran.com/blog/warehouse-benchmark
  • 53. Histogram of performance for 99 TPC-DS queries, with geometric mean. SOURCE: https://fivetran.com/blog/warehouse-benchmark
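The benchmark summarizes 99 queries with a geometric mean instead of an arithmetic mean. A small sketch with hypothetical runtimes shows why. One slow outlier dominates the arithmetic mean, while the geometric mean (the n-th root of the product of the values) stays close to the typical query. This sketch needs Python 3.8+ for `statistics.geometric_mean` and `statistics.fmean`.

```python
import statistics

# Hypothetical query runtimes in seconds: four fast queries and one outlier.
runtimes = [1, 2, 4, 8, 512]

arithmetic = statistics.fmean(runtimes)
geometric = statistics.geometric_mean(runtimes)
print(f"arithmetic mean: {arithmetic:.1f} s")  # 105.4 s, dominated by the outlier
print(f"geometric mean:  {geometric:.1f} s")   # 8.0 s
```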
  • 54. 1-Ease of deployment, development and maintenance. Photo credit: Public Domain
  • 55. Comparing deployment, development and maintenance
    Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
    Setup process | COMPLEX | AVERAGE | N.A. | EASY
    Managing data on cluster | COMPLEX | AVERAGE | N.A. | EASY
    Separation of storage and compute | NO | YES | YES | YES
    Time travel (auto-backup) | 8 hours + configurable | 8 hours + user-defined restore points | YES, 7 days, 2 after delete | YES, 1 to 90 days + fail-safe
    Metadata-only data cloning | NO | NO | NO | YES
    SQL DDL support | EXCELLENT | GOOD | LIMITED | GOOD
    SQL DML support | EXCELLENT | OK | GOOD | OK*
    Stored procedure support | GOOD | GOOD | NO | GOOD
    UDF support | GOOD | GOOD | AVERAGE: not centrally | GOOD
    Materialized view support | NO | YES (preview) | NO | YES (limited)
    PK/FK support | As metadata | NO | NO | As metadata
    Quality GUI / SQL interface | GOOD | GOOD, but no web UI | OK, web | GOOD, web
    JSON parsing capabilities | AVERAGE | In preview | OK | GOOD
    ETL dev / scheduling | GOOD: AWS Glue, coding | GOOD: Data Factory, SSIS, coding | GOOD: Cloud Data Fusion, scheduled queries, Cloud Composer | OK: coding or external ETL tool in AWS / Azure
    * Analytical functions are not fully mature yet: https://medium.com/@jthandy/how-compatible-are-redshift-and-snowflakes-sql-syntaxes-c2103a43ae84
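Metadata-only cloning works because the stored data blocks are immutable. The sketch below is a simplified, hypothetical model, not Snowflake's implementation. A table is a list of references to immutable partitions, a clone copies only that list, and writes add new partitions, so shared data is stored once. Updates and deletes would also write new partitions, but they are left out to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """Immutable block of rows: never modified after it is written."""
    rows: tuple


@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def clone(self):
        # Metadata-only copy: a new list of references to the same partitions.
        return Table(list(self.partitions))

    def append(self, rows):
        # Writes add a new partition instead of modifying a shared one.
        self.partitions.append(Partition(tuple(rows)))

    def row_count(self):
        return sum(len(p.rows) for p in self.partitions)


def rows_stored(*tables):
    """Physical storage: each partition is counted once, however many tables reference it."""
    unique = {id(p): p for t in tables for p in t.partitions}
    return sum(len(p.rows) for p in unique.values())


prod = Table()
prod.append([1, 2, 3])
prod.append([4, 5, 6])
dev = prod.clone()             # clone production to a DTA environment
print(rows_stored(prod, dev))  # 6: the clone costs no extra storage
dev.append([7, 8])             # changes in dev only add new partitions
print(prod.row_count(), dev.row_count(), rows_stored(prod, dev))  # 6 8 8
```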
  • 56. Using: strong points ▪ All have integrated replication and backup ▪ BigQuery has no config / maintenance work at all ▪ Snowflake has just enough simple configurability ▪ BigQuery and Snowflake support time travel ▪ Snowflake has metadata-based database cloning ▪ Redshift has the best SQL support ▪ SQL Data Warehouse has the best supporting ecosystem of GUIs and ETL tools
  • 57. Using: weak points ▪ Redshift and SQL Data Warehouse require you to choose distribution keys and create / update statistics ▪ Redshift requires DBA work to reclaim space after deleting data ▪ BigQuery’s SQL DDL support is limited ▪ BigQuery has no stored procedures or materialized views, and limited UDF support ▪ No-one has proper primary or foreign key support!
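A toy model shows why the choice of distribution key matters on Redshift and SQL Data Warehouse. Rows are spread over nodes by hashing the key, so a low-cardinality or skewed key piles most rows onto one node: the ‘not well distributed data’ problem. In this sketch, `crc32` stands in for the warehouse's real hash function, and the orders table is hypothetical.

```python
import zlib
from collections import Counter


def rows_per_node(rows, dist_key, num_nodes):
    """Assign each row to a node by hashing its distribution key.
    crc32 is a stand-in for the warehouse's own hash function."""
    counts = Counter(
        zlib.crc32(str(row[dist_key]).encode()) % num_nodes for row in rows
    )
    return [counts[n] for n in range(num_nodes)]


def skew(counts):
    """Largest node divided by the average node: 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical orders table: 90% of the rows come from one country.
rows = [
    {"order_id": i, "country": "NL" if i < 900 else ("BE" if i < 950 else "DE")}
    for i in range(1000)
]
by_country = rows_per_node(rows, "country", num_nodes=4)
by_order = rows_per_node(rows, "order_id", num_nodes=4)
print(by_country, round(skew(by_country), 2))  # one node holds at least 900 rows
print(by_order, round(skew(by_order), 2))      # much closer to an even spread
```

With the country key, the busiest node does at least 3.6 times its fair share of the work, so it sets the pace for every query.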
  • 58. Using: don’t forget… • The data-storage work that remains: thinking about your partitioning and clustering (data sorting) strategy • You still need to use a good data warehousing methodology! • The basic skills and competences needed don’t change!
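The partitioning strategy that remains your job pays off through partition pruning. A query that filters on the partition column only reads the matching partitions, and under BigQuery's on-demand model that directly lowers the bill. Here is a minimal sketch, assuming a hypothetical table partitioned by day with 1 GB per partition.

```python
from datetime import date, timedelta

# Hypothetical table partitioned by day: 365 partitions of 1 GB each.
PARTITION_BYTES = 10**9
partitions = {date(2019, 1, 1) + timedelta(days=d): PARTITION_BYTES for d in range(365)}


def bytes_scanned(partitions, start=None, end=None):
    """Bytes read by a query filtering on the partition column.
    Partitions outside [start, end] are pruned; without a filter, all are read."""
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


print(bytes_scanned(partitions))  # 365000000000: full scan
print(bytes_scanned(partitions, date(2019, 3, 1), date(2019, 3, 7)))  # 7000000000
```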
  • 59. So, what about... Security? Photo: my own
  • 60. Comparing privacy / security / regulatory
    Feature | Redshift | Azure SQL DWH | BigQuery | Snowflake
    Data in EU | YES | YES | YES | YES
    Encryption at rest | Optional | Optional | YES, always | YES, always
    Customer-managed key | YES | YES | YES | YES
    MFA | YES | YES | YES | YES
    Row-level security | NO | YES | YES, authorized views | YES, authorized views
    ISO 27001 (Information Security Management) | YES | YES | YES | YES
    ISO 27017 (Cloud Security) | YES | YES | YES | YES
    ISO 27018 (Cloud Privacy) | YES | YES | YES | YES
    SOC 1, 2 & 3 | YES | YES | YES | YES
    EU Model Contract Clauses (Data Protection Directive) | YES | YES | YES | YES
    Privacy Shield | YES | YES | YES | YES
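Row-level security through an authorized view can be sketched as a view that joins the data to an entitlements table and filters on the querying user. Users get access to the view, not to the base table. SQLite has no notion of a session user, so a one-row `session` table stands in for it here; in BigQuery the view could compare against `SESSION_USER()`. All table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('EU', 100), ('EU', 50), ('US', 70);
-- Hypothetical entitlements: which user may see which region.
CREATE TABLE entitlements (user_name TEXT, region TEXT);
INSERT INTO entitlements VALUES ('anna', 'EU'), ('bob', 'US'), ('bob', 'EU');
-- SQLite has no session user, so a one-row table stands in for it.
CREATE TABLE session (user_name TEXT);
-- The 'authorized view': users query this view, never the base table.
CREATE VIEW my_sales AS
SELECT s.region, s.amount
FROM sales s
JOIN entitlements e ON e.region = s.region
JOIN session u ON u.user_name = e.user_name;
""")


def total_for(user):
    """Simulate `user` querying the view: only their entitled rows are visible."""
    conn.execute("DELETE FROM session")
    conn.execute("INSERT INTO session VALUES (?)", (user,))
    return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM my_sales").fetchone()[0]


print(total_for("anna"), total_for("bob"), total_for("eve"))  # 150 220 0
```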
  • 61. The takeaway: start with either… OR
  • 62. The takeaway: consider Azure SQL Data Warehouse
  • 64. About me... ▪ Rogier Werschkull ▪ Independent DWH-BI consultant since 9/2018: RogerData ▪ Data architecture, data modeling, data engineering ▪ Blogger ▪ BI/DWH training ▪ Graph enthusiast ▪ Contact details: ▪ nl.linkedin.com/in/rogierwerschkull ▪ rogier@rogerdata.nl ▪ cloudanalyticsnow.nl ▪ @rwerschkull
  • 65. Resources-1 ▪ PDF: Sonra - A comparison of cloud data warehouse platforms ▪ PDF: Snowflake - The Forrester Wave: Cloud DWH Q4 2018 ▪ PDF: GigaOm sector roadmap: cloud analytic databases 2017 ▪ gigaom.com/report/data-warehouse-in-the-cloud-benchmark/ ▪ tech.marksblogg.com/benchmarks.html ▪ dzone.com/articles/choosing-between-modern-data-warehouses ▪ www.periscopedata.com/blog/interactive-analytics-redshift-bigquery-snowflake ▪ fivetran.com/blog/warehouse-benchmark ▪ medium.com/@jthandy/how-compatible-are-redshift-and-snowflakes-sql-syntaxes-c2103a43ae84