SlideShare a Scribd company logo
About Me
 Microsoft, Big Data Evangelist
 In IT for 30 years, worked on many BI and DW projects
 Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM
architect, PDW/APS developer
 Been perm employee, contractor, consultant, business owner
 Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
 Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure
Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data
Platform Solutions
 Blog at JamesSerra.com
 Former SQL Server MVP
 Author of book “Reporting with Microsoft SQL Server 2012”
I tried to understand the modern data warehouse on my own…
And felt like I was body slammed by Randy
Savage:
Let’s prevent that from happening…
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake
Storage Gen2
Blob Storage
Azure Data Lake
Storage Gen1
SQL Server 2019 Big
Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored
Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI,
HyperScale, Serverless)
SQL Server in a VM
Cosmos DB
Power BI Aggregations
Questions to ask customer
• Can you use the cloud?
• Is this a new solution or a migration?
• What is the skillset of the developers?
• Will you use non-relational data (variety)?
• How much data do you need to store (volume)?
• Is this an OLTP or OLAP/DW solution?
• Will you have streaming data (velocity)?
• Will you use dashboards and/or ad-hoc queries?
• Will you use batch and/or interactive queries?
• How fast do the operational reports need to run?
• Will you do predictive analytics?
• Do you want to use Microsoft tools or open source?
• What are your high availability and/or disaster recovery requirements?
• Do you need to master the data (MDM)?
• Are there any security limitations with storing data in the cloud?
• Does this solution require 24/7 client access?
• How many concurrent users will be accessing the solution at peak-time and on average?
• What is the skill level of the end users?
• What is your budget and timeline?
• Is the source data cloud-born and/or on-prem born?
• How much daily data needs to be imported into the solution?
• What are your current pain points or obstacles (performance, scale, storage, concurrency, query times, etc)?
• Are you ok with using products that are in preview?
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake
Storage Gen2
Blob Storage
Azure Data Lake
Storage Gen1
SQL Server 2019 Big
Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored
Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI,
HyperScale, Serverless)
SQL Server in a VM
Cosmos DB
Power BI Aggregations
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake
Storage Gen2
Blob Storage
Azure Data Lake
Storage Gen1
SQL Server 2019 Big
Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored
Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI,
HyperScale, Serverless)
SQL Server in a VM
Cosmos DB
Power BI Aggregations
non-analytical use cases that only need object
storage rather than hierarchical storage
LRS
Multiple replicas across
a datacenter
Protect against disk,
node, rack failures
Write is ack’d when all
replicas are committed
Superior to dual-parity
RAID
11 9s of durability
SLA: 99.9%
GRS
Multiple replicas across each
of 2 regions
Protects against major
regional disasters
Asynchronous to secondary
16 9s of durability
SLA: 99.9%
RA-GRS
GRS + Read access to secondary
Separate secondary endpoint
RPO delay to secondary can be
queried
SLA: 99.99% (read), 99.9% (write)
Zone 1
ZRS
Replicas across 3 Zones
Protect against disk, node, rack and
zone failures
Synchronous writes to all 3 zones
12 9s of durability
Available in 8 regions
SLA: 99.9%
Zone 2 Zone 3
updateable
distributed tables and replicated dimensional tables). We now have HDFS on-prem version.
Both SQL and Spark can access same data. Great if you are already a SQL shop
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake
Storage Gen2
Blob Storage
Azure Data Lake
Storage Gen1
SQL Server 2019 Big
Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored
Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI,
HyperScale, Serverless)
SQL Server in a VM
Cosmos DB
Power BI Aggregations
Databricks is the preferred product over HDI, unless the customer has
a mature Hadoop ecosystem already established, wants to be 100% open source,
wants to use other Hadoop tools that are available 24/7 at a lower cost, or wants
to use other tools like Kafka/Storm/HBase/R Server/LLAP/Hive/Pig
always running and incurring costs
(no pausing or auto scale). Hortonworks merged with Cloudera
Stick with T-SQL and don’t want to deal with Spark or
Hive or other more-difficult technologies
Integrates data lake and data prep technology (Power Query)
directly into Power BI Service, independent of PBI reports. Self-service
data prep
Individual solution or for small workloads. Data Analysts
and Business Analysts. Can transform data that lands in the data lake
and can then be used as part of an enterprise solution
transforming large
amounts of data in a data lake or replacing long-running monthly batch
processing with shorter running distributed processes. Predictable
performance with no startup time
Does not support interactive
queries, persistence, or indexing
Advanced Analytics
Social
LOB
Graph
IoT
Image
CRM
INGEST STORE PREP MODEL & SERVE
(& store)
Data orchestration
and monitoring
Big data store Transform & Clean Data warehouse
AI
BI + Reporting
Azure Data Factory
SSIS
Azure Data Lake
Storage Gen2
Blob Storage
Azure Data Lake
Storage Gen1
SQL Server 2019 Big
Data Cluster
Azure Databricks
Azure HDInsight
PolyBase & Stored
Procedures
Power BI Dataflow
Azure Data Lake Analytics
Azure SQL Data Warehouse
Azure Analysis Services
SQL Database (Single, MI,
HyperScale, Serverless)
SQL Server in a VM
Cosmos DB
Power BI Aggregations
SQL-based, fully-managed, petabyte-scale cloud data warehouse.
Can scale compute and storage independently allowing you to burst
compute, and c
MPP technology that shines when used for ad-hoc queries and
operational reports in relational format
equires data to be copied from
ADLS into SQL DW but this can be done quickly using PolyBase
slower performance for ad-hoc queries
Area a
cases: Need control over / access to the operating system, have to run
the app or agents side-by-side with the DB, need to use older version of SQL
Server, SSRS, DW in the 4TB-50TB range, 3rd-party app not certified for PaaS,
DBA afraid of losing his job, control over backups and maintenance window,
want to avoid risk
How to use: IaaS. Provision
A globally distributed, multi-model (key-value, graph, and
document) database service. It fits into the NoSQL camp by having a non-
relational model (supporting schema-on-read and JSON documents)
Works really well for large-scale OLTP solutions.
for DW aggregations. Use for data lake to have one datastore
for both operational and analytical queries
Artificial Intelligence Decision Tree
Big Data Decision Tree v4
Business Intelligence Solutions Decision Tree
Microsoft data platform solutions
Product Category Description More Info
SQL Server 2017 RDBMS Earned top spot in Gartner’s Operational Database magic
quadrant. JSON support. Linux support
https://www.microsoft.com/en-us/server-
cloud/products/sql-server-2017/
SQL Database RDBMS/DBaaS Cloud-based service that is provisioned and scaled quickly.
Has built-in high availability and disaster recovery. JSON
support. Managed Instance option
https://azure.microsoft.com/en-
us/services/sql-database/
SQL Data Warehouse MPP RDBMS/DBaaS Cloud-based service that handles relational big data.
Provision and scale quickly. Can pause service to reduce
cost
https://azure.microsoft.com/en-
us/services/sql-data-warehouse/
Azure Data Lake Store Hadoop storage Removes the complexities of ingesting and storing all of
your data while making it faster to get up and running with
batch, streaming, and interactive analytics
https://azure.microsoft.com/en-
us/services/data-lake-store/
HDInsight PaaS Hadoop
compute/Hadoop
clusters-as-a-service
A managed Apache Hadoop, Spark, R Server, HBase, Kafka,
Interactive Query (Hive LLAP) and Storm cloud service
made easy
https://azure.microsoft.com/en-
us/services/hdinsight/
Azure Databricks PaaS Spark clusters A fast, easy, and collaborative Apache Spark based analytics
platform optimized for Azure
https://databricks.com/azure
Azure Data Lake Analytics On-demand analytics job
service/Big Data-as-a-
service
Cloud-based service that dynamically provisions resources
so you can run queries on exabytes of data. Includes U-
SQL, a new big data query language
https://azure.microsoft.com/en-
us/services/data-lake-analytics/
Azure Cosmos DB PaaS NoSQL: Key-value,
Column-family,
Document, Graph
Globally distributed, massively scalable, multi-model, multi-
API, low latency data service – which can be used as an
operational database or a hot data lake
https://azure.microsoft.com/en-
us/services/cosmos-db/
Azure Database for PostgreSQL,
MySQL, and MariaDB
RDBMS/DBaaS A fully managed database service for app developers https://azure.microsoft.com/en-
us/services/postgresql
A “no-compromises” Data Lake: secure, performant, massively-scalable Data Lake storage that brings the cost and
scale profile of object storage together with the performance and analytics feature set of data lake storage
A z u r e D a t a L a k e S t o r a g e G e n 2
M A N A G E A B L E S C A L A B L EF A S TS E C U R E
 No limits on
data store size
 Global footprint
(50 regions)
 Optimized for Spark
and Hadoop
Analytic Engines
 Tightly integrated
with Azure end to
end analytics
solutions
 Automated
Lifecycle Policy
Management
 Object Level
tiering
 Support for fine-
grained ACLs,
protecting data at the
file and folder level
 Multi-layered
protection via at-rest
Storage Service
encryption and Azure
Active Directory
integration
C O S T
E F F E C T I V E
I N T E G R AT I O N
R E A D Y
 Atomic file
operations
means jobs
complete faster
 Object store
pricing levels
 File system
operations
minimize
transactions
required for job
completion
Managed data lake with
SQL Server and Spark
SQL
Server
Data virtualization
T-SQL
Analytics Apps
Open
database
connectivity
NoSQL Relational
databases
HDFS
Complete AI platform
SQL Server External Tables
Compute pools and data pools
Spark
Scalable, shared storage (HDFS)
External
data sources
Admin portal and management services
Integrated AD-based security
SQL Server
ML Services
Spark &
Spark ML
HDFS
REST API containers
for models
Managing all dataIntegrating all data AI over all data
Store high volume data in a data lake and access
it easily using either SQL or Spark
Management services, admin portal, and
integrated security make it all easy to manage
Combine data from many sources without
moving or replicating it
Scale out compute and caching to boost
performance
Easily feed integrated data from many sources to
your model training
Ingest and prep data and then train, store, and
operationalize your models all in one system
Intelligence over all data
Increase analytics and apps performance
Compute pool
SQL Compute
Node
SQL Compute
Node
SQL Compute
Node
…
Compute pool
SQL Compute
Node
IoT data
Directly
read from
HDFS
Persistent storage
…
Storage pool
SQL
Server
Spark
HDFS Data Node
SQL
Server
Spark
HDFS Data Node
SQL
Server
Spark
HDFS Data Node
Kubernetes pod
Analytics
Custom
apps BI
SQL Server
master instance
Node Node Node Node Node Node Node
SQL
Data pool
SQL Data
Node
SQL Data
Node
Compute pool
SQL Compute
Node
Storage Storage
Intelligence over all data
Programming
Model
General
Purpose
Business
Critical
Hyperscale Elastic Pools
Instance (MI) GA, 8TB GA, 4TB Private Preview,
100TB
April private
preview
Database
(Single)
GA, 4TB
[Serverless]
GA, 4TB Public Preview,
100TB
GA
Contact Lead Opportunity AccountContact Lead Opportunity Account Product ProfileProduct Profile People ProfileCustomer ProfileCustomer Profile
Power BI Azure
Databricks
Azure
Data
Factory
Azure
SQL DW
Self-service data prep
Dataflows
AI consumption
Enterprise BI
Semantic models
Self-service BI
Data ingestion
& orchestration
Enterprise
data prep
Curated data
INGEST STORE PREP & TRAIN MODEL & SERVE
C L O U D D A T A W A R E H O U S E
Azure Data Lake Store Gen2
Logs (unstructured)
Azure Data Factory
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
Media (unstructured)
Files (unstructured)
PolyBase
Business/custom apps
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI
INGEST STORE PREP & TRAIN MODEL & SERVE
M O D E R N D A T A W A R E H O U S E
Azure Data Lake Store Gen2
Logs (unstructured)
Azure Data Factory
Azure Databricks
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
Media (unstructured)
Files (unstructured)
PolyBase
Business/custom apps
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI
A D V A N C E D A N A L Y T I C S O N B I G D A T A
INGEST STORE PREP & TRAIN MODEL & SERVE
Cosmos DB
Business/custom apps
(structured)
Files (unstructured)
Media (unstructured)
Logs (unstructured)
Azure Data Lake Store Gen2Azure Data Factory Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI
PolyBase
SparkR
Azure Databricks
Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure Machine Learning to allow customers to tailor the above architecture to meet
their unique needs.
Real-time apps
INGEST STORE PREP & TRAIN MODEL & SERVE
R E A L T I M E A N A L Y T I C S
Sensors and IoT
(unstructured)
Apache Kafka for
HDInsight
Cosmos DB
Files (unstructured)
Media (unstructured)
Logs (unstructured)
Azure Data Lake Store Gen2Azure Data Factory
Azure Databricks
Real-time apps
Business/custom apps
(structured)
Azure SQL Data
Warehouse
Azure Analysis
Services
Power BI
Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, Azure Machine Learning to allow customers to
tailor the above architecture to meet their unique needs.
PolyBase
INGEST STORE MODEL & SERVE
D A T A M A R T C O N S O L I D A T I O N
Azure Data Lake Store Gen2 Azure SQL
Data Warehouse
Azure Data Factory Azure Analysis
Services
Power BI
RDBMS data marts
Hadoop
Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
PolyBase
INGEST STORE PREP & TRAIN MODEL & SERVE
H U B & S P O K E A R C H I T E C T U R E F O R B I
Azure SQL
Data Warehouse
PolyBase
Business/custom apps
(structured)
Power BI
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
Multiple Azure Analysis
Services instances
SQL
Multiple Azure SQL
Database instances
Data Marts
Data Cubes
Azure Databricks
Logs (unstructured)
Media (unstructured)
Files (unstructured)
Azure Data Lake Store Gen2Azure Data Factory
INGEST STORE PREP & TRAIN MODEL & SERVE
A U T O S C A L I N G D A T A W A R E H O U S E
Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
Azure Analysis
Services
Azure Functions
(Auto-scaling)
Business/custom apps
(structured)
Logs (unstructured)
Media (unstructured)
Files (unstructured)
Azure SQL
Data Warehouse
PolyBase
Power BIAzure Data Lake Store Gen2Azure Data Factory
Azure Databricks
D A T A W A R E H O U S E M I G R A T I O N
INGEST STORE PREP & TRAIN MODEL & SERVE
Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
Business/custom apps
(structured)
Azure SQL Data
Warehouse
Business/custom apps
Azure Data Lake Store Gen2
Logs (unstructured)
Azure Data Factory Azure Databricks
Media (unstructured)
Files (unstructured)
Azure Analysis
Services
Power BI
PolyBase
Building a modern data warehouse
Building a modern data warehouse

More Related Content

What's hot

Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
Brett VanderPlaats
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
Harald Erb
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation Roadmap
David Walker
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
chennakesava44
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
DataWorks Summit
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
Data Science Thailand
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
DATAVERSITY
 

What's hot (20)

Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Wallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation RoadmapWallchart - Data Warehouse Documentation Roadmap
Wallchart - Data Warehouse Documentation Roadmap
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 

Similar to Building a modern data warehouse

Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
James Serra
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
James Serra
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
James Serra
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
James Serra
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
James Serra
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Carole Gunst
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Itay Braun
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management Frontier
Demai Ni
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
James Serra
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
Martin Bém
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
kcmallu
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
Serhiy (Serge) Haziyev
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
Caserta
 
Keynote: Open Source für den geschäftskritischen Einsatz
Keynote: Open Source für den geschäftskritischen EinsatzKeynote: Open Source für den geschäftskritischen Einsatz
Keynote: Open Source für den geschäftskritischen Einsatz
MariaDB plc
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
Venu Anuganti
 

Similar to Building a modern data warehouse (20)

Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011Extreme SSAS- SQL 2011
Extreme SSAS- SQL 2011
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 
Graph Data: a New Data Management Frontier
Graph Data: a New Data Management FrontierGraph Data: a New Data Management Frontier
Graph Data: a New Data Management Frontier
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Keynote: Open Source für den geschäftskritischen Einsatz
Keynote: Open Source für den geschäftskritischen EinsatzKeynote: Open Source für den geschäftskritischen Einsatz
Keynote: Open Source für den geschäftskritischen Einsatz
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 

More from James Serra

Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
James Serra
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
James Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
James Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
James Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra
 
How to build your career
How to build your careerHow to build your career
How to build your career
James Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
James Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
James Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
James Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
James Serra
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
James Serra
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
James Serra
 

More from James Serra (20)

Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Building a modern data warehouse

  • 1.
  • 2. About Me  Microsoft, Big Data Evangelist  In IT for 30 years, worked on many BI and DW projects  Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer  Been perm employee, contractor, consultant, business owner  Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference  Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions  Blog at JamesSerra.com  Former SQL Server MVP  Author of book “Reporting with Microsoft SQL Server 2012”
  • 3. I tried to understand the modern data warehouse on my own… And felt like I was body slammed by Randy Savage: Let’s prevent that from happening…
  • 4. Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting Azure Data Factory SSIS Azure Data Lake Storage Gen2 Blob Storage Azure Data Lake Storage Gen1 SQL Server 2019 Big Data Cluster Azure Databricks Azure HDInsight PolyBase & Stored Procedures Power BI Dataflow Azure Data Lake Analytics Azure SQL Data Warehouse Azure Analysis Services SQL Database (Single, MI, HyperScale, Serverless) SQL Server in a VM Cosmos DB Power BI Aggregations
  • 5.
  • 6.
  • 7.
  • 8.
  • 9. Questions to ask customer • Can you use the cloud? • Is this a new solution or a migration? • What is the skillset of the developers? • Will you use non-relational data (variety)? • How much data do you need to store (volume)? • Is this an OLTP or OLAP/DW solution? • Will you have streaming data (velocity)? • Will you use dashboards and/or ad-hoc queries? • Will you use batch and/or interactive queries? • How fast do the operational reports need to run? • Will you do predictive analytics? • Do you want to use Microsoft tools or open source? • What are your high availability and/or disaster recovery requirements? • Do you need to master the data (MDM)? • Are there any security limitations with storing data in the cloud? • Does this solution require 24/7 client access? • How many concurrent users will be accessing the solution at peak-time and on average? • What is the skill level of the end users? • What is your budget and timeline? • Is the source data cloud-born and/or on-prem born? • How much daily data needs to be imported into the solution? • What are your current pain points or obstacles (performance, scale, storage, concurrency, query times, etc)? • Are you ok with using products that are in preview?
  • 10.
  • 11. Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting Azure Data Factory SSIS Azure Data Lake Storage Gen2 Blob Storage Azure Data Lake Storage Gen1 SQL Server 2019 Big Data Cluster Azure Databricks Azure HDInsight PolyBase & Stored Procedures Power BI Dataflow Azure Data Lake Analytics Azure SQL Data Warehouse Azure Analysis Services SQL Database (Single, MI, HyperScale, Serverless) SQL Server in a VM Cosmos DB Power BI Aggregations
  • 12.
  • 13.
  • 14.
  • 15. Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting Azure Data Factory SSIS Azure Data Lake Storage Gen2 Blob Storage Azure Data Lake Storage Gen1 SQL Server 2019 Big Data Cluster Azure Databricks Azure HDInsight PolyBase & Stored Procedures Power BI Dataflow Azure Data Lake Analytics Azure SQL Data Warehouse Azure Analysis Services SQL Database (Single, MI, HyperScale, Serverless) SQL Server in a VM Cosmos DB Power BI Aggregations
  • 16.
  • 17. non-analytical use cases that only need object storage rather than hierarchical storage
  • 18. LRS Multiple replicas across a datacenter Protect against disk, node, rack failures Write is ack’d when all replicas are committed Superior to dual-parity RAID 11 9s of durability SLA: 99.9% GRS Multiple replicas across each of 2 regions Protects against major regional disasters Asynchronous to secondary 16 9s of durability SLA: 99.9% RA-GRS GRS + Read access to secondary Separate secondary endpoint RPO delay to secondary can be queried SLA: 99.99% (read), 99.9% (write) Zone 1 ZRS Replicas across 3 Zones Protect against disk, node, rack and zone failures Synchronous writes to all 3 zones 12 9s of durability Available in 8 regions SLA: 99.9% Zone 2 Zone 3
  • 19.
  • 20. updateable distributed tables and replicated dimensional tables). We now have HDFS on-prem version. Both SQL and Spark can access same data. Great if you are already a SQL shop
  • 21.
  • 22. Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting Azure Data Factory SSIS Azure Data Lake Storage Gen2 Blob Storage Azure Data Lake Storage Gen1 SQL Server 2019 Big Data Cluster Azure Databricks Azure HDInsight PolyBase & Stored Procedures Power BI Dataflow Azure Data Lake Analytics Azure SQL Data Warehouse Azure Analysis Services SQL Database (Single, MI, HyperScale, Serverless) SQL Server in a VM Cosmos DB Power BI Aggregations
  • 23.
  • 24. Databricks is the preferred product over HDI, unless the customer has a mature Hadoop ecosystem already established, wants to be 100% open source, wants to use other Hadoop tools that are available 24/7 at a lower cost, or wants to use other tools like Kafka/Storm/HBase/R Server/LLAP/Hive/Pig always running and incurring costs (no pausing or auto scale). Hortonworks merged with Cloudera
  • 25. Stick with T-SQL and don’t want to deal with Spark or Hive or other more-difficult technologies
  • 26. Integrates data lake and data prep technology (Power Query) directly into Power BI Service, independent of PBI reports. Self-service data prep Individual solution or for small workloads. Data Analysts and Business Analysts. Can transform data that lands in the data lake and can then be used as part of an enterprise solution
  • 27. transforming large amounts of data in a data lake or replacing long-running monthly batch processing with shorter running distributed processes. Predictable performance with no startup time Does not support interactive queries, persistence, or indexing
  • 28.
  • 29. Advanced Analytics Social LOB Graph IoT Image CRM INGEST STORE PREP MODEL & SERVE (& store) Data orchestration and monitoring Big data store Transform & Clean Data warehouse AI BI + Reporting Azure Data Factory SSIS Azure Data Lake Storage Gen2 Blob Storage Azure Data Lake Storage Gen1 SQL Server 2019 Big Data Cluster Azure Databricks Azure HDInsight PolyBase & Stored Procedures Power BI Dataflow Azure Data Lake Analytics Azure SQL Data Warehouse Azure Analysis Services SQL Database (Single, MI, HyperScale, Serverless) SQL Server in a VM Cosmos DB Power BI Aggregations
  • 30. SQL-based, fully-managed, petabyte-scale cloud data warehouse. Can scale compute and storage independently allowing you to burst compute, and c MPP technology that shines when used for ad-hoc queries and operational reports in relational format equires data to be copied from ADLS into SQL DW but this can be done quickly using PolyBase
  • 31. slower performance for ad-hoc queries Area a
  • 32.
  • 33. cases: Need control over / access to the operating system, have to run the app or agents side-by-side with the DB, need to use older version of SQL Server, SSRS, DW in the 4TB-50TB range, 3rd-party app not certified for PaaS, DBA afraid of losing his job, control over backups and maintenance window, want to avoid risk How to use: IaaS. Provision
  • 34. A globally distributed, multi-model (key-value, graph, and document) database service. It fits into the NoSQL camp by having a non- relational model (supporting schema-on-read and JSON documents) Works really well for large-scale OLTP solutions. for DW aggregations. Use for data lake to have one datastore for both operational and analytical queries
  • 35.
  • 36.
  • 37. Artificial Intelligence Decision Tree Big Data Decision Tree v4 Business Intelligence Solutions Decision Tree
  • 38.
  • 39.
  • 40. Microsoft data platform solutions Product Category Description More Info SQL Server 2017 RDBMS Earned top spot in Gartner’s Operational Database magic quadrant. JSON support. Linux support https://www.microsoft.com/en-us/server- cloud/products/sql-server-2017/ SQL Database RDBMS/DBaaS Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support. Managed Instance option https://azure.microsoft.com/en- us/services/sql-database/ SQL Data Warehouse MPP RDBMS/DBaaS Cloud-based service that handles relational big data. Provision and scale quickly. Can pause service to reduce cost https://azure.microsoft.com/en- us/services/sql-data-warehouse/ Azure Data Lake Store Hadoop storage Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics https://azure.microsoft.com/en- us/services/data-lake-store/ HDInsight PaaS Hadoop compute/Hadoop clusters-as-a-service A managed Apache Hadoop, Spark, R Server, HBase, Kafka, Interactive Query (Hive LLAP) and Storm cloud service made easy https://azure.microsoft.com/en- us/services/hdinsight/ Azure Databricks PaaS Spark clusters A fast, easy, and collaborative Apache Spark based analytics platform optimized for Azure https://databricks.com/azure Azure Data Lake Analytics On-demand analytics job service/Big Data-as-a- service Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U- SQL, a new big data query language https://azure.microsoft.com/en- us/services/data-lake-analytics/ Azure Cosmos DB PaaS NoSQL: Key-value, Column-family, Document, Graph Globally distributed, massively scalable, multi-model, multi- API, low latency data service – which can be used as an operational database or a hot data lake https://azure.microsoft.com/en- us/services/cosmos-db/ Azure Database for PostgreSQL, MySQL, and MariaDB RDBMS/DBaaS A fully managed database service for app developers https://azure.microsoft.com/en- us/services/postgresql
  • 41. A “no-compromises” Data Lake: secure, performant, massively-scalable Data Lake storage that brings the cost and scale profile of object storage together with the performance and analytics feature set of data lake storage A z u r e D a t a L a k e S t o r a g e G e n 2 M A N A G E A B L E S C A L A B L EF A S TS E C U R E  No limits on data store size  Global footprint (50 regions)  Optimized for Spark and Hadoop Analytic Engines  Tightly integrated with Azure end to end analytics solutions  Automated Lifecycle Policy Management  Object Level tiering  Support for fine- grained ACLs, protecting data at the file and folder level  Multi-layered protection via at-rest Storage Service encryption and Azure Active Directory integration C O S T E F F E C T I V E I N T E G R AT I O N R E A D Y  Atomic file operations means jobs complete faster  Object store pricing levels  File system operations minimize transactions required for job completion
  • 42. Managed data lake with SQL Server and Spark SQL Server Data virtualization T-SQL Analytics Apps Open database connectivity NoSQL Relational databases HDFS Complete AI platform SQL Server External Tables Compute pools and data pools Spark Scalable, shared storage (HDFS) External data sources Admin portal and management services Integrated AD-based security SQL Server ML Services Spark & Spark ML HDFS REST API containers for models Managing all dataIntegrating all data AI over all data Store high volume data in a data lake and access it easily using either SQL or Spark Management services, admin portal, and integrated security make it all easy to manage Combine data from many sources without moving or replicating it Scale out compute and caching to boost performance Easily feed integrated data from many sources to your model training Ingest and prep data and then train, store, and operationalize your models all in one system Intelligence over all data
  • 43. Increase analytics and apps performance Compute pool SQL Compute Node SQL Compute Node SQL Compute Node … Compute pool SQL Compute Node IoT data Directly read from HDFS Persistent storage … Storage pool SQL Server Spark HDFS Data Node SQL Server Spark HDFS Data Node SQL Server Spark HDFS Data Node Kubernetes pod Analytics Custom apps BI SQL Server master instance Node Node Node Node Node Node Node SQL Data pool SQL Data Node SQL Data Node Compute pool SQL Compute Node Storage Storage Intelligence over all data
  • 44.
  • 45. Programming Model General Purpose Business Critical Hyperscale Elastic Pools Instance (MI) GA, 8TB GA, 4TB Private Preview, 100TB April private preview Database (Single) GA, 4TB [Serverless] GA, 4TB Public Preview, 100TB GA
  • 46. Contact Lead Opportunity AccountContact Lead Opportunity Account Product ProfileProduct Profile People ProfileCustomer ProfileCustomer Profile Power BI Azure Databricks Azure Data Factory Azure SQL DW Self-service data prep Dataflows AI consumption Enterprise BI Semantic models Self-service BI Data ingestion & orchestration Enterprise data prep Curated data
  • 47.
  • 48. INGEST STORE PREP & TRAIN MODEL & SERVE C L O U D D A T A W A R E H O U S E Azure Data Lake Store Gen2 Logs (unstructured) Azure Data Factory Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs. Media (unstructured) Files (unstructured) PolyBase Business/custom apps (structured) Azure SQL Data Warehouse Azure Analysis Services Power BI
  • 49. INGEST STORE PREP & TRAIN MODEL & SERVE M O D E R N D A T A W A R E H O U S E Azure Data Lake Store Gen2 Logs (unstructured) Azure Data Factory Azure Databricks Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs. Media (unstructured) Files (unstructured) PolyBase Business/custom apps (structured) Azure SQL Data Warehouse Azure Analysis Services Power BI
  • 50. A D V A N C E D A N A L Y T I C S O N B I G D A T A INGEST STORE PREP & TRAIN MODEL & SERVE Cosmos DB Business/custom apps (structured) Files (unstructured) Media (unstructured) Logs (unstructured) Azure Data Lake Store Gen2Azure Data Factory Azure SQL Data Warehouse Azure Analysis Services Power BI PolyBase SparkR Azure Databricks Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs. Real-time apps
  • 51. INGEST STORE PREP & TRAIN MODEL & SERVE R E A L T I M E A N A L Y T I C S Sensors and IoT (unstructured) Apache Kafka for HDInsight Cosmos DB Files (unstructured) Media (unstructured) Logs (unstructured) Azure Data Lake Store Gen2Azure Data Factory Azure Databricks Real-time apps Business/custom apps (structured) Azure SQL Data Warehouse Azure Analysis Services Power BI Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs. PolyBase
  • 52. INGEST STORE MODEL & SERVE D A T A M A R T C O N S O L I D A T I O N Azure Data Lake Store Gen2 Azure SQL Data Warehouse Azure Data Factory Azure Analysis Services Power BI RDBMS data marts Hadoop Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs. PolyBase
  • 53. INGEST STORE PREP & TRAIN MODEL & SERVE H U B & S P O K E A R C H I T E C T U R E F O R B I Azure SQL Data Warehouse PolyBase Business/custom apps (structured) Power BI Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution. Multiple Azure Analysis Services instances SQL Multiple Azure SQL Database instances Data Marts Data Cubes Azure Databricks Logs (unstructured) Media (unstructured) Files (unstructured) Azure Data Lake Store Gen2Azure Data Factory
  • 54. INGEST STORE PREP & TRAIN MODEL & SERVE A U T O S C A L I N G D A T A W A R E H O U S E Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution. Azure Analysis Services Azure Functions (Auto-scaling) Business/custom apps (structured) Logs (unstructured) Media (unstructured) Files (unstructured) Azure SQL Data Warehouse PolyBase Power BIAzure Data Lake Store Gen2Azure Data Factory Azure Databricks
  • 55. D A T A W A R E H O U S E M I G R A T I O N INGEST STORE PREP & TRAIN MODEL & SERVE Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs. Business/custom apps (structured) Azure SQL Data Warehouse Business/custom apps Azure Data Lake Store Gen2 Logs (unstructured) Azure Data Factory Azure Databricks Media (unstructured) Files (unstructured) Azure Analysis Services Power BI PolyBase

Editor's Notes

  1. 4
  2. https://blogs.technet.microsoft.com/msuspartner/2017/04/05/data-analytics-partners-navigating-data/
  3. https://www.jamesserra.com/archive/2019/01/what-product-to-use-to-transform-my-data/
  4. 11
  5. 3/4/2019
  6. 3/4/2019
  7. https://www.jamesserra.com/archive/2019/01/what-product-to-use-to-transform-my-data/
  8. 15
  9. 3/4/2019
  10. 3/4/2019
  11. https://azure.microsoft.com/en-us/support/legal/sla/storage/v1_3/
  12. 3/4/2019
  13. 3/4/2019
  14. https://www.jamesserra.com/archive/2019/01/what-product-to-use-to-transform-my-data/
  15. 22
  16. 3/4/2019
  17. 3/4/2019
  18. 3/4/2019
  19. 3/4/2019
  20. 3/4/2019
  21. https://www.jamesserra.com/archive/2019/01/what-product-to-use-to-transform-my-data/
  22. 29
  23. 3/4/2019
  24. 3/4/2019
  25. 3/4/2019
  26. 3/4/2019
  27. 3/4/2019
  28. 3/4/2019
  29. 3/4/2019
  30. 3/4/2019
  31. https://azure.microsoft.com/en-us/blog/json-functionalities-in-azure-sql-database-public-preview/ “If you need a specialized JSON database in order to take advantage of automatic indexing of JSON fields, tunable consistency levels for globally distributed data, and JavaScript integration, you may want to choose Azure DocumentDB as a storage engine.” https://blogs.msdn.microsoft.com/jocapc/2015/05/16/json-support-in-sql-server-2016/ https://msdn.microsoft.com/en-us/library/dn921897.aspx “If you have pure JSON workloads where you want to use some query language that is customized and dedicated for processing of JSON documents, you might consider Microsoft Azure DocumentDB.” http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016 https://www.simple-talk.com/sql/learn-sql-server/json-support-in-sql-server-2016/
  32. Integrating all data Combine data from many sources without moving or replicating it – eliminate ETL, access current data, maintain security Scale-out data marts cache data to boost performance Managing all data SQL Server can now read and write to HDFS Store high volume data in a data lake and analyze it easily using either T-SQL or Spark Management services, admin portal, and integrated security make it all easy to manage Analyzing all data Perform analytics over structured and unstructured data in real time Easily feed integrated data from many sources to your model training Ingest and prep data and then train, store, and operationalize your models all in one system
  33. Increase analytics and apps performance with scale out data pools
  34. Microsoft Azure supports other services like Azure HDInsight, Azure Data Lake, Azure IoT Hub, Azure Events Hub in various layers of the architecture above to allow customers a truly customized solution.
  35. 1) Copy source data into the Azure Data Lake Store (twitter data example) 2) Massage/filter the data using Hadoop (or skip using Hadoop and use stored procedures in SQL DW/DB to massage data after step #5) 3) Pass data into Azure ML to build models using Hive query (or pass in directly from Azure Data Lake Store) 4) Azure ML feeds prediction results into the data warehouse 5) Non-relational data in Azure Data Lake Store copied to data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data) 6) Power BI pulls data from data warehouse to build dashboards and reports 7) Azure Data Catalog captures metadata from Azure Data Lake Store and SQL DW/DB 8) Power BI and Excel can pull data from the Azure Data Lake Store via HDInsight 9) To support high concurrency if using SQL DW, or for easier end-user data layer, create an SSAS cube
  36. Individual/Personal BI vs Departmental/Team BI vs Enterprise/Corporate BI