Building a Modern
Data Warehouse
Elena López
Microsoft MVP – Data Platform
Who is Elena López?
Systems Engineer
Data, data, and more data
Curious, entrepreneurial, self-taught
Humor, beer, nature
Learning databases with SQL Server: Elena López + SQL Server
What is a Data Warehouse?
Will the Data Warehouse disappear?
It is needed now more than ever:
• Integrates multiple data sources
• Reduces the negative impact of reporting on production systems
• Enables historical analysis of the data
• User-friendly structure
• Eliminates silos
• Provides a single version of the truth
CHALLENGES
(Ours)
Us
The Data
What makes a Data Warehouse modern?
• Processing of large volumes of data
• Ability to process data in near real time and at high speed
• Supports self-service
• Promotes the democratization of data
• Makes data exploration easier
• Dynamic visualization
• Hybrid or cloud infrastructure
Modern Data Warehouse
Advanced Analytics
Data sources: Social, LOB, Graph, IoT, Image, CRM
INGEST (Data orchestration and monitoring): Azure Data Factory, SSIS
STORE (Big data store): Azure Data Lake Storage Gen2, Blob Storage, Cosmos DB
PREP (Transform & Clean): Azure Databricks, Azure HDInsight, Power BI Dataflow, Azure Data Lake Analytics
MODEL & SERVE (& store) (Data warehouse): Azure SQL Data Warehouse, Azure Analysis Services, Cosmos DB, Power BI Aggregations
Consumers: AI, BI + Reporting
Ingest – Data Orchestration and Monitoring
Store – Big Data Store
Azure Data Lake Storage Gen2
A “no-compromises” Data Lake: secure, performant, massively scalable Data Lake storage that brings the cost and scale profile of object storage together with the performance and analytics feature set of data lake storage
SCALABLE
• No limits on data store size
• Global footprint (50 regions)
INTEGRATION READY
• Optimized for Spark and Hadoop analytic engines
• Tightly integrated with Azure end-to-end analytics solutions
MANAGEABLE
• Automated Lifecycle Policy Management
• Object-level tiering
SECURE
• Support for fine-grained ACLs, protecting data at the file and folder level
• Multi-layered protection via at-rest Storage Service encryption and Azure Active Directory integration
FAST
• Atomic file operations mean jobs complete faster
COST EFFECTIVE
• Object store pricing levels
• File system operations minimize transactions required for job completion
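Lifecycle policies and object-level tiering come down to a rule that maps how recently data was touched to a storage tier. The sketch below shows that idea in plain Python. It does not call any Azure API, and the function name `choose_tier` and the 30-day and 180-day thresholds are hypothetical values chosen for illustration. Hot, Cool, and Archive are the access tier names used by Azure Blob Storage.

```python
from datetime import datetime, timedelta, timezone


def choose_tier(last_modified: datetime, now: datetime,
                cool_after_days: int = 30,
                archive_after_days: int = 180) -> str:
    """Pick a storage tier from the age of an object.

    The thresholds are illustrative assumptions, not Azure defaults.
    """
    age = now - last_modified
    if age >= timedelta(days=archive_after_days):
        return "Archive"
    if age >= timedelta(days=cool_after_days):
        return "Cool"
    return "Hot"


# Three objects last modified 1, 45, and 400 days before a fixed "now".
now = datetime(2019, 6, 1, tzinfo=timezone.utc)
tiers = [choose_tier(now - timedelta(days=d), now) for d in (1, 45, 400)]
print(tiers)
```

In a real deployment a rule like this would normally be declared once as a storage-account policy rather than run in application code; the function only makes the decision logic explicit.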
Organizing a Data Lake – Folder structure
Objectives
• Plan the structure based on optimal data retrieval
• Avoid a chaotic, unorganized data swamp
Common ways to organize the data:
Data Retention Policy
  • Temporary data
  • Permanent data
  • Applicable period (e.g., project lifetime)
  • etc…
Business Impact / Criticality
  • High (HBI)
  • Medium (MBI)
  • Low (LBI)
  • etc…
Confidential Classification
  • Public information
  • Internal use only
  • Supplier/partner confidential
  • Personally identifiable information (PII)
  • Sensitive – financial
  • Sensitive – intellectual property
  • etc…
Probability of Data Access
  • Recent/current data
  • Historical data
  • etc…
Owner / Steward / SME
Subject Area
Security Boundaries
  • Department
  • Business unit
  • etc…
Time Partitioning
  • Year/Month/Day/Hour/Minute
Downstream App/Purpose
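As one illustration of these organizing dimensions, here is a minimal Python sketch that composes a folder path from a subject area, a security boundary (department), and time partitioning. The zone name `raw`, the function name `lake_path`, the sample values, and the order of the path segments are all assumptions made for this example; the slides do not prescribe a specific layout.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def lake_path(zone: str, subject_area: str, department: str,
              event_time: datetime, granularity: str = "day") -> PurePosixPath:
    """Build zone/subject_area/department/yyyy[/mm[/dd[/hh[/mi]]]]."""
    levels = ["year", "month", "day", "hour", "minute"]
    if granularity not in levels:
        raise ValueError(f"granularity must be one of {levels}")
    # Zero-padded time parts so folders sort chronologically as text.
    parts = [
        f"{event_time.year:04d}",
        f"{event_time.month:02d}",
        f"{event_time.day:02d}",
        f"{event_time.hour:02d}",
        f"{event_time.minute:02d}",
    ]
    depth = levels.index(granularity) + 1
    return PurePosixPath(zone, subject_area, department, *parts[:depth])


# Hypothetical example: hourly partitions for finance sales data.
example = lake_path("raw", "sales", "finance",
                    datetime(2019, 3, 7, 14, 5, tzinfo=timezone.utc), "hour")
print(example)
```

Placing the time partition last keeps each department's data under a single parent folder, which makes folder-level ACLs easier to apply; a layout that leads with the date would favor time-range scans instead.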
Prep – Transform and Clean
Model & Serve – Data Warehouse


Editor's Notes

  • https://www.jamesserra.com/archive/2019/01/what-product-to-use-to-transform-my-data/