SlideShare a Scribd company logo
@nileshgule
Modern Data Warehouse
using
Azure
$whoami
{
“name” : “Nilesh Gule”,
“website” : “https://www.HandsOnArchitect.com",
“github” : “https://GitHub.com/NileshGule"
“twitter” : “@nileshgule”,
“linkedin” : “https://www.linkedin.com/in/nileshgule”,
“likes” : “Technical Evangelism, Cricket”,
“co-organizer” : “Azure Singapore UG”
}
@nileshgule
Evolution of Data Warehousing
Dept
Dept
Dept
Staging Raw Query Report
Real time processing
Dept
Dept
Dept
@nileshgule
Evolution of Data Warehousing
Raw Query Report
Real time processing
Dept
Dept
Dept
Raw Query Report
Data Warehouse
• Store data from multiple data sources
• Used for historical and trend analysis reporting
• Central repository for many subject areas
• Contains single version of truth
• Not to be used for OLTP applications
Main features
• Reduce stress on production systems
• Optimized for read access
• Keep historical records
• Restructure / rename tables, fields, model data
• Protect against source system upgrades
• Use Master Data Management
• Easy to create BI solutions on top of it (e.g.
Azure Analysis Services Cubes)
• Reduces security access to multiple production
systems
Use cases
Credits: James Serra
Credits: James Serra
Credits: James Serra
Credits: James Serra
Azure Data Lake Storage
• Petabyte scale storage
• Hierarchical namespace
• Hadoop compatible access with ABFS
driver
Main features
• Use Service Principles
• Use Security Groups over individual
users
• Enable Gen 2 firewall with Azure
services access
ADLS best practices
Azure Data Factory
• Cloud ETL service
• Scale-out serverless data integration & data
transformation
• Code-free UI
• Monitoring & Management
Main features
Azure Data Factory – copy data
Azure Data Factory – Mapping Data Flow
Credits: Harun Legoz for football match analysis example
Azure Data Factory - Monitoring
Azure Databricks
• Collaborative Spark based Analytical service
• Optimized Spark environment for data pipelines and ML
Delta Lake
• ACID transactions
• Time travel (data versioning enables rollbacks, audit trail)
• Streaming and batch unification
• Schema enforcement
• Upserts and deletes
• Performance improvements
https://delta.io/
Azure Databricks - clusters
Azure Databricks - Jobs
Best in class price
per performance
Developer
productivity
Workload aware
query execution
Data flexibility
Up to 94% less expensive
than competitors
Manage heterogenous
workloads through
workload priorities and
isolation
Ingest variety of data
sources to derive the
maximum benefit.
Query all data.
Use preferred tooling for
SQL data warehouse
development
Industry-leading
security
Defense-in-depth
security and 99.9%
financially backed
availability SLA
Azure Synapse – SQL Analytics
focus areas
Credits: James Serra
Data Lake with Data Warehouse use cases
• Data scientists/power users
• Batch processing
• Data refinement / cleaning
• ETL workloads
• Store older / backup data
• Sandbox for data exploration
• One-time reports
• Quick access to data
• Don’t know questions
Data Lake – staging & preparation
• Business people
• Low latency
• Complex joins
• Interactive ad-hoc queries
• High number of users
• Additional security
• Large support for tools
• Dashboards
• Self-service BI
• Known questions
Data Warehouse – Serving, security &
Compliance
Data Lakehouse concerns / limitations
• Reliability – keeping data lake and warehouse
consistent
• Data staleness – older data in warehouse
• Limited support for advanced analytics – Top ML
systems don’t work well on warehouses
• TCO – extra cost for data copies in warehouse
Problems
• Speed – RDBMS faster, especially MPP
• Security - No RLS, column-level, dynamic data
masking
• Complexity – metadata separate from data, file
based
• Missing features – referential integrity, TDE,
workload management, many features require
Spark lockin
Concerns wrt RDBMS
@nileshgule
Evolution of Data Warehousing …
Centralized Decentralized
Data Mesh | Thoughtworks
@nileshgule
References
Data Warehouse
❖ Why you need a data warehouse
❖ James Serra website
Azure
❖ ADLS Gen 2 Storage Account
❖ Azure Data Factory
❖ Azure Databricks
❖ Azure Data Factory Mapping Data Flows
❖ Azure Synapse Analytics
❖ Azure Synapse Link for SQL
Delta Lake & Data Mesh
❖ Databricks Deltalake
❖ Delta
❖ Data Mesh Architecture
❖ ThoughtWorks – Data Mesh
Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and
Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://bit.ly/youtube-nileshgule
Q&A

More Related Content

Similar to Modern Data Warehouse using Azure.pdf

Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
Ike Ellis
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloudera, Inc.
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
Antonios Chatzipavlis
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
Antonios Chatzipavlis
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Cloudera, Inc.
 
Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)
Turner Kunkel
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Cloudian
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in Cloud
CAMMS
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
Adaryl "Bob" Wakefield, MBA
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
datastack
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 
Hadoop as a data hub featuring sears
Hadoop as a data hub featuring searsHadoop as a data hub featuring sears
Hadoop as a data hub featuring searsDianna Doan
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoop
Anu Ravindranath
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Cloudera, Inc.
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics
Senturus
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
Azure synapse by usama whaba khan
Azure synapse by usama whaba khanAzure synapse by usama whaba khan
Azure synapse by usama whaba khan
Usama Wahab Khan Cloud, Data and AI
 

Similar to Modern Data Warehouse using Azure.pdf (20)

Data modeling trends for analytics
Data modeling trends for analyticsData modeling trends for analytics
Data modeling trends for analytics
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in Cloud
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
 
Hadoop as a data hub featuring sears
Hadoop as a data hub featuring searsHadoop as a data hub featuring sears
Hadoop as a data hub featuring sears
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Architectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoopArchitectures styles and deployment on the hadoop
Architectures styles and deployment on the hadoop
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Azure synapse by usama whaba khan
Azure synapse by usama whaba khanAzure synapse by usama whaba khan
Azure synapse by usama whaba khan
 

More from Nilesh Gule

Code Creativity and Customers- Navigating the Generative AI Landscape.pdf
Code Creativity and Customers- Navigating the Generative AI Landscape.pdfCode Creativity and Customers- Navigating the Generative AI Landscape.pdf
Code Creativity and Customers- Navigating the Generative AI Landscape.pdf
Nilesh Gule
 
Improve Monitoring And Observability for Kubernetes with OSS tools.pdf
Improve Monitoring And Observability for Kubernetes with OSS tools.pdfImprove Monitoring And Observability for Kubernetes with OSS tools.pdf
Improve Monitoring And Observability for Kubernetes with OSS tools.pdf
Nilesh Gule
 
Modular Architecturs for Resilience and Adaptability.pdf
Modular Architecturs for Resilience and Adaptability.pdfModular Architecturs for Resilience and Adaptability.pdf
Modular Architecturs for Resilience and Adaptability.pdf
Nilesh Gule
 
Autoscale applications based on external events with KEDA.pdf
Autoscale applications based on external events with KEDA.pdfAutoscale applications based on external events with KEDA.pdf
Autoscale applications based on external events with KEDA.pdf
Nilesh Gule
 
Singapore JUG - Open Telemetry.pdf
Singapore JUG - Open Telemetry.pdfSingapore JUG - Open Telemetry.pdf
Singapore JUG - Open Telemetry.pdf
Nilesh Gule
 
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdfCloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
Nilesh Gule
 
Build Secure Portable Applications using AKS and its ecosystem
Build Secure Portable Applications using AKS and its ecosystemBuild Secure Portable Applications using AKS and its ecosystem
Build Secure Portable Applications using AKS and its ecosystem
Nilesh Gule
 
Cloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdfCloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdf
Nilesh Gule
 
Cloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdfCloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdf
Nilesh Gule
 
Modular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdfModular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdf
Nilesh Gule
 
Modular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdfModular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdf
Nilesh Gule
 
Cloud Native Ninja - PT7 - Containerize Go apps.pdf
Cloud Native Ninja - PT7 - Containerize Go apps.pdfCloud Native Ninja - PT7 - Containerize Go apps.pdf
Cloud Native Ninja - PT7 - Containerize Go apps.pdf
Nilesh Gule
 
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdfCloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
Nilesh Gule
 
Cloud Native Ninja - PT5 - Publish container images.pdf
Cloud Native Ninja - PT5 - Publish container images.pdfCloud Native Ninja - PT5 - Publish container images.pdf
Cloud Native Ninja - PT5 - Publish container images.pdf
Nilesh Gule
 
Portable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdfPortable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdf
Nilesh Gule
 
Portable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdfPortable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdf
Nilesh Gule
 
Manage Multi Container Apps with Docker Compose.pdf
Manage Multi Container Apps with Docker Compose.pdfManage Multi Container Apps with Docker Compose.pdf
Manage Multi Container Apps with Docker Compose.pdf
Nilesh Gule
 
Portable Multi-cloud Microservices with Dapr .pptx
Portable Multi-cloud Microservices with Dapr .pptxPortable Multi-cloud Microservices with Dapr .pptx
Portable Multi-cloud Microservices with Dapr .pptx
Nilesh Gule
 
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdfCloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
Nilesh Gule
 
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdfCloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
Nilesh Gule
 

More from Nilesh Gule (20)

Code Creativity and Customers- Navigating the Generative AI Landscape.pdf
Code Creativity and Customers- Navigating the Generative AI Landscape.pdfCode Creativity and Customers- Navigating the Generative AI Landscape.pdf
Code Creativity and Customers- Navigating the Generative AI Landscape.pdf
 
Improve Monitoring And Observability for Kubernetes with OSS tools.pdf
Improve Monitoring And Observability for Kubernetes with OSS tools.pdfImprove Monitoring And Observability for Kubernetes with OSS tools.pdf
Improve Monitoring And Observability for Kubernetes with OSS tools.pdf
 
Modular Architecturs for Resilience and Adaptability.pdf
Modular Architecturs for Resilience and Adaptability.pdfModular Architecturs for Resilience and Adaptability.pdf
Modular Architecturs for Resilience and Adaptability.pdf
 
Autoscale applications based on external events with KEDA.pdf
Autoscale applications based on external events with KEDA.pdfAutoscale applications based on external events with KEDA.pdf
Autoscale applications based on external events with KEDA.pdf
 
Singapore JUG - Open Telemetry.pdf
Singapore JUG - Open Telemetry.pdfSingapore JUG - Open Telemetry.pdf
Singapore JUG - Open Telemetry.pdf
 
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdfCloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
Cloud Native Ninja - Getting Started with Kubernetes - Part 9.pdf
 
Build Secure Portable Applications using AKS and its ecosystem
Build Secure Portable Applications using AKS and its ecosystemBuild Secure Portable Applications using AKS and its ecosystem
Build Secure Portable Applications using AKS and its ecosystem
 
Cloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdfCloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdf
 
Cloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdfCloud Native Ninja - PT8 - Containerize React app.pdf
Cloud Native Ninja - PT8 - Containerize React app.pdf
 
Modular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdfModular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdf
 
Modular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdfModular Architecturs for resilience and Adaptability.pdf
Modular Architecturs for resilience and Adaptability.pdf
 
Cloud Native Ninja - PT7 - Containerize Go apps.pdf
Cloud Native Ninja - PT7 - Containerize Go apps.pdfCloud Native Ninja - PT7 - Containerize Go apps.pdf
Cloud Native Ninja - PT7 - Containerize Go apps.pdf
 
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdfCloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
Cloud Native Ninja - PT6 - Containerize Spring Boot apps.pdf
 
Cloud Native Ninja - PT5 - Publish container images.pdf
Cloud Native Ninja - PT5 - Publish container images.pdfCloud Native Ninja - PT5 - Publish container images.pdf
Cloud Native Ninja - PT5 - Publish container images.pdf
 
Portable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdfPortable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdf
 
Portable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdfPortable Multi-cloud Microservices with Dapr .pdf
Portable Multi-cloud Microservices with Dapr .pdf
 
Manage Multi Container Apps with Docker Compose.pdf
Manage Multi Container Apps with Docker Compose.pdfManage Multi Container Apps with Docker Compose.pdf
Manage Multi Container Apps with Docker Compose.pdf
 
Portable Multi-cloud Microservices with Dapr .pptx
Portable Multi-cloud Microservices with Dapr .pptxPortable Multi-cloud Microservices with Dapr .pptx
Portable Multi-cloud Microservices with Dapr .pptx
 
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdfCloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
Cloud Native Ninja - PT3 - Containerize DOTNET apps.pdf
 
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdfCloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
Cloud Native Ninja - Distributed Microservices with Dapr - part 2.pdf
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 

Modern Data Warehouse using Azure.pdf

  • 2. $whoami { “name” : “Nilesh Gule”, “website” : “https://www.HandsOnArchitect.com", “github” : “https://GitHub.com/NileshGule" “twitter” : “@nileshgule”, “linkedin” : “https://www.linkedin.com/in/nileshgule”, “likes” : “Technical Evangelism, Cricket”, “co-organizer” : “Azure Singapore UG” }
  • 3.
  • 4.
  • 5. @nileshgule Evolution of Data Warehousing Dept Dept Dept Staging Raw Query Report Real time processing Dept Dept Dept
  • 6. @nileshgule Evolution of Data Warehousing Raw Query Report Real time processing Dept Dept Dept Raw Query Report
  • 7. Data Warehouse • Store data from multiple data sources • Used for historical and trend analysis reporting • Central repository for many subject areas • Contains single version of truth • Not to be used for OLTP applications Main features • Reduce stress on production systems • Optimized for read access • Keep historical records • Restructure / rename tables, fields, model data • Protect against source system upgrades • Use Master Data Management • Easy to create BI solutions on top of it (e.g. Azure Analysis Services Cubes) • Reduces security access to multiple production systems Use cases
  • 12. Azure Data Lake Storage • Petabyte scale storage • Hierarchical namespace • Hadoop compatible access with ABFS driver Main features • Use Service Principles • Use Security Groups over individual users • Enable Gen 2 firewall with Azure services access ADLS best practices
  • 13. Azure Data Factory • Cloud ETL service • Scale-out serverless data integration & data transformation • Code-free UI • Monitoring & Management Main features
  • 14. Azure Data Factory – copy data
  • 15. Azure Data Factory – Mapping Data Flow Credits: Harun Legoz for football match analysis example
  • 16. Azure Data Factory - Monitoring
  • 17. Azure Databricks • Collaborative Spark based Analytical service • Optimized Spark environment for data pipelines and ML
  • 18. Delta Lake • ACID transactions • Time travel (data versioning enables rollbacks, audit trail) • Streaming and batch unification • Schema enforcement • Upserts and deletes • Performance improvements https://delta.io/
  • 19. Azure Databricks - clusters
  • 21. Best in class price per performance Developer productivity Workload aware query execution Data flexibility Up to 94% less expensive than competitors Manage heterogenous workloads through workload priorities and isolation Ingest variety of data sources to derive the maximum benefit. Query all data. Use preferred tooling for SQL data warehouse development Industry-leading security Defense-in-depth security and 99.9% financially backed availability SLA Azure Synapse – SQL Analytics focus areas Credits: James Serra
  • 22. Data Lake with Data Warehouse use cases • Data scientists/power users • Batch processing • Data refinement / cleaning • ETL workloads • Store older / backup data • Sandbox for data exploration • One-time reports • Quick access to data • Don’t know questions Data Lake – staging & preparation • Business people • Low latency • Complex joins • Interactive ad-hoc queries • High number of users • Additional security • Large support for tools • Dashboards • Self-service BI • Known questions Data Warehouse – Serving, security & Compliance
  • 23. Data Lakehouse concerns / limitations • Reliability – keeping data lake and warehouse consistent • Data staleness – older data in warehouse • Limited support for advanced analytics – Top ML systems don’t work well on warehouses • TCO – extra cost for data copies in warehouse Problems • Speed – RDBMS faster, especially MPP • Security - No RLS, column-level, dynamic data masking • Complexity – metadata separate from data, file based • Missing features – referential integrity, TDE, workload management, many features require Spark lockin Concerns wrt RDBMS
  • 24. @nileshgule Evolution of Data Warehousing … Centralized Decentralized Data Mesh | Thoughtworks
  • 25. @nileshgule References Data Warehouse ❖ Why you need a data warehouse ❖ James Serra website Azure ❖ ADLS Gen 2 Storage Account ❖ Azure Data Factory ❖ Azure Databricks ❖ Azure Data Factory Mapping Data Flows ❖ Azure Synapse Analytics ❖ Azure Synapse Link for SQL Delta Lake & Data Mesh ❖ Databricks Deltalake ❖ Delta ❖ Data Mesh Architecture ❖ ThoughtWorks – Data Mesh
  • 26. Nilesh Gule ARCHITECT | MICROSOFT MVP “Code with Passion and Strive for Excellence” nileshgule @nileshgule Nilesh Gule NileshGule www.handsonarchitect.com https://bit.ly/youtube-nileshgule
  • 27. Q&A