Architecting
Modern Data Platforms
by ankitrathi.com
Content
• Data Architecture Principles
• Data Lake Basics
• High Level Architecture
• Data Characteristics
• Putting It All Together
• Product-Driven Data Architecture
• Reference Architecture
Data Architecture Principles
• Adhere to ADDA (Accessibility, Definition, Decoupling, Agility)
• Design for RSM (Reliability, Scalability, Maintainability)
• Use Right Tools
• Cloud Native/Agnostic
• Be Cost Conscious
Adhere to ADDA
Accessibility
Easily accessible data
for business
Definition
Data catalog for
simplified data
discovery
Decoupling
Decoupled layers for
flexibility
Agility
Agile enough to cater to
evolving business
requirements
Design for RSM
Reliability
works correctly,
fault-tolerant
Scalability
adapts to growth
Maintainability
remains easy to maintain
Use Right Tools
Data Structure
Structured, Semi-structured, Unstructured
Latency
Low, Medium, High
Throughput
High, Medium, Low
Access Pattern
Key-value, Search,
Transactions
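As a rough illustration of choosing a tool from these characteristics, the sketch below maps a workload description to a storage category. The `Workload` class, the `suggest_store` function and the mapping itself are hypothetical and only meant to show the idea; throughput is left out to keep the example short, and a real selection would weigh all four dimensions plus cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    structure: str       # "structured", "semi-structured" or "unstructured"
    latency: str         # "low", "medium" or "high"
    access_pattern: str  # "key-value", "search" or "transactions"


def suggest_store(w: Workload) -> str:
    """Return a storage category. The mapping is an illustrative assumption."""
    if w.access_pattern == "transactions":
        return "SQL"
    if w.access_pattern == "search":
        return "Search index"
    if w.access_pattern == "key-value":
        # Assumed rule: only low-latency key-value lookups justify a cache.
        return "In-memory cache" if w.latency == "low" else "NoSQL"
    # Anything else (for example bulk scans of raw files) falls back to objects.
    return "Object storage"


print(suggest_store(Workload("structured", "medium", "transactions")))
print(suggest_store(Workload("semi-structured", "low", "key-value")))
```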
Cloud Native/Agnostic
Cloud Native
Pros:
• Better performance
• Better efficiency
• Lower costs (generic services)
Cons:
• Vendor lock-in
• Higher costs (specific services)
Cloud Agnostic
Pros:
• Flexibility
• Minimal vendor lock-in
• Standard performance
Cons:
• Underutilization of vendor capabilities
• Solution can become complex
• Performance, logging and monitoring can take a hit
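One common way to limit vendor lock-in is to have application code depend on a small interface instead of a vendor SDK. The sketch below is a minimal example under that assumption: `BlobStore`, `InMemoryStore`, `LocalFileStore` and `archive_report` are hypothetical names, and the two backends stand in for whatever cloud object stores a real platform would wrap.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage interface the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Backend that keeps blobs in a dict (useful in tests)."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class LocalFileStore:
    """Backend that writes blobs to a local directory."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def archive_report(store: BlobStore, name: str, body: str) -> int:
    """Store a report and return its size in bytes, whichever backend is used."""
    store.put(name, body.encode("utf-8"))
    return len(store.get(name))


print(archive_report(InMemoryStore(), "report.txt", "hello"))
with tempfile.TemporaryDirectory() as tmp:
    print(archive_report(LocalFileStore(Path(tmp)), "report.txt", "hello"))
```

The trade-off listed above still applies: an interface this small cannot expose vendor-specific features, which is the "underutilization" cost of staying agnostic.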
Be Cost Conscious
• Efficient consumption of services
• Select cost-conscious options
• Enforce policies and controls
Data Lake
• Data Lake Definition
• An architectural approach
• Massive heterogeneous data stored centrally
• Available to diverse group of users
• To be categorized, processed, analyzed & consumed
• Data Lake Characteristics
• Structured, semi-structured & unstructured data
• Scaled out as required
• Diverse set of storage, analytics and ML/AI tools
• Designed for low-cost storage and analytics
High-Level Architecture
Ingest → Store → Process/Analyse → Serve
Data → Actionable Insights
Latency, Throughput, Cost
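The four stages can be sketched as plain functions chained together. This is a toy example, not a platform design: the function bodies, the "source,event" line format and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def ingest(raw_lines):
    """Ingest: parse raw 'source,event' lines into records."""
    return [dict(zip(("source", "event"), line.split(","))) for line in raw_lines]


def store(records, storage):
    """Store: append the records to a list standing in for a data store."""
    storage.extend(records)
    return storage


def process(storage):
    """Process/Analyse: count how often each event type occurs."""
    return Counter(record["event"] for record in storage)


def serve(counts, top=1):
    """Serve: expose the most frequent events as the insight."""
    return counts.most_common(top)


raw = ["web,click", "mobile,click", "iot,reading"]
insights = serve(process(store(ingest(raw), [])))
print(insights)  # [('click', 2)]
```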
Ingest
Source            Data Type          Data
Web/Mobile Apps   Records            Transactions
Databases         Records            Transactions
Logging           Search documents   Files
Logging           Log files          Files
Messaging         Messages           Events
IoT               Data Streams       Events
Data Characteristics
               Hot         Warm      Cold
Volume         MB-GB       GB-PB     PB-EB
Item Size      B-KB        KB-MB     KB-TB
Latency        ms          ms, sec   min, hrs
Durability     Low-high    High      Very high
Request Rate   Very high   High      Low
Cost/GB        $$-$        $-¢¢      ¢¢-¢
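The latency row suggests a simple way to label a dataset as hot, warm or cold from the response time its consumers can tolerate. The sketch below assumes cut-offs of 0.1 seconds and 60 seconds; the slide only gives orders of magnitude (ms, sec, min), so both thresholds and the `classify_tier` name are hypothetical.

```python
HOT_MAX_SECONDS = 0.1    # assumed cut-off for "ms" latency
WARM_MAX_SECONDS = 60.0  # assumed cut-off before "min, hrs" latency


def classify_tier(max_latency_seconds: float) -> str:
    """Label data by the longest read latency its consumers accept."""
    if max_latency_seconds < 0:
        raise ValueError("latency must be non-negative")
    if max_latency_seconds <= HOT_MAX_SECONDS:
        return "hot"
    if max_latency_seconds < WARM_MAX_SECONDS:
        return "warm"
    return "cold"


for seconds in (0.005, 2, 3600):
    print(seconds, classify_tier(seconds))
```

A fuller version would also look at request rate and volume, since the table treats temperature as a combination of all six characteristics.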
Data Characteristics
• Type of Data Structures
• Fixed Schema
• Schema Free
• Key-Value
• Type of Access Patterns
• Key-Value
• Simple relations (1:N, M:N)
• Multi-table joins, transactions
• Faceting, Search
Storage
In-memory
File Storage
NoSQL
SQL
Hot data → Warm data → Cold data
Structure: High → Low
Request rate, Cost per GB: High → Low
Latency, Data Volume: Low → High
Analytics Types
• Message/Stream Analysis
• Interactive Analysis
• Batch Analysis
• Machine Learning/AI
ETL Processing
Store | ETL | Process/Analyse
Serve
• Applications & APIs
• Analysis & Visualization
• Notebooks
• IDEs
Putting It All Together
Ingest → Store → ETL → Process/Analyse → Serve
• Sources: Web Apps, Mobile Apps, Data Centers, Logging, Messaging, Devices, Sensors
• Data: Records, Documents, Files, Messages, Streams
• Store: Cache, NoSQL, SQL, ElasticSearch, Object Storage, SQS, Streams
• Process/Analyse: ML/AI, Interactive, Batch, Message, Streams
• Serve: APIs, Analysis, Visualization, Notebooks, IDE
• Across all stages: Security & Governance, Data Catalog
Product-Driven Data Architecture
Reference: https://martinfowler.com/articles/data-monolith-to-mesh.html
Reference Architecture - Azure
Reference: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/dataplate2e/data-platform-end-to-end
Reference Architecture - AWS
Reference: https://docs.aws.amazon.com/solutions/latest/data-lake-solution/architecture.html
Reference Architecture - GCP
Reference: https://cloud.google.com/solutions/big-data
Questions…?
Thank You
ankitrathi.com

Architecting Modern Data Platforms

  • 2. Content • Data Architecture Principles • Data Lake Basics • High Level Architecture • Data Characteristics • Putting It All Together • Product-Driven Data Architecture • Reference Architecture
  • 3. Data Architecture Principles • Adhere to ADDA (Accessibility, Definition, Decoupling, Agility) • Design for RSM (Reliability, Scalability, Maintainability) • Use Right Tools • Cloud Native/Agnostic • Be Cost Conscious
  • 4. Adhere to ADDA
      • Accessibility: Easily accessible data for business
      • Definition: Data catalog for simplified data discovery
      • Decoupling: Decoupled layers for flexibility
      • Agility: Agile enough to cater to evolving business requirements
  • 5. Design for RSM
      • Reliability: works correctly, fault-tolerant
      • Scalability: adapts to growth
      • Maintainability: remains easy to maintain
  • 6. Use Right Tools
      • Data Structure: Structured, Semi-structured, Unstructured
      • Latency: Low, Medium, High
      • Throughput: High, Medium, Low
      • Access Pattern: Key-value, Search, Transactions
  • 7. Cloud Native/Agnostic
      • Cloud Native
          • Pros: Better performance; Better efficiency; Lower costs (generic services)
          • Cons: Vendor lock-in; Higher costs (specific services)
      • Cloud Agnostic
          • Pros: Flexibility; Minimal vendor lock-in; Standard performance
          • Cons: Underutilization of vendor capabilities; Solution can become complex; Performance, logging and monitoring can take a hit
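
One common way to keep the cloud-agnostic option from slide 7 open is to have pipeline code depend on a small storage interface rather than on a vendor SDK. The sketch below is illustrative only: `ObjectStore`, `InMemoryStore` and `archive_report` are hypothetical names that do not come from the deck, and a real deployment would add one adapter per provider behind the same interface.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral interface the pipeline code depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for the sketch; a vendor adapter would expose the same methods."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: ObjectStore, name: str, body: str) -> str:
    """Business logic sees only the interface, so the backend can be swapped."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

The trade-off matches the slide: the interface limits lock-in, but it also hides vendor-specific features, and each adapter is extra code to maintain.
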
  • 8. Be Cost Conscious • Efficient consumption of services • Select cost-conscious options • Enforce policies and controls
  • 9. Data Lake
      • Data Lake Definition
          • An architectural approach
          • Massive heterogeneous data stored centrally
          • Available to a diverse group of users
          • To be categorized, processed, analyzed & consumed
      • Data Lake Characteristics
          • Structured, semi-structured & unstructured data
          • Scaled out as required
          • Diverse set of storage, analytics and ML/AI tools
          • Designed for low-cost storage and analytics
  • 10. High-Level Architecture
      • Data -> Ingest -> Store -> Process/Analyse -> Serve -> Actionable Insights
      • Latency, Throughput, Cost
  • 11. Ingest
      Source            Data               Data Type
      Web/Mobile Apps   Records            Transactions
      Databases         Records            Transactions
      Logging           Search documents   Files
      Logging           Log files          Files
      Messaging         Messages           Events
      IoT               Data Streams       Events
  • 12. Data Characteristics
                     Hot         Warm       Cold
      Volume         MB-GB       GB-PB      PB-EB
      Item Size      B-KB        KB-MB      KB-TB
      Latency        ms          ms, sec    min, hrs
      Durability     Low-high    High       Very high
      Request Rate   Very high   High       Low
      Cost/GB        $$-$        $-¢¢       ¢¢-¢
  • 13. Data Characteristics
      • Type of Data Structures
          • Fixed Schema
          • Schema Free
          • Key-Value
      • Type of Access Patterns
          • Key-Value
          • Simple relations (1:N, M:N)
          • Multi-table joins, transactions
          • Faceting, Search
  • 14. Storage
      • Storage options: In-memory, File Storage, NoSQL, SQL
      • Data temperature: Hot data, Warm data, Cold data
      • Structure: High to Low
      • Request rate, Cost per GB: High to Low
      • Latency, Data Volume: Low to High
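
The hot/warm/cold split on slides 12 and 14 can be sketched as a small classification rule. The numeric cut-offs (100 ms, 1,000 requests per second, 60 s) and the temperature-to-storage mapping below are assumptions chosen for illustration. The slides give only orders of magnitude (ms, sec, min) and do not state which store serves which temperature. `Workload`, `classify` and `SUGGESTED_STORE` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


@dataclass(frozen=True)
class Workload:
    name: str
    max_latency_s: float   # slowest acceptable read latency, in seconds
    requests_per_s: float  # expected request rate


def classify(workload: Workload) -> Temperature:
    """Assign a data temperature from latency and request rate (assumed cut-offs)."""
    if workload.max_latency_s < 0.1 and workload.requests_per_s >= 1000:
        return Temperature.HOT   # millisecond latency, very high request rate
    if workload.max_latency_s < 60:
        return Temperature.WARM  # millisecond-to-second latency
    return Temperature.COLD      # minutes or hours are acceptable


# Assumed mapping from temperature to a storage family; not stated in the slides.
SUGGESTED_STORE = {
    Temperature.HOT: "in-memory cache",
    Temperature.WARM: "NoSQL or SQL database",
    Temperature.COLD: "file or object storage",
}
```

For example, `classify(Workload("archive", 3600, 0.01))` falls through both checks and is treated as cold data. In practice volume, item size and durability from slide 12 would also feed the decision.
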
  • 15. Analytics Types • Message/Stream Analysis • Interactive Analysis • Batch Analysis • Machine Learning/AI
  • 17. Serve • Applications & APIs • Analysis & Visualization • Notebooks • IDEs
  • 18. Putting It All Together
      • Sources: Web Apps, Mobile Apps, Data Centers, Logging, Messaging, Devices, Sensors
      • Ingest: Records, Documents, Files, Messages, Streams
      • Store: Cache, NoSQL, SQL, ElasticSearch, Object Storage, SQS, Streams
      • ETL
      • Process/Analyse: ML/AI, Interactive, Batch, Message, Streams
      • Serve: APIs, Analysis, Visualization, Notebooks, IDE
      • Security & Governance, Data Catalog
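
The Ingest, Store, Process/Analyse, Serve flow on slide 18 can be sketched end to end with decoupled stages. Everything here is a toy stand-in: the `source,type` line format, the in-memory `Store` and the function names are assumptions made for the example, not components named in the deck.

```python
from collections import Counter
from typing import Iterable

Event = dict[str, str]


def ingest(raw_lines: Iterable[str]) -> list[Event]:
    """Ingest: parse 'source,type' lines into events, skipping malformed ones."""
    events: list[Event] = []
    for line in raw_lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 2 and all(parts):
            events.append({"source": parts[0], "type": parts[1]})
    return events


class Store:
    """Store: an in-memory list standing in for a cache, database or object storage."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def write(self, events: list[Event]) -> None:
        self._events.extend(events)

    def read(self) -> list[Event]:
        return list(self._events)


def process(events: list[Event]) -> Counter:
    """Process/Analyse: a batch count of events per data type."""
    return Counter(event["type"] for event in events)


def serve(counts: Counter, data_type: str) -> int:
    """Serve: an API-style lookup; unknown types report zero."""
    return counts[data_type]


def run_pipeline(raw_lines: Iterable[str]) -> Counter:
    """Wire the stages together; each stage only sees the previous stage's output."""
    store = Store()
    store.write(ingest(raw_lines))
    return process(store.read())
```

Because each stage exchanges plain data with its neighbours, any one of them can be replaced (for example, a stream processor instead of the batch count) without touching the others, which is the decoupling the earlier slides ask for.
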
  • 19. Product-Driven Data Architecture Reference: https://martinfowler.com/articles/data-monolith-to-mesh.html
  • 20. Reference Architecture - Azure Reference: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/dataplate2e/data-platform-end-to-end
  • 21. Reference Architecture - AWS Reference: https://docs.aws.amazon.com/solutions/latest/data-lake-solution/architecture.html
  • 22. Reference Architecture - GCP Reference: https://cloud.google.com/solutions/big-data