
Architecting Modern Data Platforms


Welcome to my post on ‘Architecting Modern Data Platforms’. Here I discuss how to design cutting-edge data analytics platforms that meet the ever-evolving data and analytics needs of the business.

https://www.ankitrathi.com

Published in: Data & Analytics


  1. 1. Architecting Modern Data Platforms by ankitrathi.com
  2. 2. Content • Data Architecture Principles • Data Lake Basics • High Level Architecture • Data Characteristics • Putting It All Together • Product-Driven Data Architecture • Reference Architecture
  3. 3. Data Architecture Principles • Adhere to ADDA (Accessibility, Definition, Decoupling, Agility) • Design for RSM (Reliability, Scalability, Maintainability) • Use Right Tools • Cloud Native/Agnostic • Be Cost Conscious
  4. 4. Adhere to ADDA • Accessibility: easily accessible data for business • Definition: data catalog for simplified data discovery • Decoupling: decoupled layers for flexibility • Agility: agile enough to cater to evolving business requirements
  5. 5. Design for RSM • Reliability: works correctly, fault-tolerant • Scalability: adapts to growth • Maintainability: remains easy to maintain
  6. 6. Use Right Tools • Data Structure: Structured, Semi-structured, Unstructured • Latency: Low, Medium, High • Throughput: High, Medium, Low • Access Pattern: Key-value, Search, Transactions
  7. 7. Cloud Native/Agnostic • Cloud Native pros: better performance, better efficiency, lower costs (generic services) • Cloud Native cons: vendor lock-in, higher costs (specific services) • Cloud Agnostic pros: flexibility, minimal vendor lock-in, standard performance • Cloud Agnostic cons: underutilization of vendor capabilities, solution can become complex, performance, logging and monitoring can take a hit
  8. 8. Be Cost Conscious • Efficient consumption of services • Select cost-conscious options • Enforce policies and controls
  9. 9. Data Lake • Definition: an architectural approach; massive heterogeneous data stored centrally; available to a diverse group of users; to be categorized, processed, analyzed & consumed • Characteristics: structured, semi-structured & unstructured data; scaled out as required; diverse set of storage, analytics and ML/AI tools; designed for low-cost storage and analytics
  10. 10. High-Level Architecture • Flow: Data -> Ingest -> Store -> Process/Analyse -> Serve -> Actionable Insights • Considerations: Latency, Throughput, Cost
  11. 11. Ingest (Source: Data Type, Data) • Web/Mobile Apps: Records, Transactions • Databases: Records, Transactions • Logging: Search documents, Files • Logging: Log files, Files • Messaging: Messages, Events • IoT: Data Streams, Events
  12. 12. Data Characteristics • Hot: volume MB-GB, item size B-KB, latency ms, durability low-high, request rate very high, cost/GB $$-$ • Warm: volume GB-PB, item size KB-MB, latency ms to sec, durability high, request rate high, cost/GB $-¢¢ • Cold: volume PB-EB, item size KB-TB, latency min to hrs, durability very high, request rate low, cost/GB ¢¢-¢
  13. 13. Data Characteristics • Types of data structures: fixed schema, schema-free, key-value • Types of access patterns: key-value; simple relations (1:N, M:N); multi-table joins, transactions; faceting, search
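  14. 14. Storage • Options: In-memory, File Storage, NoSQL, SQL • Tiers: Hot data, Warm data, Cold data • Axes, from hot to cold: request rate and cost per GB go from high to low; latency and data volume go from low to high • Additional axis: Structure (high, low)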
  15. 15. Analytics Types • Message/Stream Analysis • Interactive Analysis • Batch Analysis • Machine Learning/AI
  16. 16. ETL Processing • Diagram labels: Store, ETL, Process/Analyse
  17. 17. Serve • Applications & APIs • Analysis & Visualization • Notebooks • IDEs
  18. 18. Putting It All Together • Stages: Ingest, Store, ETL, Process/Analyse, Serve • Sources: Web Apps, Mobile Apps, Data Centers, Logging, Messaging, Devices, Sensors • Data: Records, Documents, Files, Messages, Streams • Store: Cache, NoSQL, SQL, ElasticSearch, Object Storage, SQS, Streams • Process/Analyse: ML/AI, Interactive, Batch, Message Streams • Serve: APIs, Analysis, Visualization, Notebooks, IDE • Cross-cutting: Security & Governance, Data Catalog
  19. 19. Product-Driven Data Architecture Reference: https://martinfowler.com/articles/data-monolith-to-mesh.html
  20. 20. Reference Architecture - Azure Reference: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/dataplate2e/data-platform-end-to-end
  21. 21. Reference Architecture - AWS Reference: https://docs.aws.amazon.com/solutions/latest/data-lake-solution/architecture.html
  22. 22. Reference Architecture - GCP Reference: https://cloud.google.com/solutions/big-data
  23. 23. Questions?
  24. 24. Thank You ankitrathi.com
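
The Ingest -> Store -> Process/Analyse -> Serve flow from slide 10 can be illustrated with a toy in-memory pipeline. This is only a minimal sketch under my own assumptions: the `Pipeline` class, its method names and the count-per-key aggregation are hypothetical and do not come from the deck, and a real platform would replace each stage with a separate, decoupled service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Pipeline:
    """Toy Ingest -> Store -> Process/Analyse -> Serve flow (hypothetical names)."""

    # Store: a plain list stands in for the storage layer.
    store: List[Dict[str, str]] = field(default_factory=list)

    def ingest(self, records: List[Dict[str, str]]) -> None:
        # Ingest: land raw records from any source without transforming them.
        self.store.extend(records)

    def process(self, key: str) -> Dict[str, int]:
        # Process/Analyse: a simple batch aggregation, counting records per key value.
        counts: Dict[str, int] = {}
        for record in self.store:
            counts[record[key]] = counts.get(record[key], 0) + 1
        return counts

    def serve(self, key: str) -> List[Tuple[str, int]]:
        # Serve: hand consumers a stable, sorted view of the aggregated result.
        return sorted(self.process(key).items())


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.ingest([{"source": "web"}, {"source": "iot"}, {"source": "web"}])
    print(pipeline.serve("source"))
```

Because each stage only touches the store or the output of the previous stage, any one of them could be swapped for another implementation without changing the others, which is the point of the Decoupling principle on slide 4.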

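The "Use Right Tools" idea (slides 6, 12 and 14) amounts to picking a store from the data's temperature and access pattern. The sketch below shows one possible way to encode that decision. The latency cut-offs, the function names and the pattern-to-store mapping are illustrative assumptions of mine; the deck only gives qualitative ranges (ms for hot, ms to sec for warm, min to hrs for cold).

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


# Assumed cut-offs in seconds; the slides only say ms / ms-sec / min-hrs.
HOT_MAX_LATENCY_S = 0.1
WARM_MAX_LATENCY_S = 10.0

# Assumed mapping from the access patterns on slide 6 to the store types on slide 18.
STORE_BY_ACCESS_PATTERN = {
    "key-value": "NoSQL",
    "search": "search index",
    "transactions": "SQL",
}


def classify(max_latency_s: float) -> Temperature:
    """Classify data by the slowest read latency its consumers can tolerate."""
    if max_latency_s <= HOT_MAX_LATENCY_S:
        return Temperature.HOT
    if max_latency_s <= WARM_MAX_LATENCY_S:
        return Temperature.WARM
    return Temperature.COLD


def suggest_store(max_latency_s: float, access_pattern: str) -> str:
    """Suggest a store type from temperature first, then access pattern."""
    temperature = classify(max_latency_s)
    if temperature is Temperature.COLD:
        # Cold data favours the cheapest storage regardless of access pattern.
        return "file/object storage"
    if temperature is Temperature.HOT and access_pattern == "key-value":
        return "in-memory cache"
    return STORE_BY_ACCESS_PATTERN[access_pattern]


if __name__ == "__main__":
    print(suggest_store(0.005, "key-value"))
    print(suggest_store(1.0, "transactions"))
    print(suggest_store(3600.0, "search"))
```

In practice the decision would also weigh volume, item size, durability and cost per GB from slide 12; latency alone is used here to keep the example short.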