
Architecture of Big Data Solutions

The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.

The right architecture is key for any IT project. This also holds for big data projects, yet there are not many standard architectures so far which have proven their suitability over years.
This session discusses different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Event Driven architecture as well as Lambda and Kappa architecture.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.

Published in: Data & Analytics

Architecture of Big Data Solutions

  1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH. Architecture of Big Data Solutions. Guido Schmutz, Frankfurt, 13.12.2017. @gschmutz, guidoschmutz.wordpress.com
  2. Guido Schmutz: Working at Trivadis for more than 20 years; Oracle ACE Director for Fusion Middleware and SOA; Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data; Head of Trivadis Architecture Board; Technology Manager @ Trivadis; More than 30 years of software development experience. Contact: guido.schmutz@trivadis.com ; Blog: http://guidoschmutz.wordpress.com ; Slideshare: http://www.slideshare.net/gschmutz ; Twitter: gschmutz
  3. Agenda: 1. Introduction; 2. Big Data & Fast Data Reference Architectures; 3. Continuous Streaming Data Ingestion; 4. Big Data & Cloud; 5. Microservices Architecture; 6. Big Data Ecosystem – many choices sorted!
  4. Introduction
  5. Big Data Definition (4 Vs) + Time to action? Big Data + Real-Time = Stream Processing. Characteristics of Big Data: its Volume, Velocity and Variety in combination.
  6. Architecture of Big Data Solutions. Diagram labels: Enterprise Data Warehouse; ETL / Stored Procedures; Data Marts / Aggregations; Location; Social; Clickstream; Segmentation & Churn Analysis; BI Tools; Marketing Offers; Billing & Ordering; CRM / Profile; Marketing Campaigns.
  7. Traditional Flow Diagram - Challenges. Diagram labels: Enterprise Data Warehouse; ETL / Stored Procedures; Data Marts / Aggregations; Location; Social; Clickstream; Segmentation & Churn Analysis; BI Tools; Marketing Offers; Billing & Ordering; CRM / Profile; Marketing Campaigns. Annotations: Limited Processing Power; Does not model easily to traditional database schema; Storage Scaling very expensive; Based on sample / limited data; Loss in Fidelity; Other / New Data Sources; High Volume and Velocity.
  8. Big Data to the rescue? Why is structure / architecture important?
  9. Why talk about Big Data Architectures? Choosing the right architecture is key for any (big data) project. Big Data is still a rather young field and therefore a “moving target”: no standard architectures are available which have been used for years. In the past years, some architectures and best practices have evolved. Know your use cases before choosing your architecture / technologies. Having a reference architecture in place helps in choosing the right/matching technologies.
  10. Big Data & Fast Data Reference Architectures
  11. Big Data Architecture. Diagram labels: Big Data Cluster; BI Tools; Enterprise Data Warehouse; Billing & Ordering; CRM / Profile; Marketing Campaigns; File Import / SQL Import; SQL; Search / Explore; Online & Mobile Apps; Search; Machine Learning, Graph Algorithms, Natural Language Processing; Parallel Processing; Storage; Raw; Refined; Results.
  12. Big Data Architecture - Hadoop. Same diagram labels as slide 11.
  13. Big Data Architecture - Spark. Same diagram labels as slide 11.
  14. Event Hub for handling streaming data. Diagram labels: Event Hub; Big Data Cluster; BI Tools; Enterprise Data Warehouse; SQL; Search / Explore; Online & Mobile Apps; Search; Data Flow; Machine Learning, Graph Algorithms, Natural Language Processing; Parallel Processing; Storage; Raw; Refined; Results. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  15. Event Hub for handling streaming data. Same diagram labels as slide 14.
  16. Event Hub for handling streaming data. Same diagram labels as slide 14, annotated: high latency.
  17. “Data at Rest” vs. “Data in Motion”
  18. Streaming Analytics Architecture. Diagram labels: Event Hub; Stream Processing Cluster; BI Tools; Enterprise Data Warehouse; Search / Explore; Online & Mobile Apps; Search; Data Flow; Results; Low Latency Processing, Alerting, “Real-Time” Dashboard; Stream Analytics; Reference / Models; Dashboard. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  19. Streaming Analytics Architecture – Open Source. Same diagram labels as slide 18.
  20. Streaming Analytics Architecture. Same diagram labels as slide 18, annotated: low latency without keeping raw data/events.
  21. Keep raw event data. Diagram labels: Event Processing Cluster; BI Tools; Enterprise Data Warehouse; Search / Explore; Online & Mobile Apps; Search; Results; Stream Analytics; Reference / Models; Dashboard; Big Data Cluster; Event Hub; File Import / SQL Import; Parallel Processing; Storage; Raw; Refined; Results. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  22. “Lambda Architecture” for Big Data. Diagram labels: Event Hub; SQL; Search; BI Tools; Enterprise Data Warehouse; Search / Explore; Online & Mobile Apps; File Import / SQL Import; Event Processing Cluster; Results; Stream Analytics; Reference / Models; Dashboard; Big Data Cluster; Parallel Processing; Storage; Raw; Refined; Results. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  23. “Kappa Architecture” for Big Data. Same diagram labels as slide 22.
  24. “Unified Architecture” for Big Data. Diagram labels: Big Data Cluster; Batch Analytics; Streaming Analytics; Stream Analytics; NoSQL; Reference / Models; SQL; Search; Dashboard; BI Tools; Enterprise Data Warehouse; Search / Explore; Online & Mobile Apps; File Import / SQL Import; Event Hub; Parallel Processing; Storage; Raw; Refined; Results. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  25. Continuous Streaming Data Ingestion
  26. Continuous Data Ingestion. Same diagram labels as slide 24.
  27. Continuous Streaming Data Ingestion. Diagram labels: DB Source; File Source; IoT Sensor; Social; Log; CDC; CDC GW; Connect; Native; IoT GW; Dataflow; Dataflow GW; Message GW; REST; Queue; Topic; Event Hub; Stream Processing; Big Data.
  28. Continuous Streaming Data Ingestion: SQL Polling; Change Data Capture (CDC); File Polling; File Stream (File Tailing); File Stream (Appender); Sensor Stream.
  29. Continuous Streaming Data Ingestion. Same diagram labels as slide 27.
  30. Big Data & Cloud
  31. Data Locality vs. Compute/Storage Separation. Data Local Compute: Master Node; Worker #1, #2, #3, each with Disk and Processing; Network. Separate Compute and Storage: Master Node; Compute #1, #2, #3, each with Processing; Network; Network Storage with Disks. Separation of compute and storage, the fundamental difference: store data in Object Storage instead of DFS; bring up Compute nodes only for data processing; multiple workloads on separate clusters can access same data.
  32. A new way to Manage Big Data. Big Data Traditional Assumptions: Bare-metal; Data Locality; HDFS on local disks. Big Data A New Approach: Containers and VMs; Compute and storage separation; Shared storage. Benefits and Value: Big-Data-as-a-Service; Agility and cost savings; Faster time-to-insights.
  33. Big Data & Cloud - Amazon Web Services (AWS). Same diagram labels as slide 24.
  34. Microservices Architecture
  35. Asynchronous Microservice Architecture. Diagram labels: Big Data Cluster; SQL; Search; BI Tools; Enterprise Data Warehouse; Search / Explore; Online & Mobile Apps; File Import / SQL Import; Event Hub; Parallel Processing; Storage; Raw; Refined; Results; Microservice Cluster (Microservice, State, { } API); Stream Analytics Cluster (Stream Processor, State, { } API); Event Stream; Service. Sources: Location; Social; Click stream; Sensor Data; Billing & Ordering; CRM / Profile; Marketing Campaigns; Call Center; Mobile Apps; Weather Data.
  36. Big Data Ecosystem – many choices sorted!
  37. Big Data Ecosystem – many choices sorted!
  38. Big Data Ecosystem – many choices sorted!
  39. Guido Schmutz, Technology Manager, guido.schmutz@trivadis.com
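
Slide 22 names the Lambda architecture without spelling out its mechanics. As it is commonly described, a batch layer periodically recomputes views from the retained raw events (the high-latency path of slide 16), a speed layer covers the events the last batch run has not yet seen (the low-latency path of slide 20), and queries merge both views. The following is only a minimal Python sketch of that idea. The `Event` record, the per-source count used as the "view", and all function names are assumptions made for illustration and do not come from the deck.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, chosen only for this sketch.
    timestamp: int  # e.g. seconds since some epoch
    source: str     # e.g. "clickstream", "sensor"


def build_batch_view(raw_events, cutoff):
    """Batch layer: recompute counts from all raw events before the cutoff."""
    return Counter(e.source for e in raw_events if e.timestamp < cutoff)


def build_speed_view(raw_events, cutoff):
    """Speed layer: count only events at or after the cutoff.

    A real speed layer would update this incrementally per event;
    here it is computed in one pass to keep the sketch short.
    """
    return Counter(e.source for e in raw_events if e.timestamp >= cutoff)


def query(batch_view, speed_view, source):
    """Serving side: merge the batch view and the speed view."""
    return batch_view[source] + speed_view[source]


events = [
    Event(10, "clickstream"),
    Event(20, "sensor"),
    Event(30, "clickstream"),
    Event(40, "clickstream"),
]
batch_view = build_batch_view(events, cutoff=25)
speed_view = build_speed_view(events, cutoff=25)
total_clicks = query(batch_view, speed_view, "clickstream")
```

In this sketch the cutoff stands in for the time of the last completed batch run. The Kappa architecture of slide 23 is usually described as dropping the separate batch path and recomputing views by replaying the event log through the stream processor instead.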

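
Slide 28 lists "File Stream (File Tailing)" next to "File Polling" as ingestion patterns. One plausible reading is that polling picks up whole files once they appear, while tailing keeps reading whatever has been appended to a growing file. The Python sketch below illustrates tailing only, under that assumption. The function name `read_new_lines`, the file name `app.log` and the event strings are hypothetical, and a production agent would also have to handle log rotation and restarts.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new byte offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written line
    # is left in place for the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")  # hypothetical log file
    with open(log_path, "wb") as f:
        f.write(b"event-1\nevent-2\n")
    first, offset = read_new_lines(log_path, 0)

    # The producer appends one complete line and one partial line.
    with open(log_path, "ab") as f:
        f.write(b"event-3\npartial")
    second, offset = read_new_lines(log_path, offset)
```

Keeping the byte offset between calls is what lets each read forward only new events, for example to a topic on the event hub, instead of re-sending the whole file.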