Architektur von Big Data Lösungen

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Architektur von Big Data
Lösungen
Guido Schmutz (guido.schmutz@trivadis.com)
@gschmutz

Guido Schmutz
Working for Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer, Software Architect for Java, SOA & Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
2 Architektur von Big Data Lösungen

Agenda
1. Introduction
2. Big Data Reference Architectures
• Traditional Big Data
• Event / Stream-Processing
• Lambda Architecture
• Kappa Architecture
• Unified Architecture
• Microservices Architecture
3. Big Data Ecosystem – many choices sorted!

Introduction

Big Data Definition (4 Vs)
+ Time to action ? – Big Data + Real-Time = Stream Processing
Characteristics of Big Data: Its Volume, Velocity
and Variety in combination
Reliable Data Ingestion in Big Data/IoT

How to do Big Data? Why is a structuring / architecture
important?

Why talk about Big Data Architectures?
Choosing the right architecture is key for any (big data) project
Big Data is still quite a rather young field and therefore a “moving target”
no standard architectures available which have been used for years
In the past years, some architectures and best practices have evolved
Know your use cases before choosing your architecture / technologies
To have a reference architecture in place helps in choosing the
right/matching technologies

Important Properties for choosing (Big) Data Architecture
Latency
Keep raw and un-interpreted data “forever” ?
Volume, Velocity, Variety, Veracity
Ad-Hoc Query Capabilities needed ?
Robustness & Fault Tolerance
Scalability
…

Big Data Reference Architectures -
Traditional Big Data

“Traditional Architecture” for Big Data
Data
Ingestion
(Analytical) Data Processing
Data
Sources
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Pushing
Ingestion Result Store
Query
Engine
Computed
Information
Raw Data
(Reservoir)
= Data in Motion = Data at Rest
Pulling
Ingestion
Channel

“Traditional Architecture” for Big Data – Hadoop
Technology Mapping
Data
Ingestion
Data
Sources
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Pushing
Query
Engine
Computed
Information
Raw Data
(Reservoir)
Pulling
Ingestion
Channel

“Traditional Architecture” for Big Data – Spark
Technology Mapping
Data
Ingestion
Data
Sources
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Pushing
Query
Engine
Computed
Information
Raw Data
(Reservoir)
Pulling
Ingestion
Channel

“Traditional Architecture” for Big Data – Feeding in High-
Volume Event Streams
Data
Ingestion
Data
Sources
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Pushing
Query
Engine
Computed
Information
Raw Data
(Reservoir)
Pulling
Ingestion
Channel
?
?

Traditional Architecture for Big Data
• Batch Processing - “Data at Rest”
• Not for low latency use cases
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decision are made on old and stale data
• Spark Core is a faster alternative to Hadoop Map
Reduce, but still Batch Processing
• Spark Ecosystems offers a lot of additional
advanced analytic capabilities (machine learning,
graph processing, …)

Big Data Reference Architectures –
Event/Stream Processing

Event / Stream Processing – “Data in Motion”
“Data in motion”
Events are analyzed and processed in real-
time as the arrive
Decisions are timely, contextual and based
on fresh data
Decision latency is eliminated

Event / Stream Processing Architecture
Data
Ingestion
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
Logfiles
Social
RDBMS
ERP
Sensor
Machine
(Analytical) Real-Time Data Processing
Stream/Event Processing
Messaging
Result Store

Challenges for Ingesting Data
Multitude of sensors
Real-Time Streaming
Multiple Firmware versions
Bad Data from damaged sensors
Regulatory Constraints
Data Quality

Continuous Data Ingestion
DB Source
Big Data
Log
Stream
Processing
IoT Sensor
Event Hub
Topic
Topic
REST
Topic
IoT GW
CDC GW
Connect
CDC
DB Source
Log CDC
Native
IoT Sensor
IoT Sensor
19
Dataflow GW
Topic
Topic
Queue
Message GW
Topic
Dataflow GW
Dataflow
TopicREST
19
File Source
Log
Log
Log
Social
Native
Topic
Topic

Continuous Data Ingestion
DB Source
Big Data
Log
Stream
Processing
IoT Sensor
Event Hub
Topic
Topic
REST
Topic
IoT GW
CDC GW
Connect
CDC
DB Source
Log CDC
Native
IoT Sensor
IoT Sensor
20
Dataflow GW
Topic
Topic
Queue
Message GW
Topic
Dataflow GW
Dataflow
TopicREST
20
File Source
Log
Log
Log
Social
Native
Topic
Topic

Data
Ingestion
Event / Stream Processing Architecture – Open Source
Technology Mapping
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
Logfiles
Social
RDBMS
ERP
Sensor
Machine
Messaging
Result Store

Data
Ingestion
Event / Stream Processing Architecture – Oracle
Technology Mapping
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
Logfiles
Social
RDBMS
ERP
Sensor
Machine
Messaging
Result Store

Event / Stream Processing Architecture
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better
reliability
Previously known as “Complex Event Processing”
Keep the data moving / Data in Motion instead of Data at Rest => raw events
were not stored

Event / Stream Processing Architecture - Keep raw
event data
Data
Ingestion
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
Logfiles
Social
RDBMS
ERP
Sensor
Machine
Messaging
Result Store
(Analytical) Batch Data Processing
Raw Data
(Reservoir)

Lambda Architecture for Big Data

“Lambda Architecture” for Big Data
Data
Ingestion
(Analytical) Batch Data Processing
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Messaging
Result Store
Query
Engine
Result Store
Computed
Information
Raw Data
(Reservoir)
Pulling
Ingestion

Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the gap from high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on whole data set if necessary
=> because the old run had an error or
=> because we have found a better algorithm we want to apply
Have to implement functionality twice
• Once for batch
• Once for real-time streaming

„Kappa“ Architecture

“Kappa Architecture” for Big Data
Data
Ingestion
“Raw Data Reservoir”
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Messaging
Result Store
Raw Data
(Reservoir)
Computed
Information
Queryable State

Organizing NoSQL Data Stores – Different Types
Key Value Store
Wide-column store
Document store
Graph store
Key Value
K1 V1
K2 V2
K3 V3
Document
{
k1: v1,
k2: v2,
k3: [v1, v2, v3]
}
Rowkey
CK1
RK1
V1
CK2
V2
CK3
V3
CK4
V4
…
…
CK1
RK2
V1
CK4
V4
CK6
V6
…
…
…
…
…
…
CK3
V3

Organizing NoSQL Data Stores – and the Products
Key Value Store
Wide-column store
Document store
Graph store

„Unified“ Architecture

“Unified Architecture” for Big Data
Data
Ingestion
(Analytical) Batch Data Processing (Calculate Models of incoming data)
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Batch
compute
Messaging
Result Store
Result Store
Computed
Information
Raw Data
(Reservoir)
Prediction
Models
Queryable State

Event Driven (Micro-) Service
Architectures

MicroserviceMicroservice
MicroserviceMicroservice
Event-Driven (Micro-) Services Architecture
Data
Ingestion
“Raw Data Reservoir”
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Content
RDBMS
Social
ERP
Logfiles
Sensor
Machine
Microservice 2
Service
Raw Data
(Reservoir)
Computed
Information
State
Batch
compute
Microservice 1
Service State
API
Result Store

Big Data Ecosystem – many
choices sorted!

Building Blocks for (Big) Data Processing
Data
Acquisition
Format
File System
Stream Processing
Batch SQL
Graph DBMS
Document
DBMS
Relational
DBMS
Visualization
IoT
Messaging
Analytics
OLAP DBMS
Query
Federation
Table-Style
DBMS
Key Value
DBMS
Batch Processing
In-Memory

Big Data Ecosystem – many choices sorted!

Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com

Architektur von Big Data Lösungen

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Architektur von Big Data Lösungen

Similar to Architektur von Big Data Lösungen (20)

More from Guido Schmutz

More from Guido Schmutz (20)

Recently uploaded

Recently uploaded (20)

Architektur von Big Data Lösungen