SlideShare a Scribd company logo
1 of 49
Download to read offline
Guido Schmutz | Trivadis
Architektur von Big Data Lösungen
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Architektur von Big Data Lösungen
Guido Schmutz
Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz
Agenda
1. Introduction
2. Traditional Architecture for Big Data
3. Streaming Analytics Architecture for Fast Data
4. Lambda/Kappa/Unifed Architecture for Big Data
5. Summary
Introduction
How to do Big Data?
Big Data is still “work in progress”
Choosing the right architecture is key for any (big data) project
Big Data is still quite a young field and therefore there are no standard architectures
available which have been used for years
In the past few years, a few architectures have evolved and have been discussed online
Know the use cases before choosing your architecture
To have one/a few reference architectures can help in choosing the right components
Building Blocks for (Big) Data Processing
Data
Acquisition
Format
File System
Stream Processing
Batch SQL
Graph DBMS
Document
DBMS
Relational
DBMS
Visualization
IoT
Messaging
Analytics
OLAP DBMS
Query
Federation
Table-Style
DBMS
Key Value
DBMS
Batch Processing
In-Memory
Big Data Ecosystem – many choices ….
Important Properties to choose a Big Data Architecture
Latency
Keep raw and un-interpreted data “forever” ?
Volume, Velocity, Variety, Veracity
Ad-Hoc Query Capabilities needed ?
Robustness & Fault Tolerance
Scalability
…
From Volume and Variety to Velocity
Big Data has evolved …
and the Hadoop Ecosystem as well ….
Past
Big	Data =	Volume	&	Variety
Present
Big	Data =	Volume	&	Variety	&	Velocity
Past
Batch	Processing
Time	to	insight	of	Hours
Present
Batch &	Stream	Processing
Time	to	insight in	Seconds
Adapted	from	Cloudera	blog	article
Traditional Architecture for Big
Data
“Traditional Architecture” for Big Data
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
Batch
compute
Stage
Result	Store
Query
Engine
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Use Case 1) – Log File Analysis (Audit, Compliance,
Root Cause Analysis)
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Data
Consumer
Channel
Batch
compute
Computed	
Information
Raw	Data	
(Reservoir)
Result	Store
Query
Engine
Reports
Analytic
Tools
Logfiles
Use Case 1a) – Log File Analysis (Audit, Compliance,
Root Cause Analysis)
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Data
Consumer
Channel
Batch
compute
Computed	
Information
Raw	Data	
(Reservoir)
Result	Store
Query
Engine
Reports
Analytic
Tools
Logfiles
Use Case 2) – Customer Data Hub (360 degree view of
customer)
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Data
Consumer
RDBMS
Batch
compute
Computed	
Information
Raw	Data	
(Reservoir)
Result	Store
Query
Engine
Reports
Service
Analytic
Tools
Alerting
Tools
Use Case 2a) – Customer Data Hub (360 degree view
of customer)
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Data
Consumer
RDBMS
(CDC) Batch
compute
Computed	
Information
Raw	Data	
(Reservoir)
Result	Store
Query
Engine
Reports
Service
Analytic
Tools
Alerting
Tools
Channel
“Hadoop Ecosystem” Technology Mapping
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
Batch
compute
Staging
Result	Store
Query
Engine
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Apache Spark – the new kid on the block
Apache Spark is a fast and general engine for large-scale data processing
• The hot trend in Big Data!
• Originally developed 2009 in UC Berkley’s AMPLab
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x
faster on disk
• One of the largest OSS communities in big data with over 200 contributors in 50+
organizations
• Open Sourced in 2010 – since 2014 part of Apache Software foundation
• Supported by many vendors
Motivation – Why Apache Spark?
Apache Hadoop MapReduce: Data Sharing on Disk
Apache Spark: Speed up processing by using Memory instead of Disks
map reduce . . .
Input
HDFS
read
HDFS
write
HDFS
read
HDFS
write
op1 op2
. . .
Input
Output
Output
Apache Spark “Ecosystem”
Spark	SQL
(Batch	Processing)
Blink	DB
(Approximate
Querying)
Spark	Streaming
(Real-Time)
MLlib,	Spark	R
(Machine	Learning)
GraphX
(Graph	Processing)
Spark	Core	API	and	Execution	Model
Spark
Standalone
MESOS YARN HDFS
Elastic
Search
NoSQL S3
Libraries
Core	Runtime
Cluster	Resource	Managers Data		/	Data	Stores
“Spark Ecosystem” Technology Mapping
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
Batch
compute
Stage
Result	Store
Query
Engine
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Use Case 3) – (Batch) Recommendations on Historic
Data
Data
Collection
(Analytical)	Data	Processing
Result	StoreData
Sources
Data
Consumer
Machine
Batch
compute
Computed	
Information
Raw	Data	
(Reservoir)
Result	Store
Query
Engine
Reports
Service
Analytic
Tools
Alerting
Tools
DB
StagingFile
Channel
Traditional Architecture for Big Data
• Batch Processing
• Not for low latency use cases
• Spark can speed up, but if positioned as alternative to Hadoop
Map/Reduce, it’s still Batch Processing
• Spark Ecosystems offers a lot of additional advanced analytic capabilities
(machine learning, graph processing, …)
Streaming Analytics Architecture
for Big Data
Streaming Analytics Architecture for Big Data
aka. (Complex) Event Processing)
Data
Collection
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
Logfiles
Sensor
RDBMS
ERP
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
=	Data	in	Motion =	Data	at	Rest
Use Case 4) Algorithmic Trading
Data
Collection
Batch
compute
Data
Sources
Channel
Data
Consumer
Analytic
Tools
Alerting
Tools
Sensor
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
Use Case 5) Real-Time Analytics / Dashboard
Data
Collection
Batch
compute
Data
Sources
Channel
Data
Consumer
Analytic
Tools
Sensor
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
Native Streaming vs.
Micro-Batching
Native Streaming
• Events processed as they
arrive
• + low-latency
• - throughput
• - fault tolerance is expensive
Micro-Batching
• Splits incoming stream in
small batches
• + high(er) throughput
• + easier fault tolerance
• - lower latency
Source:	 Distributed	 Real-Time	Stream	Processing:	
Why	and	How	by	Petr	Zapletal
Unified Log (Event) Processing
Stream processing allows for
computing feeds off of other feeds
Derived feeds are no
different than
original feeds
they are computed off
Single deployment of “Unified
Log” but logically different feeds
Meter
Readings
Collector
Enrich	/	
Transform
Aggregate	by	
Minute
RawMeter
Readings
Meter	with
Customer
Meter	by Customer	by
Minute
Customer
Aggregate	by	
Minute
Meter	by Minute
Persist
Meter	by	
Minute
Persist
Raw	Meter
Readings
Streaming Analytics Technology Mapping
Data
Collection
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
Logfiles
Sensor
RDBMS
ERP
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
=	Data	in	Motion =	Data	at	Rest
Streaming Analytics Architecture for Big Data
The solution for low latency use cases
Process each event separately => low latency
Process events in micro-batches => increases latency but offers better
reliability
Previously known as “Complex Event Processing”
Keep the data moving / Data in Motion instead of Data at Rest => raw events
were not stored
Keep raw event data
Data
Collection
Batch
compute
Data
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
Logfiles
Sensor
RDBMS
ERP
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
=	Data	in	Motion =	Data	at	Rest
(Analytical)	Batch	Data	Processing
Raw	Data	
(Reservoir)
Lambda Architecture for Big Data
“Lambda Architecture” for Big Data
Data
Collection
(Analytical)	Batch	Data	Processing
Batch
compute
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Batch
compute
Messaging
Result	Store
Query
Engine
Result	Store
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Use Case 6) Social Media Analysis
Data
Collection
(Analytical)	Batch	Data	Processing
Batch
compute
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Batch
compute
Messaging
Result	Store
Query
Engine
Result	Store
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Use Case 7) Customer Event Hub
Data
Collection
(Analytical)	Batch	Data	Processing
Batch
compute
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Batch
compute
Messaging
Result	Store
Query
Engine
Result	Store
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Social
RDBMS	
(CDC)
Sensor
ERP
Logfiles
Lambda Architecture for Big Data
Combines (Big) Data at Rest with (Fast) Data in Motion
Closes the gap from high-latency batch processing
Keeps the raw information forever
Makes it possible to rerun analytics operations on whole data set if necessary
=> because the old run had an error or
=> because we have found a better algorithm we want to apply
Have to implement functionality twice
• Once for batch
• Once for real-time streaming
„Kappa“ Architecture for Big Data
“Kappa Architecture” for Big Data
Data
Collection
“Raw	Data	Reservoir”
Batch
compute
Data
Sources
Messaging
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
Logfiles
Sensor
RDBMS
ERP
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Computed	
Information
Use Case 8) Social Network Analysis
Data
Collection
“Raw	Data	Reservoir”
Batch
compute
Data
Sources
Messaging
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
Logfiles
Sensor
RDBMS
ERP
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Result	Store
Messaging
Result	Store
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Computed	
Information
Kappa Architecture for Big Data
Today the stream processing infrastructure are as scalable as Big Data
processing architectures
• Some using the same base infrastructure, i.e. Hadoop YARN
Only implement processing / analytics logic once
Can Replay historical events out of an historical (raw) event store
• Provided by either the Messaging or Raw Data (Reservoir) component
Updates of processing logic / Event replay are handled by deploying new
version of logic in parallel to old one
• New logic will reprocess events until it caught up with the current events and then
the old version can be de-commissioned.
„Unified“ Architecture for Big Data
“Unified Architecture” for Big Data
Data
Collection
(Analytical)	Batch	Data	Processing	(Calculate	
Models	of	incoming	data)
Batch
compute
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
Social
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Batch
compute
Messaging
Result	Store
Query
Engine
Result	Store
Computed	
Information
Raw	Data	
(Reservoir)
=	Data	in	Motion =	Data	at	Rest
Prediction	
Models
Use Case 8) Industry 4.0 => Predictive Maintainance
Data
Collection
(Analytical)	Batch	Data	Processing	(Calculate	
Models	of	incoming	data)
Batch
compute
Result	StoreData
Sources
Channel
Data
Consumer
Reports
Service
Analytic
Tools
Alerting
Tools
RDBMS
Sensor
ERP
Logfiles
Mobile
Machine
(Analytical)	Real-Time	Data	Processing
Stream/Event	Processing
Batch
compute
Messaging
Result	Store
Query
Engine
Result	Store
Computed	
Information
Raw	Data	
(Reservoir)
Prediction	
Models
J-PMML
Summary
Summary
Know your use cases and then choose your architecture and the relevant
components/products/frameworks
You don’t have to use all the components of the Hadoop Ecosystem to be successful
Big Data is still quite a young field and therefore there are no standard architectures
available which have been used for years
Lambda, Kappa Architecture & Unified Architecture are best practices architectures
which you have to adapt to your environment
=> Combination of traditional Big Data Batch & Complex Event Processing (CEP)
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com

More Related Content

What's hot

Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Timo Walther
 
Building event-driven (Micro)Services with Apache Kafka Ecosystem
Building event-driven (Micro)Services with Apache Kafka EcosystemBuilding event-driven (Micro)Services with Apache Kafka Ecosystem
Building event-driven (Micro)Services with Apache Kafka EcosystemGuido Schmutz
 
Data Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksMatthias Niehoff
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaGuido Schmutz
 
Proofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social MediaProofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social MediaDataStax Academy
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...DataWorks Summit
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformDataStax Academy
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientistsJenn Rawlins
 

What's hot (20)

Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
SQL vs. NoSQL
SQL vs. NoSQLSQL vs. NoSQL
SQL vs. NoSQL
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 
Building event-driven (Micro)Services with Apache Kafka Ecosystem
Building event-driven (Micro)Services with Apache Kafka EcosystemBuilding event-driven (Micro)Services with Apache Kafka Ecosystem
Building event-driven (Micro)Services with Apache Kafka Ecosystem
 
Data Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and Frameworks
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
Proofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social MediaProofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social Media
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
Kafka for data scientists
Kafka for data scientistsKafka for data scientists
Kafka for data scientists
 

Viewers also liked

Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data ArchitecturesGuido Schmutz
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinGuido Schmutz
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataGuido Schmutz
 
Storyboard for Media Short Film
Storyboard for Media Short FilmStoryboard for Media Short Film
Storyboard for Media Short Films0016845
 
Muestra fotográfica rostros
Muestra fotográfica rostrosMuestra fotográfica rostros
Muestra fotográfica rostrosCorneliaSL
 
informatica educativa
informatica educativainformatica educativa
informatica educativahemerita
 
Agenda castellano Febrero-Marzo 2013
Agenda castellano Febrero-Marzo 2013Agenda castellano Febrero-Marzo 2013
Agenda castellano Febrero-Marzo 2013La-Comarca
 
Proyecto de mercado en el Pósito de la Corredera
Proyecto de mercado en el Pósito de la CorrederaProyecto de mercado en el Pósito de la Corredera
Proyecto de mercado en el Pósito de la Correderacordopolis
 
100749 nguyen hoang phuong cac
100749   nguyen hoang phuong cac100749   nguyen hoang phuong cac
100749 nguyen hoang phuong cacLan Nguyễn
 
Nota conceputal y agenda. taller de medición de equidad con ops en uruguay
Nota conceputal y agenda. taller de medición de equidad con ops en uruguayNota conceputal y agenda. taller de medición de equidad con ops en uruguay
Nota conceputal y agenda. taller de medición de equidad con ops en uruguayEUROsociAL II
 
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...Dra. Camila Hamdan
 
Compra de servicios especializados (27 04 2011)
Compra de servicios especializados (27 04 2011)Compra de servicios especializados (27 04 2011)
Compra de servicios especializados (27 04 2011)SSMN
 
Design Strategy
Design Strategy Design Strategy
Design Strategy Liya James
 
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichReal Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichGuido Schmutz
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Rohit Talwar The Future for Airline Retail - ARC London 30 June 2011 handout
Rohit Talwar   The Future for Airline Retail - ARC London 30 June 2011 handoutRohit Talwar   The Future for Airline Retail - ARC London 30 June 2011 handout
Rohit Talwar The Future for Airline Retail - ARC London 30 June 2011 handoutRohit Talwar
 

Viewers also liked (20)

Big Data Architectures
Big Data ArchitecturesBig Data Architectures
Big Data Architectures
 
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day BerlinReal Time Analytics with Apache Cassandra - Cassandra Day Berlin
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin
 
Internet of Things (IoT) and Big Data
Internet of Things (IoT) and Big DataInternet of Things (IoT) and Big Data
Internet of Things (IoT) and Big Data
 
Storyboard for Media Short Film
Storyboard for Media Short FilmStoryboard for Media Short Film
Storyboard for Media Short Film
 
Muestra fotográfica rostros
Muestra fotográfica rostrosMuestra fotográfica rostros
Muestra fotográfica rostros
 
informatica educativa
informatica educativainformatica educativa
informatica educativa
 
Agenda castellano Febrero-Marzo 2013
Agenda castellano Febrero-Marzo 2013Agenda castellano Febrero-Marzo 2013
Agenda castellano Febrero-Marzo 2013
 
Proyecto de mercado en el Pósito de la Corredera
Proyecto de mercado en el Pósito de la CorrederaProyecto de mercado en el Pósito de la Corredera
Proyecto de mercado en el Pósito de la Corredera
 
100749 nguyen hoang phuong cac
100749   nguyen hoang phuong cac100749   nguyen hoang phuong cac
100749 nguyen hoang phuong cac
 
Nota conceputal y agenda. taller de medición de equidad con ops en uruguay
Nota conceputal y agenda. taller de medición de equidad con ops en uruguayNota conceputal y agenda. taller de medición de equidad con ops en uruguay
Nota conceputal y agenda. taller de medición de equidad con ops en uruguay
 
Ttc 2014 no 234 (fitur)
Ttc 2014 no 234 (fitur)Ttc 2014 no 234 (fitur)
Ttc 2014 no 234 (fitur)
 
Q4FY14Eng
Q4FY14EngQ4FY14Eng
Q4FY14Eng
 
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...
ECO TREM (Apresentação do Grupo 2 em Inovação e Tecnologia de Transporte)_mat...
 
Compra de servicios especializados (27 04 2011)
Compra de servicios especializados (27 04 2011)Compra de servicios especializados (27 04 2011)
Compra de servicios especializados (27 04 2011)
 
Dogama de la genetica
Dogama de la geneticaDogama de la genetica
Dogama de la genetica
 
Design Strategy
Design Strategy Design Strategy
Design Strategy
 
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day MunichReal Time Analytics with Apache Cassandra - Cassandra Day Munich
Real Time Analytics with Apache Cassandra - Cassandra Day Munich
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Rohit Talwar The Future for Airline Retail - ARC London 30 June 2011 handout
Rohit Talwar   The Future for Airline Retail - ARC London 30 June 2011 handoutRohit Talwar   The Future for Airline Retail - ARC London 30 June 2011 handout
Rohit Talwar The Future for Airline Retail - ARC London 30 June 2011 handout
 
Spyware
SpywareSpyware
Spyware
 

Similar to Big Data Architectures @ JAX / BigDataCon 2016

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data LösungenGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksGuido Schmutz
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big DataFrank Kienle
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsGuido Schmutz
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingGuido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 
From an experiment to a real production environment
From an experiment to a real production environmentFrom an experiment to a real production environment
From an experiment to a real production environmentDataWorks Summit
 
SC4 Workshop 1: Simon Scerri: Existing tools and technologies
SC4 Workshop 1: Simon Scerri: Existing tools and technologiesSC4 Workshop 1: Simon Scerri: Existing tools and technologies
SC4 Workshop 1: Simon Scerri: Existing tools and technologiesBigData_Europe
 
How Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT AnalyticsHow Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT AnalyticsArcadia Data
 
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16AppDynamics
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 
The sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of ThingsThe sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of ThingsStephan Reimann
 

Similar to Big Data Architectures @ JAX / BigDataCon 2016 (20)

Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Introduction Big Data
Introduction Big DataIntroduction Big Data
Introduction Big Data
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
Oracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream ProcessingOracle Stream Analytics - Simplifying Stream Processing
Oracle Stream Analytics - Simplifying Stream Processing
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
From an experiment to a real production environment
From an experiment to a real production environmentFrom an experiment to a real production environment
From an experiment to a real production environment
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
SC4 Workshop 1: Simon Scerri: Existing tools and technologies
SC4 Workshop 1: Simon Scerri: Existing tools and technologiesSC4 Workshop 1: Simon Scerri: Existing tools and technologies
SC4 Workshop 1: Simon Scerri: Existing tools and technologies
 
How Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT AnalyticsHow Hewlett Packard Enterprise Gets Real with IoT Analytics
How Hewlett Packard Enterprise Gets Real with IoT Analytics
 
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16
Click to Disk Troubleshooting with AppDynamics and OpsDataStore - AppSphere16
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
The sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of ThingsThe sensor data challenge - Innovations (not only) for the Internet of Things
The sensor data challenge - Innovations (not only) for the Internet of Things
 

More from Guido Schmutz

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as CodeGuido Schmutz
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureGuido Schmutz
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsGuido Schmutz
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Guido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureGuido Schmutz
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaGuido Schmutz
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaGuido Schmutz
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming VisualisationGuido Schmutz
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Guido Schmutz
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureGuido Schmutz
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Guido Schmutz
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming VisualizationGuido Schmutz
 

More from Guido Schmutz (20)

30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code30 Minutes to the Analytics Platform with Infrastructure as Code
30 Minutes to the Analytics Platform with Infrastructure as Code
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
Event Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data ArchitectureEvent Hub (i.e. Kafka) in Modern Data Architecture
Event Hub (i.e. Kafka) in Modern Data Architecture
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) ArchitectureEvent Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
Event Hub (i.e. Kafka) in Modern Data (Analytics) Architecture
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Location Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache KafkaLocation Analytics - Real-Time Geofencing using Apache Kafka
Location Analytics - Real-Time Geofencing using Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?What is Apache Kafka? Why is it so popular? Should I use it?
What is Apache Kafka? Why is it so popular? Should I use it?
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Location Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using KafkaLocation Analytics Real-Time Geofencing using Kafka
Location Analytics Real-Time Geofencing using Kafka
 
Streaming Visualisation
Streaming VisualisationStreaming Visualisation
Streaming Visualisation
 
Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?Kafka as an event store - is it good enough?
Kafka as an event store - is it good enough?
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka Location Analytics - Real-Time Geofencing using Kafka
Location Analytics - Real-Time Geofencing using Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 

Recently uploaded

原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...varanasisatyanvesh
 
DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1sinhaabhiyanshu
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444saurabvyas476
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSSnehalVinod
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjadimosmejiaslendon
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIf6x4zqzk86
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 

Recently uploaded (20)

原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 
DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1DAA Assignment Solution.pdf is the best1
DAA Assignment Solution.pdf is the best1
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 

Big Data Architectures @ JAX / BigDataCon 2016

  • 1. Guido Schmutz | Trivadis Architektur von Big Data Lösungen
  • 2. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH Architektur von Big Data Lösungen Guido Schmutz
  • 3. Guido Schmutz Working for Trivadis for more than 19 years Oracle ACE Director for Fusion Middleware and SOA Co-Author of different books Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Member of Trivadis Architecture Board Technology Manager @ Trivadis More than 25 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Twitter: gschmutz
  • 4. Agenda 1. Introduction 2. Traditional Architecture for Big Data 3. Streaming Analytics Architecture for Fast Data 4. Lambda/Kappa/Unifed Architecture for Big Data 5. Summary
  • 6. How to do Big Data?
  • 7. Big Data is still “work in progress” Choosing the right architecture is key for any (big data) project Big Data is still quite a young field and therefore there are no standard architectures available which have been used for years In the past few years, a few architectures have evolved and have been discussed online Know the use cases before choosing your architecture To have one/a few reference architectures can help in choosing the right components
  • 8. Building Blocks for (Big) Data Processing Data Acquisition Format File System Stream Processing Batch SQL Graph DBMS Document DBMS Relational DBMS Visualization IoT Messaging Analytics OLAP DBMS Query Federation Table-Style DBMS Key Value DBMS Batch Processing In-Memory
  • 9. Big Data Ecosystem – many choices ….
  • 10. Important Properties to choose a Big Data Architecture Latency Keep raw and un-interpreted data “forever” ? Volume, Velocity, Variety, Veracity Ad-Hoc Query Capabilities needed ? Robustness & Fault Tolerance Scalability …
  • 11. From Volume and Variety to Velocity Big Data has evolved … and the Hadoop Ecosystem as well …. Past Big Data = Volume & Variety Present Big Data = Volume & Variety & Velocity Past Batch Processing Time to insight of Hours Present Batch & Stream Processing Time to insight in Seconds Adapted from Cloudera blog article
  • 13. “Traditional Architecture” for Big Data Data Collection (Analytical) Data Processing Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social RDBMS Sensor ERP Logfiles Mobile Machine Batch compute Stage Result Store Query Engine Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest
  • 14. Use Case 1) – Log File Analysis (Audit, Compliance, Root Cause Analysis) Data Collection (Analytical) Data Processing Result StoreData Sources Data Consumer Channel Batch compute Computed Information Raw Data (Reservoir) Result Store Query Engine Reports Analytic Tools Logfiles
  • 15. Use Case 1a) – Log File Analysis (Audit, Compliance, Root Cause Analysis) Data Collection (Analytical) Data Processing Result StoreData Sources Data Consumer Channel Batch compute Computed Information Raw Data (Reservoir) Result Store Query Engine Reports Analytic Tools Logfiles
  • 16. Use Case 2) – Customer Data Hub (360 degree view of customer) Data Collection (Analytical) Data Processing Result StoreData Sources Data Consumer RDBMS Batch compute Computed Information Raw Data (Reservoir) Result Store Query Engine Reports Service Analytic Tools Alerting Tools
  • 17. Use Case 2a) – Customer Data Hub (360 degree view of customer) Data Collection (Analytical) Data Processing Result StoreData Sources Data Consumer RDBMS (CDC) Batch compute Computed Information Raw Data (Reservoir) Result Store Query Engine Reports Service Analytic Tools Alerting Tools Channel
  • 18. “Hadoop Ecosystem” Technology Mapping Data Collection (Analytical) Data Processing Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social RDBMS Sensor ERP Logfiles Mobile Machine Batch compute Staging Result Store Query Engine Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest
  • 19. Apache Spark – the new kid on the block Apache Spark is a fast and general engine for large-scale data processing • The hot trend in Big Data! • Originally developed 2009 in UC Berkley’s AMPLab • Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk • One of the largest OSS communities in big data with over 200 contributors in 50+ organizations • Open Sourced in 2010 – since 2014 part of Apache Software foundation • Supported by many vendors
  • 20. Motivation – Why Apache Spark? Apache Hadoop MapReduce: Data Sharing on Disk Apache Spark: Speed up processing by using Memory instead of Disks map reduce . . . Input HDFS read HDFS write HDFS read HDFS write op1 op2 . . . Input Output Output
  • 22. “Spark Ecosystem” Technology Mapping Data Collection (Analytical) Data Processing Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social RDBMS Sensor ERP Logfiles Mobile Machine Batch compute Stage Result Store Query Engine Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest
  • 23. Use Case 3) – (Batch) Recommendations on Historic Data Data Collection (Analytical) Data Processing Result StoreData Sources Data Consumer Machine Batch compute Computed Information Raw Data (Reservoir) Result Store Query Engine Reports Service Analytic Tools Alerting Tools DB StagingFile Channel
  • 24. Traditional Architecture for Big Data • Batch Processing • Not for low latency use cases • Spark can speed up, but if positioned as alternative to Hadoop Map/Reduce, it’s still Batch Processing • Spark Ecosystems offers a lot of additional advanced analytic capabilities (machine learning, graph processing, …)
  • 26. Streaming Analytics Architecture for Big Data aka. (Complex) Event Processing) Data Collection Batch compute Data Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social Logfiles Sensor RDBMS ERP Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store = Data in Motion = Data at Rest
  • 27. Use Case 4) Algorithmic Trading Data Collection Batch compute Data Sources Channel Data Consumer Analytic Tools Alerting Tools Sensor Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store
  • 28. Use Case 5) Real-Time Analytics / Dashboard Data Collection Batch compute Data Sources Channel Data Consumer Analytic Tools Sensor Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store
  • 29. Native Streaming vs. Micro-Batching Native Streaming • Events processed as they arrive • + low-latency • - throughput • - fault tolerance is expensive Micro-Batching • Splits incoming stream in small batches • + high(er) throughput • + easier fault tolerance • - lower latency Source: Distributed Real-Time Stream Processing: Why and How by Petr Zapletal
  • 30. Unified Log (Event) Processing Stream processing allows for computing feeds off of other feeds Derived feeds are no different than original feeds they are computed off Single deployment of “Unified Log” but logically different feeds Meter Readings Collector Enrich / Transform Aggregate by Minute RawMeter Readings Meter with Customer Meter by Customer by Minute Customer Aggregate by Minute Meter by Minute Persist Meter by Minute Persist Raw Meter Readings
  • 31. Streaming Analytics Technology Mapping Data Collection Batch compute Data Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social Logfiles Sensor RDBMS ERP Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store = Data in Motion = Data at Rest
  • 32. Streaming Analytics Architecture for Big Data The solution for low latency use cases Process each event separately => low latency Process events in micro-batches => increases latency but offers better reliability Previously known as “Complex Event Processing” Keep the data moving / Data in Motion instead of Data at Rest => raw events were not stored
  • 33. Keep raw event data Data Collection Batch compute Data Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social Logfiles Sensor RDBMS ERP Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store = Data in Motion = Data at Rest (Analytical) Batch Data Processing Raw Data (Reservoir)
  • 35. “Lambda Architecture” for Big Data Data Collection (Analytical) Batch Data Processing Batch compute Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social RDBMS Sensor ERP Logfiles Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Batch compute Messaging Result Store Query Engine Result Store Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest
  • 36. Use Case 6) Social Media Analysis Data Collection (Analytical) Batch Data Processing Batch compute Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social (Analytical) Real-Time Data Processing Stream/Event Processing Batch compute Messaging Result Store Query Engine Result Store Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest
  • 37. Use Case 7) Customer Event Hub Data Collection (Analytical) Batch Data Processing Batch compute Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools (Analytical) Real-Time Data Processing Stream/Event Processing Batch compute Messaging Result Store Query Engine Result Store Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest Social RDBMS (CDC) Sensor ERP Logfiles
  • 38. Lambda Architecture for Big Data Combines (Big) Data at Rest with (Fast) Data in Motion Closes the gap from high-latency batch processing Keeps the raw information forever Makes it possible to rerun analytics operations on whole data set if necessary => because the old run had an error or => because we have found a better algorithm we want to apply Have to implement functionality twice • Once for batch • Once for real-time streaming
  • 40. “Kappa Architecture” for Big Data Data Collection “Raw Data Reservoir” Batch compute Data Sources Messaging Data Consumer Reports Service Analytic Tools Alerting Tools Social Logfiles Sensor RDBMS ERP Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store Raw Data (Reservoir) = Data in Motion = Data at Rest Computed Information
  • 41. Use Case 8) Social Network Analysis Data Collection “Raw Data Reservoir” Batch compute Data Sources Messaging Data Consumer Reports Service Analytic Tools Alerting Tools Social Logfiles Sensor RDBMS ERP Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Result Store Messaging Result Store Raw Data (Reservoir) = Data in Motion = Data at Rest Computed Information
  • 42. Kappa Architecture for Big Data Today the stream processing infrastructure are as scalable as Big Data processing architectures • Some using the same base infrastructure, i.e. Hadoop YARN Only implement processing / analytics logic once Can Replay historical events out of an historical (raw) event store • Provided by either the Messaging or Raw Data (Reservoir) component Updates of processing logic / Event replay are handled by deploying new version of logic in parallel to old one • New logic will reprocess events until it caught up with the current events and then the old version can be de-commissioned.
  • 44. “Unified Architecture” for Big Data Data Collection (Analytical) Batch Data Processing (Calculate Models of incoming data) Batch compute Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools Social RDBMS Sensor ERP Logfiles Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Batch compute Messaging Result Store Query Engine Result Store Computed Information Raw Data (Reservoir) = Data in Motion = Data at Rest Prediction Models
  • 45. Use Case 8) Industry 4.0 => Predictive Maintainance Data Collection (Analytical) Batch Data Processing (Calculate Models of incoming data) Batch compute Result StoreData Sources Channel Data Consumer Reports Service Analytic Tools Alerting Tools RDBMS Sensor ERP Logfiles Mobile Machine (Analytical) Real-Time Data Processing Stream/Event Processing Batch compute Messaging Result Store Query Engine Result Store Computed Information Raw Data (Reservoir) Prediction Models J-PMML
  • 47. Summary Know your use cases and then choose your architecture and the relevant components/products/frameworks You don’t have to use all the components of the Hadoop Ecosystem to be successful Big Data is still quite a young field and therefore there are no standard architectures available which have been used for years Lambda, Kappa Architecture & Unified Architecture are best practices architectures which you have to adapt to your environment => Combination of traditional Big Data Batch & Complex Event Processing (CEP)
  • 48.