Event-Processing-und-BigData-kombiniert-guido_schmutz

Guido Schmutz | Trivadis
Event Processing and Big Data Combined: Does It Work?

2013 © Trivadis
BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 
24.02.2014

Guido Schmutz
Working for Trivadis for more than 16 years
Oracle ACE Director for Fusion Middleware and SOA
Co-author of several books
Consultant, Trainer, Software Architect for Java, Oracle, SOA and EDA
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 20 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz

Our company
Trivadis is a market leader in IT consulting, system integration,
solution engineering and the provision of IT services focusing
on and technologies in Switzerland,
Germany and Austria.
We offer our services in the following strategic business fields:
[Figure: strategic business fields, including OPERATION]
Trivadis Services takes over the interacting operation of your IT systems.

With over 600 specialists and IT experts in your region
•  12 Trivadis branches and more than 600 employees
•  200 Service Level Agreements
•  Over 4,000 training participants
•  Research and development budget: CHF 5.0 million / EUR 4 million
•  Financially self-supporting and sustainably profitable
•  Experience from more than 1,900 projects per year at over 800 customers
[Map: branches in Hamburg, Düsseldorf, Frankfurt, Freiburg, München, Wien, Basel, Zurich, Bern, Lausanne, Stuttgart, Brugg]

AGENDA
1.  Big Data and Fast Data, what is it?
2.  Motivation
3.  The Lambda Architecture
4.  Implementing the Lambda Architecture
5.  Demo – Event Processing with Oracle OEP
6.  Summary

Big Data Definition (4 Vs)
Characteristics of Big Data: its Volume, Velocity and Variety in combination
+ Time to action? Big Data + Event Processing = Fast Data

The world is changing …
The model of generating and consuming data has changed:
Old model: a few companies generate data, all others consume data
New model: all of us generate data, and all of us consume data

Who is generating Big Data?
•  Social media and networks (all of us are generating data)
•  Scientific instruments (collecting all sorts of data)
•  Mobile devices (tracking all objects all the time)
•  Sensor technology and networks (measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize and discover knowledge from the collected data in a timely manner and in a scalable fashion.


Internet of Things - Sensors are/will be everywhere
There are more devices tapping into the internet than people on earth
How do we prepare our systems/architecture for the future?
Sources: Cisco, The Economist

Data as an Asset - Store Anything?
"But then data is just too valuable to delete! We must store anything!"
"Nonsense! Just store the data you know you need today!"
It depends … but Big Data technologies allow you to store the raw information from both new and existing data sources, so that you can later use it to create new data-driven products that you would not have thought about today!

Big Data vs. Traditional Enterprise Data
•  Big Data is not just “a lot more enterprise data”
•  Big Data usually consists of states, events, transactions etc., not master data
•  Big Data is commonly generated outside of traditional enterprise applications but needs to be associated with them
•  Big Data is often composed of un(evenly)structured information types that continually arrive in enormous amounts
•  Data / Information as an Asset!

AGENDA
2.  Architecting (Big) Data Systems
6.  Summary

What is a data system?
•  A (data) system that manages the storage and querying of data with a
lifetime measured in years encompassing every version of the
application to ever exist, every hardware failure and every human
mistake ever made.
•  A data system answers questions based on information that was
acquired in the past

How do we build (data) systems today - Today’s Architectures
Source of Truth is mutable!
•  CRUD pattern
What is the problem with this?
•  Lack of Human Fault Tolerance
•  Potential loss of information/data
[Diagram: the Application (Mobile, Web, RIA, Rich Client) queries a mutable database (RDBMS, NoSQL, NewSQL), which is the Source of Truth]

Problems in today’s architecture/systems: Lack of Human Fault Tolerance
Bugs will be deployed to production over the lifetime of a data system
Operational mistakes will be made
Humans are part of the overall system
•  Just like hard disks, CPUs, memory, software
•  Design for human error like you design for any other fault
Examples of human error
•  Deploy a bug that increments counters by two instead of by one
•  Accidentally delete data from database
•  Accidental DoS on important internal service
Worst two consequences: data loss or data corruption
As long as an error doesn’t lose or corrupt good data, you can fix what went wrong

Immutability vs. Mutability
The U and D in CRUD
A mutable system updates the current state of the world
•  Mutable systems inherently lack human fault-tolerance
•  Easy to corrupt or lose data
An immutable system captures historical records of events
•  Each event happens at a particular time and is always true
Immutability restricts the range of errors causing data loss/data corruption
•  Vastly more human fault-tolerant
Conclusion: Your source of truth should always be immutable
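The contrast between a mutable and an immutable source of truth can be illustrated with a minimal Python sketch. It is not taken from the talk: the names `mutable_counter`, `events`, `buggy_count` and `fixed_count` are hypothetical, and the bug is the "increments counters by two instead of by one" example mentioned earlier.

```python
# Mutable approach: only the current state is stored.
mutable_counter = 0

def buggy_increment(counter):
    # Deployed bug: increments by two instead of by one.
    return counter + 2

for _ in range(3):
    mutable_counter = buggy_increment(mutable_counter)
# The individual events were never stored, so the corrupted state
# cannot be repaired from the data alone.

# Immutable approach: every event is appended and never changed.
events = []  # append-only log, the source of truth

def record_page_view(log, user, timestamp):
    log.append({"type": "page_view", "user": user, "ts": timestamp})

for ts, user in enumerate(["anna", "ben", "anna"]):
    record_page_view(events, user, ts)

def buggy_count(log):
    # Same bug, but it only affects the derived view.
    return sum(2 for e in log if e["type"] == "page_view")

def fixed_count(log):
    return sum(1 for e in log if e["type"] == "page_view")

wrong_view = buggy_count(events)    # view produced by the buggy code
correct_view = fixed_count(events)  # recomputed after the fix, raw events untouched
```

In the immutable variant the bug only corrupts a derived value, which can be thrown away and recomputed once the code is fixed.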

A different kind of architecture with immutable source of truth
Instead of using our traditional approach … why not build data systems like this?
[Diagram: the Application (Mobile, Web, RIA, Rich Client) queries Views on Data, which are derived from Immutable data as the Source of Truth; storage labels: HDFS, NoSQL, NewSQL, RDBMS]

How to create the views on the Immutable data?
•  On the fly?
•  Materialized, i.e. pre-computed?
[Diagram: Query -> View -> Immutable data; Query -> Pre-Computed Views -> Immutable data]
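The two options can be compared with a small Python sketch. The sample events and the function names are assumptions made for illustration only, not part of the talk.

```python
from collections import Counter

# Immutable master data: (user, url) page-view events (hypothetical sample).
events = [("anna", "/home"), ("ben", "/home"), ("anna", "/cart")]

def views_on_the_fly(log, url):
    # Scans all raw events on every query: always current, but O(n) per query.
    return sum(1 for _, u in log if u == url)

def precompute_view(log):
    # Runs once over all raw events; later queries are cheap lookups.
    return Counter(u for _, u in log)

def views_precomputed(view, url):
    return view[url]

batch_view = precompute_view(events)

# A new event arrives after the view was computed.
events.append(("carl", "/home"))

on_the_fly = views_on_the_fly(events, "/home")          # sees the new event
precomputed = views_precomputed(batch_view, "/home")    # stale until recomputed
```

The pre-computed view answers quickly but lags behind the raw data, which is the gap the real-time processing discussed later is meant to close.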

Big Data Processing - Batch
[Diagram: events from Stream 1 and Stream 2 are stored in HDFS (Hadoop Distributed File System), a data store optimized for appending large results; queries run on the Hadoop cluster as Map/Reduce in Pig]

Big Data Processing - Batch
[Diagram: Incoming Data -> Immutable data -> Batch View -> Query]
How to compute the batch views?
How to compute queries from the views?

Favorite Product List Changes (raw information => data)
1.2.13 Add iPAD 64GB
10.3.13 Add Sony RX-100
11.3.13 Add Canon GX-10
11.3.13 Remove Sony RX-100
12.3.13 Add Nikon S-100
14.4.13 Add BoseQC-15
15.4.13 Add MacBook Pro 15
20.4.13 Remove Canon GX-10
Current Favorite Product List (derived information)
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Current Product Count (derived information): 4
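Deriving the current list and count from the raw change events can be sketched in Python as follows. The events are the ones shown on the slide, with dates read as day.month.year and the product name normalized to "Canon GX-10"; the function name `derive_favorites` is hypothetical.

```python
from datetime import date

# Raw, immutable change events from the slide: (date, action, product).
changes = [
    (date(2013, 2, 1), "add", "iPAD 64GB"),
    (date(2013, 3, 10), "add", "Sony RX-100"),
    (date(2013, 3, 11), "add", "Canon GX-10"),
    (date(2013, 3, 11), "remove", "Sony RX-100"),
    (date(2013, 3, 12), "add", "Nikon S-100"),
    (date(2013, 4, 14), "add", "BoseQC-15"),
    (date(2013, 4, 15), "add", "MacBook Pro 15"),
    (date(2013, 4, 20), "remove", "Canon GX-10"),
]

def derive_favorites(log):
    # Replays the events in log order; the list keeps insertion order.
    favorites = []
    for _, action, product in log:
        if action == "add" and product not in favorites:
            favorites.append(product)
        elif action == "remove" and product in favorites:
            favorites.remove(product)
    return favorites

current_favorites = derive_favorites(changes)
current_count = len(current_favorites)
```

Both derived values can be discarded at any time, because replaying the unchanged event list reproduces them.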

But we are not done yet …
•  Using only batch processing always leaves you with a portion of non-processed data.
[Diagram: timeline up to "now"; data up to the last full batch period is fully (batch-)processed, data that arrived during the time for the batch job is non-processed]
Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20

Big Data Processing - Adding Real-Time
[Diagram: Incoming Data -> Immutable data -> Batch Views -> Query; Incoming Data -> Data Stream -> Realtime Views -> Query]
How to compute real-time views?
How to compute queries from the views?

Big Data Processing - Adding Real-Time
Stream of immutable data (incoming Data Stream)
1.2.13 Add iPAD 64GB
10.3.13 Add Sony RX-100
11.3.13 Add Canon GX-10
11.3.13 Remove Sony RX-100
12.3.13 Add Nikon S-100
14.4.13 Add BoseQC-15
15.4.13 Add MacBook Pro 15
20.4.13 Remove Canon GX-10
Now Add Canon Scanner
Views (computed from the stream, answering the Query)
Current Favorite Product List
iPAD 64GB
Nikon S-100
BoseQC-15
MacBook Pro 15
Canon Scanner (added now)
Current Product Count: 5
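The real-time step can be sketched as an incremental update of an existing view instead of a full replay. This is an illustrative Python sketch; `apply_event` is a hypothetical helper, and the starting list is the batch result shown on the slide.

```python
# View computed by the batch run over the stored events (values from the slide).
batch_favorites = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]

def apply_event(favorites, action, product):
    # Incremental update: works on a copy of the view, no rescan of history.
    updated = list(favorites)
    if action == "add" and product not in updated:
        updated.append(product)
    elif action == "remove" and product in updated:
        updated.remove(product)
    return updated

# The event arriving "now" is applied directly to the view.
realtime_favorites = apply_event(batch_favorites, "add", "Canon Scanner")
realtime_count = len(realtime_favorites)
```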

Big Data Processing - Batch & Real Time
[Diagram: timeline up to "now"; batch processing (e.g. Hadoop) worked fine for the fully processed data up to the last full batch period; real-time processing works for the data that arrived during the time for the batch job; the end user gets a blended view]
Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20


Lambda Architecture
[Diagram: Incoming Data feeds the Immutable data and the Batch View (Batch Layer) as well as the Data Stream and the Realtime View (Speed Layer); the Query reads both views (Serving Layer); the components are labeled A to G]

Lambda Architecture
A.  All data is sent to both the batch and the speed layer
B.  The master data set is an immutable, append-only set of data
C.  The batch layer pre-computes query functions from scratch; the results are called batch views. The batch layer constantly re-computes the batch views.
D.  Batch views are indexed and stored in a scalable database so that particular values can be retrieved very quickly. New batch views are swapped in when they are available.
E.  The speed layer compensates for the high latency of updates to the batch views
F.  The speed layer uses fast incremental algorithms and read/write databases to produce real-time views
G.  Queries are resolved by getting results from both batch and real-time views
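Steps A to G can be condensed into a single-process Python sketch. It is only meant to show how the pieces relate: the class `LambdaSketch` and its methods are hypothetical, in-memory counters stand in for the batch and real-time stores, and a real system would run the layers on separate, distributed technologies.

```python
from collections import Counter

class LambdaSketch:
    def __init__(self):
        self.master = []                # B: immutable, append-only master data set
        self.batch_view = Counter()     # D: indexed batch view
        self.realtime_view = Counter()  # F: incrementally updated real-time view

    def ingest(self, hashtag):
        # A: every event goes to both the batch layer and the speed layer.
        self.master.append(hashtag)
        self.realtime_view[hashtag] += 1

    def run_batch(self):
        # C: recompute the batch view from scratch over the master data.
        covered = len(self.master)
        new_view = Counter(self.master[:covered])
        # D: swap in the new batch view once it is available.
        self.batch_view = new_view
        # E: the real-time view only has to cover events the batch has not seen.
        self.realtime_view = Counter(self.master[covered:])

    def query(self, hashtag):
        # G: merge the results from the batch view and the real-time view.
        return self.batch_view[hashtag] + self.realtime_view[hashtag]

system = LambdaSketch()
for tag in ["#sochi2014", "#canvsswe", "#sochi2014"]:
    system.ingest(tag)
before_batch = system.query("#sochi2014")  # answered by the real-time view only
system.run_batch()
system.ingest("#sochi2014")
after_batch = system.query("#sochi2014")   # batch view plus real-time view
```

Resetting the real-time view after each batch run keeps it small and transient, which matches the idea that only the master data set has to be kept forever.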

Lambda Architecture
Batch Layer
•  Stores the immutable, constantly growing dataset
•  Computes arbitrary views from this dataset using Big Data technologies (can take hours)
•  Can always be recreated
Speed Layer
•  Computes the views from the constant stream of data it receives
•  Needed to compensate for the high latency of the batch layer
•  Incremental model; views are transient
Serving Layer
•  Responsible for indexing and exposing the pre-computed batch views so that they can be queried
•  Exposes the incremented real-time views
•  Merges the batch and the real-time views into a consistent result


Lambda Architecture
[Diagram: Incoming Data is stored as All data in the Batch Layer, where a batch recompute produces precomputed information; in the Speed Layer the stream is processed and a realtime increment produces incremented information; the Serving Layer holds batch views and real time views, which are merged to answer a query]
Source: Marz, N. & Warren, J. (2013) Big Data. Manning.

Lambda Architecture in Action
Implementation in ongoing proof of concept (after completion of phase 1)
[Diagram: Lambda Architecture with Batch Layer (All data, batch recompute, precomputed information), Speed Layer (process stream, realtime increment, incremented information) and Serving Layer (batch views, real time views, merge, query)]

Lambda Architecture with Oracle Product Stack
Possible implementation with Oracle Product stack
[Diagram: Lambda Architecture (Batch Layer, Speed Layer, Serving Layer) annotated with Oracle products: Oracle NoSQL, Oracle RDBMS, Oracle Coherence, Oracle BigData Appliance, Oracle Event Processing, Oracle GoldenGate, Oracle Data Integrator, Oracle Service Bus, Oracle WebLogic Server, Oracle ADF, OBIEE, Oracle Endeca, Oracle BigData Connectors, BAM]


Retrieve Tweets and Visualize

Access to Tweets
Source | Limitations | Cost
Twitter’s Search API | 3200 / user; 5000 / keyword; 180 requests / 15 minutes | free
Twitter’s Streaming API | 1%-40% of total volume | free
DataSift | none | 0.15-0.20$ / unit
Gnip | none | On request

1) Creating a Twitter Adapter
[Diagram: Twitter -> Twitter Adapter]
Sample tweet: "Only 3 minutes remaining in the gold medal game, @HC_Men with a commanding 3-0 lead. #CANvsSWE #TeamCanada #Sochi2014"

2) Send Tweets to BAM
[Diagram: Twitter -> Twitter Adapter -> Tweet -> JMS -> BAM]
Sample tweet: "Only 3 minutes remaining in the gold medal game, @HC_Men with a commanding 3-0 lead. #CANvsSWE #TeamCanada #Sochi2014"

3) Extract interesting information from Tweet
[Diagram: Twitter -> Twitter Adapter -> Tweet -> JMS -> BAM; the Tweet also feeds the Mention Extractor, Hashtag Extractor and Author Extractor]
Extracted from the sample tweet:
•  Mention Extractor: @hc_men
•  Author Extractor: hockeycanada
•  Hashtag Extractor: #canvsswe, #teamcanada, #sochi2014
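What the three extractors produce can be sketched in a few lines of Python. This is a stand-in for illustration, not the Oracle Event Processing implementation from the demo: the regular expressions, the function names and the `tweet` dictionary layout are assumptions.

```python
import re

MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")

def extract_mentions(text):
    # Lower-cased so that @HC_Men and @hc_men are counted as the same term.
    return [m.lower() for m in MENTION.findall(text)]

def extract_hashtags(text):
    return [h.lower() for h in HASHTAG.findall(text)]

def extract_author(tweet):
    return tweet["author"].lower()

# Sample tweet from the slides; the dictionary layout is hypothetical.
tweet = {
    "author": "hockeycanada",
    "text": ("Only 3 minutes remaining in the gold medal game, @HC_Men with a "
             "commanding 3-0 lead. #CANvsSWE #TeamCanada #Sochi2014"),
}

mentions = extract_mentions(tweet["text"])
hashtags = extract_hashtags(tweet["text"])
author = extract_author(tweet)
```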

4) Count occurrences within period
[Diagram: Twitter -> Twitter Adapter -> Tweet -> JMS -> BAM; Mention Extractor, Hashtag Extractor and Author Extractor -> Counter Processor (range 30 seconds, slide 30 seconds) -> Counter -> JMS -> BAM]
Extracted terms:
@hc_men
hockeycanada
#canvsswe
#teamcanada
#sochi2014
Counter output:
#canvsswe,5
#sochi2014,9
hockeycanada,1
@hc_men,1
#teamcanada,5
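With the range and the slide both set to 30 seconds, the windows do not overlap, so each term is counted once per 30-second block. A minimal Python sketch of that counting logic follows; it is not OEP code, and the timestamps and the function name `count_per_window` are made up for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 30  # range 30 seconds, slide 30 seconds: non-overlapping windows

def count_per_window(events, window=WINDOW_SECONDS):
    """events: iterable of (timestamp_in_seconds, term).

    Returns a dict mapping each window start to a Counter of terms.
    """
    windows = defaultdict(Counter)
    for ts, term in events:
        window_start = (ts // window) * window
        windows[window_start][term] += 1
    return dict(windows)

# Hypothetical extracted terms with arrival times in seconds.
events = [
    (1, "#sochi2014"), (4, "#canvsswe"), (12, "#sochi2014"),
    (29, "@hc_men"), (31, "#sochi2014"), (45, "#teamcanada"),
]
counts = count_per_window(events)
```

A slide smaller than the range would produce overlapping windows, in which case an event would have to be assigned to several windows instead of exactly one.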

Implementing in Oracle Event Processing
[Diagram: Twitter -> Twitter Adapter -> Tweet -> JMS -> BAM; Mention Extractor, Hashtag Extractor and Author Extractor -> Counter Processor (range 30 seconds, slide 30 seconds) -> Counter -> JMS -> BAM]

Oracle BAM: Architected for Integration and Visualization
[Diagram: Oracle BAM architecture. Labeled components: BAM Dashboards, Web Applications (StartPage, ActiveViewer, ActiveStudio, Architect, Administrator), ReportServer, iCommand (Data & Metadata Import & Export), Oracle Database (Grid) with BAM Data & Metadata and External Data Objects, Enterprise Integration Framework (Web Services, JMS Connector, BAM Adapter), ADF (BAM DataControl, ADF Pages with DVT), BAM Server with EventEngine (Actions & Escalations, Notification Services), ReportCache (Snapshots & Change Lists, Memory / Disk) and ActiveDataCache (ViewSets, API, Kernel, DataSets, DataStorageEngine), ODI, Databases (OLTP & Data Warehouses), Mobile Devices, BPEL, BPM, Message Queues, CEP, OESB, BI, Application Server, Internet]

5) Adding Cassandra NoSQL for storing results
[Diagram: Twitter -> Twitter Adapter -> Tweet -> JMS -> BAM; Mention Extractor, Hashtag Extractor and Author Extractor -> Counter Processor (range 30 seconds, slide 30 seconds) -> Counter -> JMS -> BAM; the Tweet and Counter outputs are additionally stored in Cassandra]
Extracted terms:
@hc_men
hockeycanada
#canvsswe
#teamcanada
#sochi2014
Counter output:
#canvsswe,5
#sochi2014,9
hockeycanada,1
@hc_men,1
#teamcanada,5

Summary - The Lambda Architecture
•  The Lambda Architecture
•  Can discard batch views and real-time views and recreate everything from scratch
•  Mistakes are corrected via re-computation
•  Data storage layer is optimized independently from the query resolution layer
•  Still in a very early …. But a very interesting idea!
-  Today a zoo of technologies is needed => Operations won’t like it
•  The technology/implementation
•  Different query languages for batch and real time
•  An abstraction over batch and speed layer is needed
-  Cascading and Trident are already similar
•  Not everything works out-of-the-box and together
•  Industry standards needed!
Questions and answers ...
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
