Ever wondered how you could process any kind of data you can get your hands on? This presentation outlines a blueprint for a big data architecture that processes every data fragment as an event, allowing you to slice and dice your data as you see fit.
6. Common interactions
A customer requesting a quote
A website visitor clicking on a link
Booking a financial transaction
A delivery truck pinging its GPS coördinates
7. TransCo
All of these have one thing in common:
Events
IT
Finance
Legal
Logistics
Sales
Communications
...
10. Anatomy of an event
- Timestamp: When did it happen?
- Origin: Where did it come from?
- Actor: Who did it?
- Subject: Who was affected?
- Facts: What changed?
11. Anatomy of an event - example
- Timestamp: 2014-05-03 13:40:51
- Origin: CRM Application
- Actor: Daan Gerits
- Subject: Alfred Hitchcock
- Facts: street="...", vat="..."
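The five parts of an event map naturally onto a record type. A minimal Python sketch, assuming an immutable dataclass is an acceptable representation (the `Event` class and its field names are our own choice, not something defined in the talk), filled in with the slide's example:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Event:
    """Hypothetical event type with the five parts from the anatomy slide."""
    timestamp: datetime  # when did it happen?
    origin: str          # where did it come from?
    actor: str           # who did it?
    subject: str         # who was affected?
    facts: Dict[str, str] = field(default_factory=dict)  # what changed?


# The example event; the fact values stay elided ("...") as on the slide.
example = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```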
13. Overview
Ingest
Translator: Translate entities into events and facts.
Linker: Resolve values to ids. Especially subject, actor and origin.
Detonator: Explode a single fact to multiple rollup levels. Only explode if applicable.
Store: Store the raw events so we can replay whenever we want.
View Generators: View generators can perform analytical tasks on the incoming events. The generated view can be stored in a storage system of choice.
14. Ingest
Get records in from other systems
- Event Bus/Broker
- Ingestion System like Flume / Sqoop / …
- ETL processes (not recommended)
- Backups
- Nagios / Statsd / Ganglia / ...
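The slide lists several ways to get records in. As a minimal, dependency-free stand-in for any of them, the sketch below assumes records arrive as newline-delimited JSON text from some line source (a file, a broker consumer, a backup dump). `ingest_json_lines` and the `source`/`record` envelope are hypothetical names, not part of Flume, Sqoop or any other tool mentioned.

```python
import io
import json
from typing import Dict, Iterable, Iterator


def ingest_json_lines(lines: Iterable[str], source: str) -> Iterator[Dict]:
    """Turn raw text lines (one JSON object per line) into records.

    The source name is kept alongside each record so a later stage
    can use it as the event origin.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield {"source": source, "record": json.loads(line)}


# Any iterable of strings works; here an in-memory file stands in for a real feed.
raw = io.StringIO('{"id": 1, "name": "Alfred Hitchcock"}\n\n{"id": 2, "name": "Daan Gerits"}\n')
records = list(ingest_json_lines(raw, source="CRM Application"))
```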
15. Translator
Convert records into events
- 1 record field = 1 fact
- record timestamp vs generated timestamp
Only store changed facts
- What changed?
- Compare with existing views
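A minimal sketch of the translation rules, assuming a record is a flat dict of field names to values and that the subject's current facts are available as a dict standing in for an existing view. The function name `translate` and the event layout are hypothetical. Each changed field becomes one fact; unchanged fields are dropped. The record's own timestamp is preferred, and one is generated only when the record carries none.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def translate(record: Dict[str, str], subject: str, origin: str, actor: str,
              current_view: Dict[str, str],
              record_ts: Optional[datetime] = None) -> List[Dict]:
    """Turn one record into one event per changed field (1 field = 1 fact)."""
    # Use the timestamp carried by the record; otherwise generate one now.
    ts = record_ts or datetime.now(timezone.utc)
    events: List[Dict] = []
    for name, value in record.items():
        if current_view.get(name) == value:
            continue  # fact did not change: nothing to store
        events.append({
            "timestamp": ts,
            "origin": origin,
            "actor": actor,
            "subject": subject,
            "fact": {name: value},
        })
    return events


# Hypothetical values: only "street" differs from the existing view.
view = {"street": "old street", "vat": "BE-1"}
record = {"street": "new street", "vat": "BE-1"}
events = translate(record, subject="Alfred Hitchcock", origin="CRM Application",
                   actor="Daan Gerits", current_view=view,
                   record_ts=datetime(2014, 5, 3, 13, 40, 51))
```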
16. Store
Persist the events as they are
Raw Data
- Source of truth
- Recovery
Optimize Storage
- Parquet, Avro, Thrift, ...
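The slide suggests columnar or binary formats such as Parquet, Avro or Thrift. To stay within the standard library, this sketch swaps in plain JSON lines, written append-only. `RawEventStore` is a hypothetical class; what it illustrates is that raw events are persisted exactly as received and can be replayed in order.

```python
import json
import os
import tempfile
from typing import Dict, Iterator


class RawEventStore:
    """Append-only store of raw events, one JSON document per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: Dict) -> None:
        # Existing lines are never rewritten; new events only go at the end.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")

    def replay(self) -> Iterator[Dict]:
        """Yield every stored event in insertion order."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


store = RawEventStore(os.path.join(tempfile.mkdtemp(), "events.jsonl"))
store.append({"timestamp": "2014-05-03T13:40:51", "subject": "Alfred Hitchcock", "fact": {"street": "..."}})
store.append({"timestamp": "2014-05-03T13:40:51", "subject": "Alfred Hitchcock", "fact": {"vat": "..."}})
replayed = list(store.replay())
```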
17. Linker
Resolve event fields
- “Daan Gerits” == id 44543-45436-9928
Optimize for speed
- Use lookup tables
- Group data if needed
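A sketch of the Linker using in-memory lookup tables, one per resolvable field. The `Linker` class and its table layout are assumptions; in practice the tables would be loaded from wherever the ids are maintained. The id for "Daan Gerits" is the one given on the slide.

```python
from typing import Dict


class Linker:
    """Resolve actor, subject and origin values to ids via lookup tables."""

    FIELDS = ("actor", "subject", "origin")

    def __init__(self, lookups: Dict[str, Dict[str, str]]) -> None:
        # One table per field, e.g. lookups["actor"]["Daan Gerits"] -> id
        self.lookups = lookups

    def link(self, event: Dict) -> Dict:
        linked = dict(event)  # work on a copy; the raw event stays untouched
        for name in self.FIELDS:
            table = self.lookups.get(name, {})
            value = event.get(name)
            if value in table:
                linked[name] = table[value]
            # Unresolved values are kept as-is so no information is lost.
        return linked


linker = Linker({"actor": {"Daan Gerits": "44543-45436-9928"}})
raw_event = {"actor": "Daan Gerits", "subject": "Alfred Hitchcock", "origin": "CRM Application"}
linked = linker.link(raw_event)
```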
18. Detonator
Explode a fact to multiple rollup levels
Why?
- Real-time rollups
- Running analytics
When?
- if there is a hierarchy in actor or subject
- if there is a hierarchy in timestamp
IN:  {ts: 2014-05-19, fact: …}
OUT: {ts: 2014-05-19, fact: …}
     {ts: 2014-05, fact: …}
     {ts: 2014, fact: …}
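The timestamp case from the IN/OUT example can be sketched as a small function, assuming day-level timestamps are ISO date strings (`YYYY-MM-DD`). `detonate_timestamp` is a hypothetical name, and the fact payload stays elided as on the slide.

```python
from typing import Dict, List


def detonate_timestamp(event: Dict) -> List[Dict]:
    """Explode a day-level event into day, month and year rollup events.

    Assumes event["ts"] is an ISO date string such as "2014-05-19".
    """
    day = event["ts"]
    levels = [day, day[:7], day[:4]]  # "2014-05-19" -> "2014-05" -> "2014"
    # Same fact, one copy per rollup level.
    return [{**event, "ts": level} for level in levels]


out = detonate_timestamp({"ts": "2014-05-19", "fact": "..."})
```

Because every rollup level receives its own copy of the fact, a monthly or yearly total can be updated as each event arrives instead of being recomputed from scratch.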
19. View Generator
Use facts to generate a view
A view is
- not a database view
- read-only
- optimized data model for a single purpose
- disposable
- based on all facts (facts depth & width)
A view generator manipulates
- RDBMSs, graphs, search indexes, ...
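A sketch of one possible view, assuming its single purpose is "the current value of every fact per subject". `generate_current_state_view` is a hypothetical name. The view is computed entirely from the events, so it can be thrown away and rebuilt at any time, and it is returned wrapped in `types.MappingProxyType` so consumers get a read-only mapping.

```python
from types import MappingProxyType
from typing import Dict, Iterable, Mapping


def generate_current_state_view(events: Iterable[Dict]) -> Mapping[str, Mapping[str, str]]:
    """Build a single-purpose view: the latest value of every fact per subject."""
    state: Dict[str, Dict[str, str]] = {}
    # ISO timestamp strings sort chronologically, so later facts overwrite earlier ones.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        state.setdefault(event["subject"], {}).update(event["fact"])
    # Read-only proxies: consumers query the view, they never write to it.
    return MappingProxyType({s: MappingProxyType(f) for s, f in state.items()})


events = [
    {"timestamp": "2014-05-04T09:00:00", "subject": "Alfred Hitchcock", "fact": {"street": "new street"}},
    {"timestamp": "2014-05-03T13:40:51", "subject": "Alfred Hitchcock", "fact": {"street": "old street"}},
]
view = generate_current_state_view(events)
```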
20. Rules of the game
Only add and remove are allowed
Events are replayable
Removal can only be done by BDAs (Big Data Administrators)
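These rules can be enforced by simply not offering an update operation. This sketch assumes an in-memory log, a hypothetical `EventLog` class, and a plain role string `"BDA"` as the permission check; a real deployment would use its own access control. Removing a faulty event by its id and replaying the rest is what the "human-error-proof" point in the conclusion relies on.

```python
import itertools
from typing import Dict, List


class EventLog:
    """Event log that only supports add and remove; there is no update."""

    def __init__(self) -> None:
        self._events: Dict[int, Dict] = {}
        self._ids = itertools.count(1)  # monotonically increasing event ids

    def add(self, event: Dict) -> int:
        event_id = next(self._ids)
        self._events[event_id] = dict(event)
        return event_id

    def remove(self, event_id: int, role: str) -> None:
        # Hypothetical role check: only a Big Data Administrator may remove.
        if role != "BDA":
            raise PermissionError("only BDAs may remove events")
        del self._events[event_id]

    def replay(self) -> List[Dict]:
        """Return the remaining events in the order they were added."""
        return [self._events[i] for i in sorted(self._events)]


log = EventLog()
first = log.add({"fact": {"street": "old street"}})
mistake = log.add({"fact": {"street": "typo"}})
log.remove(mistake, role="BDA")
```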
26. Conclusion
Allows fact trending
- driver statistics for his whole career
Allows state regeneration
- the state of all facts on February 12, 2005
Is human-error-proof
- remove the facts with eventId #
Scales very well
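State regeneration amounts to replaying events up to a point in time. A minimal sketch, assuming each event carries a `datetime` timestamp, a subject and a dict of facts; `state_at` and the driver data below are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable


def state_at(events: Iterable[Dict], moment: datetime) -> Dict[str, Dict[str, str]]:
    """Regenerate the state of all facts as it was at `moment` by replaying events."""
    state: Dict[str, Dict[str, str]] = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["timestamp"] > moment:
            break  # events are sorted, so everything from here on is later
        state.setdefault(event["subject"], {}).update(event["fact"])
    return state


events = [
    {"timestamp": datetime(2004, 1, 1), "subject": "driver-1", "fact": {"team": "A"}},
    {"timestamp": datetime(2006, 1, 1), "subject": "driver-1", "fact": {"team": "B"}},
]
snapshot = state_at(events, datetime(2005, 2, 12))
```

The same replay over all events, without the cut-off, yields the full history needed for fact trending.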
27. We don’t hire data scientists, architects, developers, UX designers or engineers.
We hire individuals.
Shameless plug
Thank you!