Big Data BluePrint



Ever wondered how you could process any kind of data you can get your hands on? This presentation outlines a blueprint for a big data architecture that processes any data fragment as an event, allowing you to slice and dice your data as you see fit.

Published in: Data & Analytics


  1. Big Data BluePrint: Architect for change (@daangerits, #bdbp)
  2. Who am I? @daangerits, daan@bigboards.io
  3. Agenda: Concepts, Architecture, Examples
  4. Concepts
  5. TransCo: Meet TransCo, a parcel delivery service
  6. Common interactions: a customer requesting a quote; a website visitor clicking on a link; booking a financial transaction; a delivery truck pinging its GPS coördinates
  7. TransCo: All of these have one thing in common: they are events, across IT, Finance, Legal, Logistics, Sales, Communications, ...
  8. Events: events used to be how we manipulated our master data
  9. Events: today, events ARE our master data
  10. Anatomy of an event: Timestamp (when did it happen?), Origin (where did it come from?), Actor (who did it?), Subject (who was affected?), Facts (what changed?)
  11. Anatomy of an event, example: timestamp 2014-05-03 13:40:51; origin CRM Application; actor Daan Gerits; subject Alfred Hitchcock; facts street="...", vat="..."
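The five parts of an event map naturally onto a small record type. The following Python sketch is illustrative only and is not taken from the talk: the class name `Event`, its field names and the use of a plain dict for the facts are assumptions, and the fact values stay elided (`...`) as on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record following the five parts of an event."""

    timestamp: datetime  # When did it happen?
    origin: str          # Where did it come from?
    actor: str           # Who did it?
    subject: str         # Who was affected?
    facts: dict[str, str] = field(default_factory=dict)  # What changed?


# The example event from the slide; the fact values are elided there as well.
example = Event(
    timestamp=datetime(2014, 5, 3, 13, 40, 51),
    origin="CRM Application",
    actor="Daan Gerits",
    subject="Alfred Hitchcock",
    facts={"street": "...", "vat": "..."},
)
```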
  12. Architecture
  13. Overview of the components:
      - Ingest
      - Translator: translate entities into events and facts.
      - Store: store the raw events so we can replay them whenever we want.
      - Linker: resolve values to ids, especially subject, actor and origin.
      - Detonator: explode a single fact to multiple rollup levels; only explode if applicable.
      - View generators (two are shown in the diagram): perform analytical tasks on the incoming events; the generated view can be stored in a storage system of choice.
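One way to picture the overview is as a chain of lazy stages. This is a minimal sketch under stated assumptions: Ingest is any iterable of records, the Store sits directly after the Translator so raw events are persisted before linking, and each view generator is a callable that sees every event. `run_pipeline` and the stand-in stages in the usage part are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

# A stage consumes a stream of events and yields a (transformed) stream.
Stage = Callable[[Iterable[dict]], Iterator[dict]]


def run_pipeline(records: Iterable[dict], store: list[dict],
                 translator: Stage, linker: Stage, detonator: Stage,
                 view_generators: list[Callable[[dict], None]]) -> None:
    """Push records through Translator -> Store -> Linker -> Detonator -> views."""

    def persist(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            store.append(event)  # keep the raw event so it can be replayed later
            yield event

    for event in detonator(linker(persist(translator(records)))):
        for generate in view_generators:  # every view generator sees every event
            generate(event)


# Usage with trivial stand-in stages (all hypothetical).
store: list[dict] = []
view: list[dict] = []
run_pipeline(
    records=[{"id": "9332-DG", "name": "Daan Gerits"}],
    store=store,
    translator=lambda recs: ({"subject": r["id"], "fact": k, "value": v}
                             for r in recs for k, v in r.items() if k != "id"),
    linker=lambda events: (e for e in events),     # no ids to resolve here
    detonator=lambda events: (e for e in events),  # no hierarchy to explode
    view_generators=[view.append],
)
```

Because the stages are generators, each event flows through the whole chain before the next record is read, which keeps the view generators close to real time.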
  14. Ingest: get records in from other systems
      - Event bus/broker
      - Ingestion system like Flume / Sqoop / …
      - ETL processes (not recommended)
      - Backups
      - Nagios / StatsD / Ganglia / ...
  15. Translator
      Convert records into events:
      - 1 record field = 1 fact
      - Record timestamp vs. generated timestamp
      Only store changed facts:
      - What changed?
      - Compare with existing views
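A minimal translator sketch, assuming a record is a flat dict with an `id` field (used as the subject) and an optional `timestamp` field. The `current` mapping stands in for "compare with existing views", and the `YYYYMMDD` timestamp format is borrowed from the Example slides; the function name `translate` is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone


def translate(record: dict, origin: str, actor: str,
              current: dict[tuple[str, str], str]) -> list[dict]:
    """Turn one record into fact rows (1 field = 1 fact), dropping unchanged facts.

    `current` maps (subject, fact) to the latest known value, i.e. an existing view.
    """
    subject = record["id"]
    # Prefer the timestamp carried by the record; otherwise generate one now.
    timestamp = record.get("timestamp") or datetime.now(timezone.utc).strftime("%Y%m%d")
    facts = []
    for name, value in record.items():
        if name in ("id", "timestamp"):
            continue  # these describe the event, they are not facts
        if current.get((subject, name)) == value:
            continue  # unchanged: nothing new to store
        facts.append({"origin": origin, "actor": actor, "subject": subject,
                      "timestamp": timestamp, "fact": name, "value": value})
    return facts


# The view already holds these values for the customer (hypothetical state).
current = {("9332-DG", "name"): "Daan Gerits", ("9332-DG", "address"): "container 9"}
changed = translate(
    {"id": "9332-DG", "name": "Daan Gerits", "address": "container 24",
     "timestamp": "20141109"},
    origin="erp", actor="wvl", current=current,
)
```

Only the address fact survives, because the name equals what the view already holds.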
  16. Store: persist the events as they are
      Raw data:
      - Source of truth
      - Recovery
      Optimize storage:
      - Parquet, Avro, Thrift, ...
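The Store only ever appends. The sketch below swaps in JSON Lines from the Python standard library so it runs without dependencies; it is not one of the formats named on the slide, and a real deployment would more likely use Parquet, Avro or Thrift. `append_events` and `replay` are hypothetical helpers, and an in-memory buffer stands in for a file.

```python
import io
import json
from typing import IO, Iterable, Iterator


def append_events(sink: IO[str], events: Iterable[dict]) -> None:
    """Append events as they are, one JSON object per line; nothing is rewritten."""
    for event in events:
        sink.write(json.dumps(event, sort_keys=True) + "\n")


def replay(source: IO[str]) -> Iterator[dict]:
    """Read the raw events back in the order in which they were stored."""
    for line in source:
        if line.strip():
            yield json.loads(line)


# An in-memory buffer stands in for an append-only file.
log = io.StringIO()
append_events(log, [{"event": 1, "fact": "name", "value": "Daan Gerits"}])
append_events(log, [{"event": 39, "fact": "address", "value": "container 24"}])
log.seek(0)
replayed = list(replay(log))
```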
  17. Linker
      Resolve event fields:
      - “Daan Gerits” == id 44543-45436-9928
      Optimize for speed:
      - Use lookup tables
      - Group data if needed
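A sketch of a lookup-table linker, assuming events are dicts with `actor`, `subject` and `origin` keys. The `Linker` class is hypothetical, and so is its policy of minting placeholder ids (`new-1`, `new-2`, ...) for unknown values; a real system would presumably consult a master data or id service instead.

```python
from __future__ import annotations


class Linker:
    """Resolve names to ids with an in-memory lookup table (hypothetical sketch)."""

    def __init__(self, lookup: dict[str, str] | None = None) -> None:
        self.lookup = dict(lookup or {})
        self._counter = 0

    def resolve(self, value: str) -> str:
        """Return the id for a value, minting a placeholder id if it is unknown."""
        if value not in self.lookup:
            self._counter += 1
            self.lookup[value] = f"new-{self._counter}"
        return self.lookup[value]

    def link(self, event: dict) -> dict:
        """Replace actor, subject and origin by ids; leave the facts untouched."""
        resolved = {key: self.resolve(event[key]) for key in ("actor", "subject", "origin")}
        return {**event, **resolved}


linker = Linker({"Daan Gerits": "44543-45436-9928"})
linked = linker.link({"actor": "Daan Gerits", "subject": "Alfred Hitchcock",
                      "origin": "CRM Application", "facts": {"vat": "..."}})
```

Keeping the table in memory is what makes the lookup fast; once a value has been resolved, later events reuse the same id.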
  18. Detonator: explode a fact to multiple rollup levels
      Why?
      - Real-time rollups
      - Running analytics
      When?
      - If there is a hierarchy in actor or actee
      - If there is a hierarchy in timestamp
      Example: IN {ts: 2014-05-19, fact: …}, OUT {ts: 2014-05-19, fact: …}, {ts: 2014-05, fact: …}, {ts: 2014, fact: …}
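The timestamp example above can be sketched as follows, assuming ISO `YYYY-MM-DD` date strings so that each rollup level is a string prefix. The function name `detonate`, the `Counter`-based running rollup and the fact name `delivered` are illustrative assumptions.

```python
from collections import Counter


def detonate(fact: dict) -> list[dict]:
    """Explode one fact to day, month and year rollup levels.

    Assumes fact["ts"] is an ISO date string of the form YYYY-MM-DD.
    """
    day = fact["ts"]
    return [{**fact, "ts": ts} for ts in (day, day[:7], day[:4])]


# Running rollups: each incoming fact updates every level in one pass.
rollup: Counter = Counter()
for fact in ({"ts": "2014-05-19", "fact": "delivered"},
             {"ts": "2014-05-20", "fact": "delivered"}):
    for exploded in detonate(fact):
        rollup[(exploded["ts"], exploded["fact"])] += 1
```

Each incoming fact updates all three levels at once, so month and year totals are available without a separate batch aggregation.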
  19. View Generator: use facts to generate a view
      A view is:
      - not (!=) a database view
      - read-only
      - an optimized data model for a single purpose
      - disposable
      - based on all facts (fact depth & width)
      A view generator manipulates:
      - RDBMSs, graphs, search indexes, ...
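A view generator can be as small as a fold over the facts. This sketch builds a disposable "current customer" view from fact rows like those in the Example slides. Applying facts in event-ID order and treating an empty value as a removed fact are assumptions based on those slides; `customer_view` is a hypothetical name.

```python
from __future__ import annotations


def customer_view(facts: list[dict]) -> dict[str, dict[str, str]]:
    """Build a disposable 'current customer' view from fact rows.

    Facts are applied in event-ID order; an empty value removes the fact,
    and a subject without any facts left drops out of the view.
    """
    view: dict[str, dict[str, str]] = {}
    for row in sorted(facts, key=lambda r: r["event_id"]):
        attrs = view.setdefault(row["subject"], {})
        if row["value"]:
            attrs[row["fact"]] = row["value"]
        else:
            attrs.pop(row["fact"], None)
        if not attrs:
            del view[row["subject"]]
    return view


facts = [
    {"event_id": 1, "subject": "9332-DG", "fact": "name", "value": "Daan Gerits"},
    {"event_id": 1, "subject": "9332-DG", "fact": "address", "value": "container 9"},
    {"event_id": 39, "subject": "9332-DG", "fact": "address", "value": "container 24"},
]
view = customer_view(facts)
```

Since the view is derived entirely from the facts, it can be thrown away and regenerated at any time, or rebuilt with a different data model.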
  20. Rules of the game
      - Only add and remove are allowed
      - Events are replayable
      - Removes can only be done by BDAs (Big Data Administrators)
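The rules can be enforced by a fact log that exposes only `add` and `remove`. In this sketch (all names hypothetical), event IDs are a simple counter, a delete is recorded as an add of empty values as in the Example slides, and the `is_bda` flag is a stand-in for real access control.

```python
from __future__ import annotations

from itertools import count
from typing import Iterator


class FactLog:
    """Append-only fact log: facts can be added or removed, never updated in place."""

    def __init__(self) -> None:
        self._rows: list[dict] = []
        self._event_ids = count(1)

    def add(self, origin: str, actor: str, subject: str, timestamp: str,
            facts: dict[str, str]) -> int:
        """Record one event as one row per fact and return its event ID."""
        event_id = next(self._event_ids)
        for fact, value in facts.items():
            self._rows.append({"event_id": event_id, "origin": origin, "actor": actor,
                               "subject": subject, "timestamp": timestamp,
                               "fact": fact, "value": value})
        return event_id

    def remove(self, event_id: int, *, is_bda: bool) -> None:
        """Drop every fact of one event; reserved for Big Data Administrators."""
        if not is_bda:
            raise PermissionError("only a BDA may remove events")
        self._rows = [row for row in self._rows if row["event_id"] != event_id]

    def replay(self) -> Iterator[dict]:
        """Yield all facts in event order, e.g. to rebuild a view from scratch."""
        yield from sorted(self._rows, key=lambda row: row["event_id"])


log = FactLog()
first = log.add("crm", "fbaker", "9332-DG", "20140514",
                {"name": "Daan Gerits", "address": "container 9"})
oops = log.add("erp", "fbaker", "9332-DG", "20141201", {"name": "", "address": ""})
log.remove(oops, is_bda=True)  # a BDA undoes the accidental delete
remaining = [(row["fact"], row["value"]) for row in log.replay()]
```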
  21. Example
  22. Add Customer
      IN: processing system: CRM, user: "fbaker", data: { id: "9332-DG", name: "Daan Gerits", address: "container 9" }
      DATA:
      event ID | origin | actor  | subject | timestamp | fact    | value
      1        | crm    | fbaker | 9332-DG | 20140514  | name    | Daan Gerits
      1        | crm    | fbaker | 9332-DG | 20140514  | address | container 9
  23. Update Customer
      IN: processing system: ERP, user: "wvl", data: { id: "9332-DG", address: "container 24" }
      DATA:
      event ID | origin | actor  | subject | timestamp | fact    | value
      1        | crm    | fbaker | 9332-DG | 20140514  | name    | Daan Gerits
      1        | crm    | fbaker | 9332-DG | 20140514  | address | container 9
      39       | erp    | wvl    | 9332-DG | 20141109  | address | container 24
  24. DELETE Customer
      IN: processing system: ERP, user: "fbaker", data: { id: "9332-DG" }
      DATA:
      event ID | origin | actor  | subject | timestamp | fact    | value
      1        | crm    | fbaker | 9332-DG | 20140514  | name    | Daan Gerits
      1        | crm    | fbaker | 9332-DG | 20140514  | address | container 9
      39       | erp    | wvl    | 9332-DG | 20141109  | address | container 24
      63       | erp    | fbaker | 9332-DG | 20141201  | address |
      63       | erp    | fbaker | 9332-DG | 20141201  | name    |
  25. Aaaarrgghhh!!
      IN: processing system: ERP, user: "fbaker", data: { id: "9332-DG" }
      DATA:
      event ID | origin | actor  | subject | timestamp | fact    | value
      1        | crm    | fbaker | 9332-DG | 20140514  | name    | Daan Gerits
      1        | crm    | fbaker | 9332-DG | 20140514  | address | container 9
      39       | erp    | wvl    | 9332-DG | 20141109  | address | container 24
      63       | erp    | fbaker | 9332-DG | 20141201  | address |
      63       | erp    | fbaker | 9332-DG | 20141201  | name    |
      64       | erp    | wvl    | 9332-DG | 20141109  | address | container 24
      64       | crm    | fbaker | 9332-DG | 20140514  | name    | Daan Gerits
  26. Conclusion
      - Allows fact trending: e.g. a driver's statistics over his whole career
      - Allows state regeneration: e.g. the state of all facts on February 12, 2005
      - Is human-error-proof: remove the facts with eventId #
      - Scales very well
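The first three conclusions can be illustrated on the fact rows from the Example slides (events 1, 39 and the accidental delete 63). This is a minimal sketch, assuming facts are replayed in event-ID order, timestamps are `YYYYMMDD` strings that compare correctly as text, and an empty value means the fact was removed; `state_on`, `trend` and `without_event` are hypothetical names.

```python
from __future__ import annotations

# Fact rows from the Example slides: events 1, 39 and the accidental delete 63.
FACTS = [
    {"event_id": 1, "timestamp": "20140514", "subject": "9332-DG", "fact": "name", "value": "Daan Gerits"},
    {"event_id": 1, "timestamp": "20140514", "subject": "9332-DG", "fact": "address", "value": "container 9"},
    {"event_id": 39, "timestamp": "20141109", "subject": "9332-DG", "fact": "address", "value": "container 24"},
    {"event_id": 63, "timestamp": "20141201", "subject": "9332-DG", "fact": "address", "value": ""},
    {"event_id": 63, "timestamp": "20141201", "subject": "9332-DG", "fact": "name", "value": ""},
]


def _in_order(facts: list[dict]) -> list[dict]:
    return sorted(facts, key=lambda r: r["event_id"])


def state_on(facts: list[dict], subject: str, date: str) -> dict[str, str]:
    """State regeneration: replay the subject's facts up to `date` (YYYYMMDD)."""
    state: dict[str, str] = {}
    for row in _in_order(facts):
        if row["subject"] == subject and row["timestamp"] <= date:
            if row["value"]:
                state[row["fact"]] = row["value"]
            else:
                state.pop(row["fact"], None)  # an empty value removes the fact
    return state


def trend(facts: list[dict], subject: str, fact: str) -> list[tuple[str, str]]:
    """Fact trending: every value a fact has had, in event order."""
    return [(r["timestamp"], r["value"]) for r in _in_order(facts)
            if r["subject"] == subject and r["fact"] == fact]


def without_event(facts: list[dict], event_id: int) -> list[dict]:
    """Human-error recovery: drop every fact that belongs to one event."""
    return [r for r in facts if r["event_id"] != event_id]


before_delete = state_on(FACTS, "9332-DG", "20141110")
after_delete = state_on(FACTS, "9332-DG", "20141231")
recovered = state_on(without_event(FACTS, 63), "9332-DG", "20141231")
```

Dropping event 63 restores the customer without writing compensating facts such as event 64.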
  27. Shameless plug: We don’t hire data scientists, architects, developers, UX designers or engineers. We hire individuals. Thank you!
