csv,conf 2014 - Open data within organizations

This talk describes how we are trying to open (sometimes sensitive) data within our organization.

Published in: Data & Analytics
Transcript of "csv,conf 2014 - Open data within organizations"

  1. Opening data within organisations
      #csvconf 2014 - Berlin - @stevenbeeckman
  2. hi
  3. I’m @stevenbeeckman - a digital dj! mixcloud.com/gehorschade.kollektiv
  4. Conductor for StartupBus Europe! www.startupbus.com
  5. Vienna, Poland, Estonia, Germany, UK, France, Spain, Italy, Greece
      Pre-apply now at startupbus.com
      Follow @TheStartupBus
  6. Who here knows what devops is about?
  7. developers building apps vs operations running apps in production
  8. There is a bigger picture
  9. there are a bit more than 2 silos
  10. Defence 101
      Units on the battleground
      Units in training
      Majors, Colonels and Generals in the staff
  11. Defence 101 (bis)
      An army needs a very strong HR and logistics machine
      Belgian government budget cuts usually hit the defence budget first
      Need for integrated management
  12. calculating the cost of a training exercise took 4 people 4 weeks to go bug 5 application owners for data hidden in
      relational databases
      Excel sheets
      Business Objects reports
      Access databases
      (not so) shared drives
  13. some logistics guy deployed in Afghanistan
      I can’t access the shared drive, I wish I had my data locally!
      Stone Age
      I’m tired of these Excel files and Access databases saying something contradictory. Gimme the damn truth!
  14. Requirements
      1. Centralize data
      2. But protect sensitive data (HR, medical privacy, …)
      3. Make the data available offline
      4. Nodes should be able to regain current state after loss of communication for 5 days
  15. some logistics guy deployed in Afghanistan
      I can’t access the shared drive, I wish I had my data locally!
      Stone Age
      2009 First XML based prototypes
      I’m tired of these Excel files and Access databases saying something contradictory. Gimme the damn truth!
  16. XML-based prototypes
      • Able to extract maximum 40 tables from the logistics application in one night
      • Slow
      • Problems with identical rows
  17. some logistics guy deployed in Afghanistan
      I can’t access the shared drive, I wish I had my data locally!
      Stone Age
      2009 First XML based prototypes
      New team & new approach
      I’m tired of these Excel files and Access databases saying something contradictory. Gimme the damn truth!
  18. New team
      Hand-over to Dept AD&M (“the pros”)
  19. New approach
      Systems engineering: holistic view on the problem
      Take into account the protection of sensitive data
      Make it more stable than the prototype
      Explicitly not real-time
      Check out NASA’s course: http://www.saylor.org/sse101/
  20. Conceptually
      • lots of data sources with data owners
      • 1 central data “warehouse”
      • lots of nodes downloading the data they have access rights to
  21. [Diagram labels: HR app, Financial app, Logistics app, Planning app, Excel, Ops unit, data warehouse, another app]
  22. Inside the data warehouse
      Extraction Engine (EE)
      File Server
      Access Control
  23. Extraction Engine (EE)
      Based on open-source software:
      Linux
      MySQL
      Talend (Eclipse based ETL workflow tool)
  24. What does the EE do every night?
      • Detect the meta data (store it in XML format)
      • Take a full dump of each data source in csv format
      • Calculate delta (deleted rows and inserted rows, in csv format)
      • Create two zip files:
        • One full copy
        • One delta for this day
  25. File server
      • Stores the zip files available for the nodes
      • Full copy only for the current day (but we have a history for a month)
      • Delta zip files for 14 days
  26. Access control
      • Data providers determine themselves whether their data is
        • “public” within the organisation
        • “restricted” to a set of nodes
  27. The nodes
      Custom XAMPP package for local development of reporting or JBoss for bigger nodes with validated reports
      Custom loader contacting Access Control and filling the MySQL database
      Custom “Local Reporting Framework” (XML + XSLT)
  28. Current status
  29. some logistics guy deployed in Afghanistan
      I can’t access the shared drive, I wish I had my data locally!
      Stone Age
      2009 First XML based prototypes
      New team & new approach
      2014
      Growth 40 90 1000
      I’m tired of these Excel files and Access databases saying something contradictory. Gimme the damn truth!
  30. @SpaceCatPics
  31. "A LARGE SYSTEM IS ONE WHERE YOU DO NOT KNOW THAT SOME OF ITS COMPONENTS EVEN EXIST."
  32. Some statistics
      • 400 users (nodes)
      • > 1 billion rows processed each night
      • ~ 75 gigabytes of data processed each night
      • making the EE work requires > 2000 tables
  33. [Bar chart: source databases by type (FTP, LDAP, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Sharepoint); 32 source databases]
  34. big data schema
  35. “What used to take my team 4 weeks now takes us one click on a button!”
      A major responsible for military training & exercises
  36. Questions?
      @stevenbeeckman #csvconf
      Hackers, hipsters & hustlers should pre-apply at www.startupbus.com
  37. Image credits
      http://www.photographersgallery.com/photo.asp?id=2411 Diagonal full of silos
      http://www.pragmaticdevops.com/2014/04/management/hacking-management/devops-as-a-team-or-a-responsibility/ Two silos
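
Slide 24 says the Extraction Engine calculates a nightly delta as deleted rows and inserted rows in csv format, and slide 16 mentions that the prototypes had problems with identical rows. The talk does not show how the delta is computed. The following is only a minimal sketch of one way to do it, assuming each dump fits in memory and comparing the two dumps as multisets so that duplicate rows are counted. The function names `read_rows` and `compute_delta` are hypothetical and not part of the system described.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text):
    """Parse csv text into a list of row tuples (hypothetical helper)."""
    return [tuple(row) for row in csv.reader(io.StringIO(csv_text))]


def compute_delta(previous_rows, current_rows):
    """Return (deleted, inserted) rows between two full dumps.

    Rows are compared as multisets, so if a row occurs twice yesterday
    and once today, exactly one copy is reported as deleted.
    A changed row shows up as one deleted row plus one inserted row.
    """
    previous = Counter(previous_rows)
    current = Counter(current_rows)
    deleted = list((previous - current).elements())
    inserted = list((current - previous).elements())
    return deleted, inserted


# Illustrative data only, not taken from the talk.
yesterday = read_rows("1,boots\n2,tents\n2,tents\n3,radios\n")
today = read_rows("1,boots\n2,tents\n4,maps\n")
deleted, inserted = compute_delta(yesterday, today)
```

Counting rows rather than using plain sets is a design choice made for this sketch: a set-based diff would silently ignore the second copy of an identical row.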
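
Slide 14 requires that nodes can regain the current state after losing communication for 5 days, and slide 25 says the file server keeps delta zip files for 14 days plus a full copy for the current day. How a node chooses between the two is not described in the talk. The sketch below assumes one simple policy: replay the missed daily deltas when they are all still available, otherwise fall back to the full copy. `plan_sync` and its return values are hypothetical.

```python
from datetime import date, timedelta

DELTA_RETENTION_DAYS = 14  # from slide 25: delta zip files are kept for 14 days


def plan_sync(last_sync, today, retention_days=DELTA_RETENTION_DAYS):
    """Decide what a node should download to catch up (assumed policy).

    Returns a (mode, days) pair:
      ("up_to_date", [])   nothing to do
      ("deltas", [...])    replay one delta per missed day, oldest first
      ("full", [today])    too far behind, download today's full copy
    """
    missed = (today - last_sync).days
    if missed <= 0:
        return ("up_to_date", [])
    if missed <= retention_days:
        days = [last_sync + timedelta(days=i) for i in range(1, missed + 1)]
        return ("deltas", days)
    return ("full", [today])


# A node that was offline for 5 days can still catch up from deltas.
mode, days = plan_sync(date(2014, 7, 10), date(2014, 7, 15))
```

Under this assumed policy the 5-day requirement sits well inside the 14-day delta window.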
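
Slide 26 describes two visibility levels chosen by the data providers: “public” within the organisation, or “restricted” to a set of nodes. The actual Access Control component is not shown in the talk. The following is a hedged sketch of that rule only, with a hypothetical `Dataset` type and hypothetical node identifiers, and with the assumption that unknown visibility values are denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A data source as seen by access control (hypothetical model)."""
    name: str
    visibility: str  # "public" or "restricted", as on slide 26
    allowed_nodes: frozenset = frozenset()


def can_download(dataset, node_id):
    """Return True if the node may download the dataset."""
    if dataset.visibility == "public":
        return True
    if dataset.visibility == "restricted":
        return node_id in dataset.allowed_nodes
    return False  # assumption: anything unrecognised is denied


def datasets_for_node(datasets, node_id):
    """List the names of the datasets a node has access rights to."""
    return [d.name for d in datasets if can_download(d, node_id)]


# Illustrative catalogue only, not taken from the talk.
catalogue = [
    Dataset("logistics", "public"),
    Dataset("hr", "restricted", frozenset({"node-hr-1"})),
]
visible = datasets_for_node(catalogue, "node-ops-7")
```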
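
Slide 27 mentions a custom loader on each node that fills a local MySQL database. The loader itself is not shown. As a rough illustration of applying one daily delta (deleted rows, then inserted rows), the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL. `apply_delta` is a hypothetical name, and the table and column names are assumed to be trusted identifiers from the extracted metadata. Values are compared as text, as they would arrive from csv files.

```python
import sqlite3


def apply_delta(conn, table, columns, deleted, inserted):
    """Apply one delta to a local table inside a single transaction.

    Each deleted row removes at most one matching copy, so identical
    rows are handled one at a time. SQLite's implicit rowid is used to
    pick a single copy; another database would need its own approach.
    """
    where = " AND ".join(f"{c} = ?" for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back on error
        for row in deleted:
            conn.execute(
                f"DELETE FROM {table} WHERE rowid = "
                f"(SELECT rowid FROM {table} WHERE {where} LIMIT 1)",
                row,
            )
        for row in inserted:
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)


# Illustrative data only, not taken from the talk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT, qty TEXT)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("boots", "10"), ("boots", "10"), ("tents", "3")],
)
apply_delta(
    conn,
    "stock",
    ["item", "qty"],
    deleted=[("boots", "10")],
    inserted=[("radios", "7")],
)
rows = sorted(conn.execute("SELECT item, qty FROM stock").fetchall())
```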