csv,conf 2014 - Open data within organizations

This talk describes how we are trying to open (sometimes sensitive) data within our organization.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Opening data within organisations #csvconf 2014 - Berlin - @stevenbeeckman
  • hi
  • I’m @stevenbeeckman - a digital dj! mixcloud.com/gehorschade.kollektiv
  • Conductor for StartupBus Europe! www.startupbus.com
  • Vienna, Poland, Estonia, Germany, UK, France, Spain, Italy, Greece. Pre-apply now at startupbus.com. Follow @TheStartupBus
  • Who here knows what devops is about?
  • developers building apps vs operations running apps in production
  • There is a bigger picture
  • there are a bit more than 2 silos
  • Defence 101 • Units on the battleground • Units in training • Majors, Colonels and Generals in the staff
  • Defence 101 (bis) • An army needs a very strong HR and logistics machine • Belgian government budget cuts usually hit the defence budget first • Need for integrated management
  • Calculating the cost of a training exercise took 4 people 4 weeks, to go bug 5 application owners for data hidden in • relational databases • Excel sheets • Business Objects reports • Access databases • (not so) shared drives
  • Stone Age • Some logistics guy deployed in Afghanistan: “I can’t access the shared drive, I wish I had my data locally!” • “I’m tired of these Excel files and Access databases saying something contradictory. Gimme the damn truth!”
  • Requirements 1. Centralize data 2. But protect sensitive data (HR, medical privacy, …) 3. Make the data available offline 4. Nodes should be able to regain current state after loss of communication for 5 days
  • Timeline • Stone Age • 2009: First XML-based prototypes
  • XML-based prototypes • Able to extract at most 40 tables from the logistics application in one night • Slow • Problems with identical rows
  • Timeline • Stone Age • 2009: First XML-based prototypes • New team & new approach
  • New team • Hand-over to Dept AD&M (“the pros”)
  • New approach • Systems engineering: holistic view on the problem • Take into account the protection of sensitive data • Make it more stable than the prototype • Explicitly not real-time • Check out NASA’s course: http://www.saylor.org/sse101/
  • Conceptually • lots of data sources with data owners • 1 central data “warehouse” • lots of nodes downloading the data they have access rights to
  • [Diagram labels: HR app, Financial app, Logistics app, Planning app, Excel, Ops unit, data warehouse, another app]
  • Inside the data warehouse • Extraction Engine (EE) • File Server • Access Control
  • Extraction Engine (EE) Based on open-source software: • Linux • MySQL • Talend (Eclipse-based ETL workflow tool)
  • What does the EE do every night? • Detect the metadata (store it in XML format) • Take a full dump of each data source in CSV format • Calculate delta (deleted rows and inserted rows, in CSV format) • Create two zip files: • One full copy • One delta for this day
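The slides do not say how the delta is computed, so the following is only a minimal sketch of one way to do it. It assumes each nightly dump is compared to the previous one as a multiset of rows, so that identical rows (a problem mentioned for the XML prototypes) are counted rather than collapsed. The function names `read_rows` and `compute_delta` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

def read_rows(csv_text):
    """Parse CSV text into a list of row tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(csv_text))]

def compute_delta(old_rows, new_rows):
    """Return (deleted, inserted) rows, treating each dump as a multiset.

    A changed row shows up as one deleted row plus one inserted row,
    which matches the "deleted rows and inserted rows" wording of the slide.
    """
    old_counts = Counter(old_rows)
    new_counts = Counter(new_rows)
    deleted = list((old_counts - new_counts).elements())
    inserted = list((new_counts - old_counts).elements())
    return deleted, inserted

# Hypothetical dumps of one table on two consecutive nights.
yesterday = read_rows("1,bolt\n2,nut\n2,nut\n3,washer\n")
today = read_rows("1,bolt\n2,nut\n4,gasket\n")
deleted, inserted = compute_delta(yesterday, today)
```

Because the comparison counts occurrences, removing one of two identical rows yields exactly one deleted row instead of zero or two.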
  • File server • Stores the zip files available for the nodes • Full copy only for the current day (but we have a history for a month) • Delta zip files for 14 days
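The retention rules above suggest how a node could catch up after losing its connection. The sketch below is an assumption, not the actual loader logic: a node that missed at most 14 days applies the daily deltas in order, and otherwise falls back to the current full copy. The name `catch_up_plan` and the dates are hypothetical.

```python
from datetime import date, timedelta

DELTA_RETENTION_DAYS = 14  # from the slide: delta zip files are kept for 14 days

def catch_up_plan(last_synced, today):
    """Decide what a node should download to reach the current state.

    Returns a (strategy, days) tuple, where strategy is one of
    "up_to_date", "deltas" or "full_copy".
    """
    gap = (today - last_synced).days
    if gap <= 0:
        return ("up_to_date", [])
    if gap <= DELTA_RETENTION_DAYS:
        # Apply every missed daily delta, oldest first.
        days = [last_synced + timedelta(days=i) for i in range(1, gap + 1)]
        return ("deltas", days)
    # Deltas no longer cover the gap: start again from today's full copy.
    return ("full_copy", [today])

# A node that was offline for 5 days, as in the requirements.
strategy, days = catch_up_plan(date(2014, 7, 10), date(2014, 7, 15))
```

Under this assumption the 5-day requirement is comfortably covered by the 14-day delta window.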
  • Access control • Data providers determine themselves whether their data is • “public” within the organisation • “restricted” to a set of nodes
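As an illustration of the "public" versus "restricted" rule, here is a small sketch of the check a server could run before handing zip files to a node. The policy structure, the function names and the node identifiers are all hypothetical; the talk does not describe how Access Control is implemented.

```python
def can_download(node_id, policy):
    """Return True if the node may download a source with this policy."""
    if policy["level"] == "public":
        return True
    # "restricted": only the nodes named by the data provider get access.
    return node_id in policy.get("nodes", frozenset())

def visible_sources(node_id, policies):
    """List the data sources a node has access rights to, sorted by name."""
    return sorted(
        name for name, policy in policies.items() if can_download(node_id, policy)
    )

# Hypothetical policies set by the data providers themselves.
policies = {
    "logistics": {"level": "public"},
    "hr": {"level": "restricted", "nodes": {"node-hr-1"}},
    "medical": {"level": "restricted", "nodes": {"node-med-1", "node-hr-1"}},
}
ops_view = visible_sources("node-ops-7", policies)
hr_view = visible_sources("node-hr-1", policies)
```

Keeping the decision per data source mirrors the slide: each provider classifies its own data, and a node simply sees the union of what it is allowed to read.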
  • The nodes • Custom XAMPP package for local development of reporting, or JBoss for bigger nodes with validated reports • Custom loader contacting Access Control and filling the MySQL database • Custom “Local Reporting Framework” (XML + XSLT)
  • Current status
  • Timeline • Stone Age • 2009: First XML-based prototypes • New team & new approach • 2014 • Growth 40 90 1000
  • @SpaceCatPics
  • "A LARGE SYSTEM IS ONE WHERE YOU DO NOT KNOW THAT SOME OF ITS COMPONENTS EVEN EXIST."
  • Some statistics • 400 users (nodes) • > 1 billion rows processed each night • ~ 75 gigabytes of data processed each night • making the EE work requires > 2000 tables
  • [Bar chart: 32 source databases by type (FTP, LDAP, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, Sharepoint); axis ticks 0, 5, 9, 14, 18]
  • big data schema
  • “What used to take my team 4 weeks now takes us one click on a button!” (a major responsible for military training & exercises)
  • Questions?
 @stevenbeeckman #csvconf
 Hackers, hipsters & hustlers should pre-apply at www.startupbus.com
  • Image credits • Diagonal full of silos: http://www.photographersgallery.com/photo.asp?id=2411 • Two silos: http://www.pragmaticdevops.com/2014/04/management/hacking-management/devops-as-a-team-or-a-responsibility/