The factory and the workshop<br />Open source metadata-driven data warehousing<br />Johannes van den Bosch<br />@johannesvdb<br />
Data vault seminar May 5-6 Dommel - The factory and the workshop

  1. The factory and the workshop<br />Open source metadata-driven data warehousing<br />Johannes van den Bosch<br />@johannesvdb<br />
  2. Agenda<br />Background<br />Organization<br />Project<br />Basic architecture<br />The factory<br />…and the workshop<br />Topics from yesterday<br />Real time<br />Business key integration<br />Staging out<br />Hierarchies and supertypes<br />Self-service BI and write-back<br />Lessons learnt<br />
  3. Waterschap De Dommel<br />Dutch water board in the south of the Netherlands<br />Managing water quality and quantity<br />for 900,000 citizens<br />and 150,000 hectares<br />375 employees<br />of which 2 in BI<br />Projects managed on time, money and goals<br />Demand for integrated management information<br />
  4. Project<br />Current BI architecture reached its limits<br />New greenfield architecture<br />Open source advocate<br />Passionate believer in cost-effective solutions for government<br />It’s our money!<br />Convinced management<br />No software cost<br />Internal hours (2 × 0.3 FTE)<br />1 year<br />
  5. Open source software<br />ETL<br />Pentaho Data Integration (Kettle)<br />Data warehouse management<br />Quipu<br />Documentation<br />MediaWiki<br />Modeling<br />Power*Architect<br />
  6. EDW architecture<br />Reporting<br />Analysis<br />Dashboards<br />Data mart 1<br />Data mart n<br />Business Data Vault<br />Source Data Vault 1<br />Source Data Vault 2<br />Source Data Vault n<br />Supply driven<br />Demand driven<br />Generated and automated<br />Staging 1<br />Staging 2<br />Staging n<br />Source 1<br />Source 2<br />Source n<br />
  7. Development approach<br />[Chart: effort over time for supply-driven and demand-driven development]<br />
  8. Plant, production lines<br />
  9. The factory<br />
  10. The factory<br />Source → ETL → Staging → ETL → Source data vault<br />
  11. How Quipu works<br />Source model<br />Target model<br />Template<br />Load code (ETL)<br />
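Slide 11 describes Quipu combining a source model, a target model and a template into load code. The sketch below illustrates that idea with Python's `string.Template`, assuming a deliberately tiny dict-based model. It is not Quipu's actual metadata format or template engine, and all table and column names are hypothetical.

```python
from string import Template

# Hypothetical metadata: far simpler than Quipu's real source and target models.
source_model = {"table": "src_person", "columns": ["person_id", "name"]}
target_model = {"table": "stg_person"}

# Two hypothetical templates: the same metadata drives different outputs.
TEMPLATES = {
    "ddl": Template("CREATE TABLE $target ($col_defs);"),
    "etl": Template("INSERT INTO $target ($cols) SELECT $cols FROM $source;"),
}


def render(kind: str, source: dict, target: dict) -> str:
    """Fill one template with values taken from the source and target models."""
    cols = source["columns"]
    return TEMPLATES[kind].substitute(
        source=source["table"],
        target=target["table"],
        cols=", ".join(cols),
        # Simplification: every column is typed as TEXT.
        col_defs=", ".join(f"{c} TEXT" for c in cols),
    )


for kind in TEMPLATES:
    print(render(kind, source_model, target_model))
```

Keeping metadata and templates separate means a different loading pattern only needs a new template, not new metadata.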
  12. Step 1: Load source model<br />
  13. Step 2: Generate staging DDL<br />
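A minimal sketch of step 2, assuming a hypothetical source model that records source data types, a hypothetical type mapping, an assumed `stg_` prefix and an assumed technical `load_dts` column. Quipu's own conventions may differ; SQLite is used here only to check that the generated DDL executes.

```python
import sqlite3

# Hypothetical source model: (column name, source data type) per table.
SOURCE_MODEL = {
    "users": [("user_id", "NUMBER"), ("login", "VARCHAR2(30)"), ("created", "DATE")],
}

# Hypothetical mapping from source types to staging types.
TYPE_MAP = {"NUMBER": "INTEGER", "VARCHAR2": "TEXT", "DATE": "TEXT"}


def staging_type(source_type: str) -> str:
    base = source_type.split("(")[0].upper()  # drop length/precision, e.g. VARCHAR2(30)
    return TYPE_MAP.get(base, "TEXT")         # fall back to TEXT for unknown types


def staging_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    cols = [f"{name} {staging_type(dtype)}" for name, dtype in columns]
    cols.append("load_dts TEXT")  # assumed technical column recording the load moment
    return f"CREATE TABLE stg_{table} ({', '.join(cols)});"


ddl = staging_ddl("users", SOURCE_MODEL["users"])
print(ddl)

# Sanity check: the generated DDL runs on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
print([row[1] for row in conn.execute("PRAGMA table_info(stg_users)")])
```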
  14. Not cross-platform!<br />…I want my ETL tool!<br />Step 3: Generate staging ETL<br />Default ETL:<br />INSERT INTO staging_table<br />SELECT fields FROM source_table<br />
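The default staging ETL on slide 14 can be generated from the same metadata. This sketch, with hypothetical table names, builds the `INSERT INTO ... SELECT ...` statement and runs it on SQLite. It only works because source and staging share one database here; when they live on different platforms, a single SQL statement cannot move the data, which is the slide's argument for an ETL tool.

```python
import sqlite3


def default_staging_etl(source_table: str, staging_table: str, fields: list[str]) -> str:
    """Build the default 'INSERT INTO ... SELECT ...' statement for one table."""
    field_list = ", ".join(fields)
    return (f"INSERT INTO {staging_table} ({field_list}) "
            f"SELECT {field_list} FROM {source_table}")


# Source and staging in one database: the only setting where this statement works.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, login TEXT)")
conn.execute("CREATE TABLE stg_users (user_id INTEGER, login TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "jdoe"), (2, "asmith")])

sql = default_staging_etl("users", "stg_users", ["user_id", "login"])
conn.execute(sql)
print(conn.execute("SELECT COUNT(*) FROM stg_users").fetchone()[0])
```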
  15. Step 4: Generate source data vault model<br />
  16. Step 4: Generate source data vault DDL<br />
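Step 4 derives a source data vault model and its DDL from the staging metadata. A minimal sketch, assuming the business key is already known: it reuses the `_h` (hub) and `_h_s` (hub satellite) suffixes that appear on slide 24, while the column layout (integer hub id, `load_dts`, `record_source`) is an assumed common Data Vault convention, not necessarily what Quipu generates.

```python
import sqlite3


def source_data_vault_ddl(entity: str, business_key: str, attributes: list[str]) -> list[str]:
    """Derive hub and satellite DDL from one staging table's metadata.

    Naming follows the _h / _h_s suffixes from the slides; the column layout
    (surrogate id, load_dts, record_source) is an assumed common convention.
    """
    hub = f"{entity}_h"
    sat = f"{entity}_h_s"
    hub_ddl = (f"CREATE TABLE {hub} ("
               f"{hub}_id INTEGER PRIMARY KEY, "
               f"{business_key} TEXT NOT NULL UNIQUE, "
               f"load_dts TEXT, record_source TEXT)")
    attr_cols = ", ".join(f"{a} TEXT" for a in attributes)
    sat_ddl = (f"CREATE TABLE {sat} ("
               f"{hub}_id INTEGER REFERENCES {hub}({hub}_id), "
               f"load_dts TEXT, {attr_cols}, "
               f"PRIMARY KEY ({hub}_id, load_dts))")
    return [hub_ddl, sat_ddl]


conn = sqlite3.connect(":memory:")
for stmt in source_data_vault_ddl("employee", "employee_nr", ["name", "department"]):
    conn.execute(stmt)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```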
  17. Step 5: Generate source data vault ETL<br />
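Step 5 generates the load code for the source data vault. The sketch below, with hypothetical table names, shows two common Data Vault loading patterns expressed as generated SQL: a hub load that inserts only unknown business keys, and a satellite load that inserts a new row only when the descriptive attributes changed. It is an illustration on SQLite, not Quipu's generated code.

```python
import sqlite3


def hub_load_sql(hub: str, bk: str, staging: str) -> str:
    """Insert business keys from staging that the hub does not contain yet."""
    return (f"INSERT INTO {hub} ({bk}, load_dts, record_source) "
            f"SELECT DISTINCT s.{bk}, :load_dts, :record_source FROM {staging} s "
            f"WHERE NOT EXISTS (SELECT 1 FROM {hub} h WHERE h.{bk} = s.{bk})")


def sat_load_sql(hub: str, bk: str, staging: str, attrs: list[str]) -> str:
    """Insert a satellite row only when the attributes differ from the latest row."""
    sat, hid = f"{hub}_s", f"{hub}_id"
    cols = ", ".join(attrs)
    s_cols = ", ".join(f"s.{a}" for a in attrs)
    # IS NOT is SQLite's NULL-safe inequality.
    changed = " OR ".join(f"cur.{a} IS NOT s.{a}" for a in attrs)
    return (f"INSERT INTO {sat} ({hid}, load_dts, {cols}) "
            f"SELECT h.{hid}, :load_dts, {s_cols} FROM {staging} s "
            f"JOIN {hub} h ON h.{bk} = s.{bk} "
            f"LEFT JOIN {sat} cur ON cur.{hid} = h.{hid} "
            f"AND cur.load_dts = (SELECT MAX(x.load_dts) FROM {sat} x WHERE x.{hid} = cur.{hid}) "
            f"WHERE cur.{hid} IS NULL OR {changed}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_employee (employee_nr TEXT, name TEXT);
CREATE TABLE employee_h (employee_h_id INTEGER PRIMARY KEY, employee_nr TEXT UNIQUE,
                         load_dts TEXT, record_source TEXT);
CREATE TABLE employee_h_s (employee_h_id INTEGER, load_dts TEXT, name TEXT);
""")


def load(rows: list[tuple[str, str]], load_dts: str) -> None:
    """One load cycle: refresh staging, then load hub and satellite."""
    conn.execute("DELETE FROM stg_employee")
    conn.executemany("INSERT INTO stg_employee VALUES (?, ?)", rows)
    conn.execute(hub_load_sql("employee_h", "employee_nr", "stg_employee"),
                 {"load_dts": load_dts, "record_source": "system_x"})
    conn.execute(sat_load_sql("employee_h", "employee_nr", "stg_employee", ["name"]),
                 {"load_dts": load_dts})


load([("E1", "Ann"), ("E2", "Bob")], "2011-05-01")
load([("E1", "Ann"), ("E2", "Robert")], "2011-05-02")  # only E2 changed
print(conn.execute("SELECT COUNT(*) FROM employee_h").fetchone()[0],
      conn.execute("SELECT COUNT(*) FROM employee_h_s").fetchone()[0])
```

After two load cycles the hub holds two keys and the satellite three rows: the unchanged record for E1 is not duplicated, while the new name for E2 is kept as history.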
  18. Starting up the factory<br />
  19. The workshop<br />
  20. Proof of concept (PoC)<br />Decided to try to build the bDV and data marts 100% virtual<br />bDV = views on top of the sDV<br />Data marts = views on top of the bDV<br />Conclusion: it is possible<br />
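The PoC on slide 20 stacks views: the bDV as views on the sDV, and the data marts as views on the bDV. A minimal SQLite sketch of that layering, with hypothetical table names and a hypothetical business rule (only active employees count). Because nothing is materialized, a new satellite row in the sDV shows up in the mart immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source data vault: physical tables, filled by the generated ETL.
CREATE TABLE employee_h   (employee_h_id INTEGER PRIMARY KEY, employee_nr TEXT);
CREATE TABLE employee_h_s (employee_h_id INTEGER, load_dts TEXT, name TEXT, active TEXT);

-- Business data vault: a view on the sDV (latest satellite row, plus a
-- hypothetical business rule: only active employees count).
CREATE VIEW bdv_employee AS
SELECT h.employee_nr, s.name
FROM employee_h h
JOIN employee_h_s s ON s.employee_h_id = h.employee_h_id
WHERE s.load_dts = (SELECT MAX(x.load_dts) FROM employee_h_s x
                    WHERE x.employee_h_id = h.employee_h_id)
  AND s.active = 'Y';

-- Data mart: a view on the bDV.
CREATE VIEW dm_headcount AS
SELECT COUNT(*) AS headcount FROM bdv_employee;
""")

conn.executemany("INSERT INTO employee_h VALUES (?, ?)", [(1, "E1"), (2, "E2")])
conn.executemany("INSERT INTO employee_h_s VALUES (?, ?, ?, ?)",
                 [(1, "2011-05-01", "Ann", "Y"), (2, "2011-05-01", "Bob", "Y")])
before = conn.execute("SELECT headcount FROM dm_headcount").fetchone()[0]

# A new satellite row in the sDV is visible in the mart without any reload.
conn.execute("INSERT INTO employee_h_s VALUES (2, '2011-05-02', 'Bob', 'N')")
after = conn.execute("SELECT headcount FROM dm_headcount").fetchone()[0]
print(before, after)
```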
  21. Functional components<br />History<br />Integration<br />Transformation (applying semantics, filtering, etc.)<br />
  22. bDV design decisions: full bDV<br />[Diagram: source data vault and business data vault, with hubs (H), integration (I) and transformation (T)]<br />
  23. Integration strategies<br />1) Same-as link: two hubs (H), each with a satellite (S), connected by a link (L)<br />2) Integrated hub: one hub with two satellites<br />3) Integrated hub + integrated satellite: one hub with one satellite<br />
  24. Integration: hubs<br />[Diagram: source → source data vault → business data vault. System x (person) is loaded into employee_h / employee_h_s and system y (Users) into Users_h / Users_h_s; both are integrated into person_h / person_h_s in the business data vault]<br />
  25. Hub integration: virtual<br />System x: BK1<br />System y: BK2<br />Integration business rule → new BK<br />
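Slides 24 and 25 integrate business keys from two systems into one hub through a business rule. A minimal virtual sketch in SQLite, where the integrated hub is a view over the two source hubs. The mapping table and the prefixing rule are hypothetical; the real integration rule is a business decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_h (employee_h_id INTEGER PRIMARY KEY, employee_nr TEXT);  -- system x, BK1
CREATE TABLE users_h    (users_h_id INTEGER PRIMARY KEY, login TEXT);           -- system y, BK2
-- Hypothetical mapping maintained by the business: which login belongs to which employee.
CREATE TABLE user_employee_map (login TEXT, employee_nr TEXT);

-- Virtual integrated hub. Hypothetical integration rule: the new BK is the
-- employee number; users without a mapping get a key derived from their login.
CREATE VIEW person_h AS
SELECT 'X-' || employee_nr AS person_bk FROM employee_h
UNION
SELECT COALESCE('X-' || m.employee_nr, 'Y-' || u.login)
FROM users_h u LEFT JOIN user_employee_map m ON m.login = u.login;
""")
conn.executemany("INSERT INTO employee_h (employee_nr) VALUES (?)", [("E1",), ("E2",)])
conn.executemany("INSERT INTO users_h (login) VALUES (?)", [("jdoe",), ("guest",)])
conn.execute("INSERT INTO user_employee_map VALUES ('jdoe', 'E1')")

person_bks = [row[0] for row in conn.execute("SELECT person_bk FROM person_h ORDER BY person_bk")]
print(person_bks)
```

With `jdoe` mapped to employee `E1`, the four source keys collapse into three persons; `UNION` removes the duplicate `X-E1`.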
  26. Transformation: supertype example<br />sDV → business rule → bDV<br />5___<br />P______<br />4____<br />
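The key patterns on slide 26 read like SQL `LIKE` patterns, in which `_` matches exactly one character. Assuming that reading, the supertype business rule could be a `CASE` expression in a bDV view. The table name and subtype labels below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_h (object_h_id INTEGER PRIMARY KEY, object_bk TEXT);

-- Hypothetical supertype view. In a LIKE pattern '_' matches exactly one
-- character, so both the first character and the key length decide the subtype.
CREATE VIEW object_supertype AS
SELECT object_h_id,
       object_bk,
       CASE
         WHEN object_bk LIKE '5___'    THEN 'subtype_5'
         WHEN object_bk LIKE 'P______' THEN 'subtype_P'
         WHEN object_bk LIKE '4____'   THEN 'subtype_4'
         ELSE 'unclassified'
       END AS subtype
FROM object_h;
""")
conn.executemany("INSERT INTO object_h (object_bk) VALUES (?)",
                 [("5123",), ("P123456",), ("41234",), ("4123",)])

subtypes = dict(conn.execute("SELECT object_bk, subtype FROM object_supertype"))
print(subtypes)
```

Note that `4123` stays unclassified: it starts with 4 but has only four characters, while `'4____'` requires five.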
  27. Transformation: hierarchy example<br />sDV → bDV<br />
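Slide 27 turns a hierarchy in the sDV into a more usable form in the bDV. One way to do that virtually is a recursive query that flattens parent-child links into levels and paths. The sketch assumes a hypothetical organisation-unit hub with a parent-child link and uses SQLite's `WITH RECURSIVE`; the slide does not say which technique was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- sDV: organisation units (hub) and a hypothetical parent-child link.
CREATE TABLE org_unit_h (org_unit_h_id INTEGER PRIMARY KEY, org_unit_bk TEXT);
CREATE TABLE org_unit_parent_l (child_id INTEGER, parent_id INTEGER);

-- bDV: flattened hierarchy with level and path, derived by a recursive query.
CREATE VIEW org_unit_hierarchy AS
WITH RECURSIVE tree(org_unit_h_id, lvl, path) AS (
    -- Roots: units that never appear as a child.
    SELECT h.org_unit_h_id, 1, h.org_unit_bk
    FROM org_unit_h h
    WHERE NOT EXISTS (SELECT 1 FROM org_unit_parent_l l WHERE l.child_id = h.org_unit_h_id)
    UNION ALL
    -- Children: one level deeper, path extended with the child's key.
    SELECT l.child_id, t.lvl + 1, t.path || ' > ' || c.org_unit_bk
    FROM tree t
    JOIN org_unit_parent_l l ON l.parent_id = t.org_unit_h_id
    JOIN org_unit_h c ON c.org_unit_h_id = l.child_id
)
SELECT org_unit_h_id, lvl, path FROM tree;
""")
conn.executemany("INSERT INTO org_unit_h VALUES (?, ?)",
                 [(1, "Board"), (2, "Water quality"), (3, "Laboratory")])
conn.executemany("INSERT INTO org_unit_parent_l VALUES (?, ?)", [(2, 1), (3, 2)])

rows = list(conn.execute("SELECT lvl, path FROM org_unit_hierarchy ORDER BY lvl"))
for row in rows:
    print(row)
```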
  28. Data marts: virtual (simple example)<br />Dimension from hub + sat<br />Fact from link + sat<br />SCD type 1<br />
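A sketch of slide 28 in SQLite, assuming hypothetical project and booking tables: the dimension is a hub joined to its latest satellite row, which gives SCD type 1 behaviour (only current values), and the fact is a link joined to its latest satellite row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_h   (project_h_id INTEGER PRIMARY KEY, project_bk TEXT);
CREATE TABLE project_h_s (project_h_id INTEGER, load_dts TEXT, project_name TEXT);
CREATE TABLE booking_l   (booking_l_id INTEGER PRIMARY KEY, project_h_id INTEGER,
                          employee_h_id INTEGER);
CREATE TABLE booking_l_s (booking_l_id INTEGER, load_dts TEXT, hours REAL);

-- Dimension from hub + sat; SCD type 1: only the latest satellite row per key.
CREATE VIEW dim_project AS
SELECT h.project_h_id AS project_key, h.project_bk, s.project_name
FROM project_h h
JOIN project_h_s s ON s.project_h_id = h.project_h_id
WHERE s.load_dts = (SELECT MAX(x.load_dts) FROM project_h_s x
                    WHERE x.project_h_id = h.project_h_id);

-- Fact from link + sat (latest satellite row per link).
CREATE VIEW fact_booking AS
SELECT l.project_h_id AS project_key, l.employee_h_id AS employee_key, s.hours
FROM booking_l l
JOIN booking_l_s s ON s.booking_l_id = l.booking_l_id
WHERE s.load_dts = (SELECT MAX(x.load_dts) FROM booking_l_s x
                    WHERE x.booking_l_id = l.booking_l_id);
""")

conn.execute("INSERT INTO project_h VALUES (1, 'PRJ1')")
conn.executemany("INSERT INTO project_h_s VALUES (?, ?, ?)",
                 [(1, "2011-01-01", "Dike repair"), (1, "2011-03-01", "Dike reinforcement")])
conn.executemany("INSERT INTO booking_l VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 1)])
conn.executemany("INSERT INTO booking_l_s VALUES (?, ?, ?)",
                 [(1, "2011-02-01", 8.0), (2, "2011-04-01", 4.0)])

report = conn.execute("""
SELECT d.project_name, SUM(f.hours)
FROM fact_booking f JOIN dim_project d ON d.project_key = f.project_key
GROUP BY d.project_name
""").fetchall()
print(report)
```

Because the dimension exposes only the latest name, hours booked before the rename are reported under the new name: the SCD type 1 trade-off.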
  29. Virtual: lineage<br />
  30. bDV design decisions: partial bDV<br />[Diagram: source data vault and business data vault, with hubs (H), integration (I) and transformation (T)]<br />
  31. Full bDV vs partial bDV<br />Full<br />Lots of elements to define<br />Easy data marts<br />Partial<br />Less work<br />More transformation (T) between data vault and data marts<br />Multiple versions of the truth<br />
  32. Virtual vs physical<br />Virtual (views)<br />No physical maintenance<br />Easy to adapt<br />Performance limitations<br />Platform defines performance<br />Lineage (depending on platform)<br />Real time<br />Auditability?<br />Physical<br />Scalability / performance<br />Manual tweaking (indexes, etc.)<br />Surrogate keys easy<br />More intuitive to develop (ETL instead of SQL)<br />More complex transformations (e.g. aggregations)<br />
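Slide 32 lists "surrogate keys easy" as an advantage of physical data marts. A minimal sketch of why, with hypothetical names: a physical dimension table assigns an integer surrogate key on insert and keeps it across refreshes, something a plain view cannot store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_h (project_h_id INTEGER PRIMARY KEY, project_bk TEXT);
CREATE VIEW v_dim_project AS SELECT project_bk FROM project_h;

-- Physical dimension: SQLite assigns the INTEGER PRIMARY KEY on insert.
CREATE TABLE dim_project (project_sk INTEGER PRIMARY KEY, project_bk TEXT UNIQUE);
""")


def refresh_dim_project() -> None:
    """Physical load step: add new business keys; existing keys keep their surrogate key."""
    conn.execute("INSERT OR IGNORE INTO dim_project (project_bk) "
                 "SELECT project_bk FROM v_dim_project ORDER BY project_bk")


conn.executemany("INSERT INTO project_h (project_bk) VALUES (?)", [("PRJ1",), ("PRJ2",)])
refresh_dim_project()
conn.execute("INSERT INTO project_h (project_bk) VALUES ('PRJ3')")
refresh_dim_project()  # PRJ1 and PRJ2 are skipped, PRJ3 gets the next key
print(conn.execute("SELECT project_sk, project_bk FROM dim_project ORDER BY project_sk").fetchall())
```

The price is an explicit load step (`refresh_dim_project`) that the virtual approach avoids.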
  33. Self-service BI and write-back<br />Palo for Excel<br />Open source MOLAP<br />Every cell points to a location in the cube<br />Write-back to the cube possible<br />EDW<br />Cube<br />Excel<br />
  34. Lessons learnt<br />It is possible to quickly build an EDW with open source software<br />Some really cool developments (e.g. data mart generation)<br />Automation only goes so far<br />Some challenges still need to be addressed<br />…it is business intelligence after all.<br />Automate if it saves you money<br />It can save you time to focus on the important stuff<br />The end product counts: does it deliver added value?<br />What’s the best EDW architecture?<br />It depends!™<br />
