Building a big data warehouse


Let's face it: data warehousing in the traditional sense has been tedious, slow, and lacking in agility. Traditional data warehouses are often designed and built around a static set of up-front questions; they are basically big databases tailored to populating dashboards. The fact that the volume of data we need to deal with has recently exploded by several orders of magnitude isn't improving the situation. It's no surprise that we see a new class of data warehousing setups emerge, using big data technologies and NoSQL stores. A nice side effect is that these solutions are usually not only the domain of the BI crowd, but can also be developer friendly and allow development of more data-driven apps.

In this session I will share experiences with using Hadoop and other tools from the Hadoop ecosystem, such as Hive, Pig and bare MapReduce, to handle data that grows by tens of GBs per day. We create a system where data is captured, stored and made available to different users and use cases, ranging from end users who write SQL queries to software developers who access the underlying data to create data-driven products. I will cover topics like ETL, querying, development, deployment and reporting; using a fully open source stack, of course.
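To make the "bare MapReduce" part of that stack a little more concrete, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python. It is not code from the talk: the tab-separated log format, the field names and the sample lines are all assumptions made up for illustration, and a real job would run on Hadoop rather than in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw input: tab-separated "ISO timestamp, user id, bytes sent".
RAW_LINES = [
    "2013-06-06T10:15:00\talice\t512",
    "2013-06-06T11:00:00\tbob\t2048",
    "2013-06-07T09:30:00\talice\t1024",
    "garbage line without tabs",  # dirty record, skipped by the mapper
]


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (day, bytes) for a well-formed record, nothing otherwise."""
    parts = line.split("\t")
    if len(parts) != 3 or not parts[2].isdigit():
        return
    timestamp, _user, size = parts
    yield timestamp[:10], int(size)  # first 10 characters are YYYY-MM-DD


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_day(day: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: total bytes for one day."""
    return day, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_day(day, values) for day, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(RAW_LINES))
```

In Hive, an aggregation of this shape would typically be written as a single GROUP BY query instead, which is part of what makes the same data accessible to end users who prefer SQL.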

Published in: Technology, Business

Building a big data warehouse

  1. GoDataDriven, proudly part of the Xebia Group. Building a Big Data Warehouse. Joris Bontje, Big Data Hacker. @mids106, jorisbontje@godatadriven.com
  2. About Me: Big Data Hacker, Data Driven Solution Architect, Hadoop Trainer
  3. About GoDataDriven
  4. Data Warehouse Evolution
  5. In computing, a data warehouse is a database used for reporting and data analysis.
  6. Database Architecture (1.0): Products, Customers, Orders, Inventory, Sales, DB
  7. Analytical Database (2.0): Sales, Inventory, Customers, Products, Orders
  8. Basic DWH Architecture: TX DB, Analytical DB, BI, ETL
  9. Data Marts: TX DB, DW, Sales, Mktg, Prch, BI
  10. Multiple Data-Sources: other, Files, TX DB, DW, Sales, Mktg, Prch, BI
  11. Operational Data Store: DW, ODS, other, Files, TX DB, Sales, Mktg, Prch, BI
  12. Hadoop
  13. No Hadoop: DW, ODS, other, Files, TX DB, Sales, Mktg, Prch, BI
  14. ETL Engine: other, Files, TX DB, Sales, Mktg, Prch, DW, BI
  15. Tiered Data Warehouse: other, Files, TX DB, Sales, Mktg, Prch, BI
  16. Analytical Query Engine: other, Files, TX DB, BI
  17. Tools
  18. Tools
  19. Tools Applied
  20. Tools Applied
  21. Considerations
  22. Considerations: Big Data is dirty. Automate everything. Monitoring and QA become the same thing.
  23. My Past Trends: Big Data Forum 2012
  24. My Past Trends: Cloud / On-demand
  25. My Past Trends: Hadoop Hardware
  26. My Past Trends: Batch → Real-Time
  27. New Trends: XebiCon 2013
  28. Trends: Impala. Open Source, Real-time Query engine for Hadoop
  29. Trends: De facto standard for Hadoop metadata
  30. Simple Database Architecture: Products, Customers, Orders, Inventory, Sales, DB
  31. The future? Products, Customers, Orders, Inventory, Sales
  32. GoDataDriven. We’re hiring / Questions? / Thank you! Joris Bontje, Big Data Hacker. @mids106, jorisbontje@godatadriven.com
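
The considerations on slide 22 (dirty data, automation, and monitoring and QA becoming the same thing) can be illustrated with a small automated load check. This is only a sketch under assumptions: the `LoadStats` record, the `check_load` function and both thresholds are hypothetical names invented here, not something shown in the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoadStats:
    """Hypothetical per-day statistics emitted by an ETL run."""
    day: str
    rows_read: int
    rows_rejected: int


def check_load(stats: LoadStats, expected_min_rows: int,
               max_reject_ratio: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems: List[str] = []
    # Volume check: far fewer rows than usual suggests a broken upstream feed.
    if stats.rows_read < expected_min_rows:
        problems.append(
            f"{stats.day}: only {stats.rows_read} rows read, "
            f"expected at least {expected_min_rows}"
        )
    # Quality check: a high reject ratio suggests a format change or dirty data.
    if stats.rows_read > 0:
        ratio = stats.rows_rejected / stats.rows_read
        if ratio > max_reject_ratio:
            problems.append(
                f"{stats.day}: reject ratio {ratio:.1%} exceeds {max_reject_ratio:.1%}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_load(LoadStats("2013-06-06", 100, 50), 500, 0.05):
        print(problem)
```

The same function serves both purposes: run after every load it acts as a QA gate, and with its output fed into an alerting system it acts as monitoring.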