Building a big data warehouse

Let's face it: data warehousing in the traditional sense has been tedious, slow, and lacking in agility. Traditional data warehouses are often designed and built around a static set of up-front questions, which makes them little more than a big database tailored to populating dashboards. The fact that the volume of data we need to deal with has recently exploded by several orders of magnitude isn't improving the situation. It's no surprise that we see a new class of data warehousing setups emerge, using big data technologies and NoSQL stores. A nice side effect is that these solutions are usually not only the domain of the BI crowd, but can also be developer friendly and allow development of more data-driven apps.

In this session I will present our experiences using Hadoop and other tools from the Hadoop ecosystem, such as Hive, Pig and bare MapReduce, to handle data that grows by tens of GBs per day. We create a system where data is captured, stored and made available to different users and use cases, ranging from end users who write SQL queries to software developers who access the underlying data to create data-driven products. I will cover topics like ETL, querying, development, deployment and reporting, all using a fully open source stack, of course.
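To give a feel for the "bare MapReduce" style of ETL mentioned above, here is a minimal sketch in plain Python. It is not Hadoop code and not taken from the talk: the tab-separated record format, the sample data and all function names are assumptions made up for illustration. It mimics the map, shuffle and reduce phases on a handful of raw event lines and drops a malformed record along the way.

```python
from collections import defaultdict

# Hypothetical raw event format (an assumption, not from the talk):
# "<ISO date>\t<user id>\t<event type>"
RAW_EVENTS = [
    "2013-06-01\tu1\tview",
    "2013-06-01\tu2\tview",
    "2013-06-01\tu1\tpurchase",
    "corrupted line without tabs",  # dirty input that must be skipped
    "2013-06-02\tu3\tview",
]


def map_phase(lines):
    """Parse each raw line and emit ((day, event), 1); skip malformed lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # drop records that do not match the expected layout
        day, _user, event = fields
        yield (day, event), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def daily_event_counts(lines):
    """Run the three phases in sequence over an iterable of raw lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for (day, event), count in sorted(daily_event_counts(RAW_EVENTS).items()):
        print(day, event, count)
```

An end user would typically express the same aggregation as a SQL `GROUP BY` on day and event type; the sketch only shows what such a query boils down to underneath.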

Published in: Technology, Business
Transcript

  • 1. GoDataDriven, proudly part of the Xebia Group. Building a Big Data Warehouse. Joris Bontje, Big Data Hacker. @mids106, jorisbontje@godatadriven.com
  • 2. About Me: Big Data Hacker, Data Driven Solution Architect, Hadoop Trainer
  • 3. About GoDataDriven
  • 4. Data Warehouse Evolution
  • 5. "In computing, a data warehouse is a database used for reporting and data analysis." (http://en.wikipedia.org/wiki/Data_warehouse)
  • 6. Database Architecture (1.0): Products, Customers, Orders, Inventory, Sales, DB
  • 7. Analytical Database (2.0): Sales, Inventory, Customers, Products, Orders
  • 8. Basic DWH Architecture: TX DB, Analytical DB, BI, ETL
  • 9. Data Marts: TX DB, DW, Sales, Mktg, Prch, BI
  • 10. Multiple Data-Sources: other, Files, TX DB, DW, Sales, Mktg, Prch, BI
  • 11. Operational Data Store: DW, ODS, other, Files, TX DB, Sales, Mktg, Prch, BI
  • 12. Hadoop
  • 13. No Hadoop: DW, ODS, other, Files, TX DB, Sales, Mktg, Prch, BI
  • 14. ETL Engine: other, Files, TX DB, Sales, Mktg, Prch, DW, BI
  • 15. Tiered Data Warehouse: other, Files, TX DB, Sales, Mktg, Prch, BI
  • 16. Analytical Query Engine: other, Files, TX DB, BI
  • 17. Tools
  • 18. Tools
  • 19. Tools Applied
  • 20. Tools Applied
  • 21. Considerations
  • 22. Considerations: Big Data is dirty; Automate everything; Monitoring and QA become the same thing
  • 23. My Past Trends: Big Data Forum 2012
  • 24. My Past Trends: Cloud / On-demand
  • 25. My Past Trends: Hadoop Hardware
  • 26. My Past Trends: Batch → Real-Time
  • 27. New Trends: XebiCon 2013
  • 28. Trends: Impala. Open source, real-time query engine for Hadoop
  • 29. Trends: De facto standard for Hadoop metadata
  • 30. Simple Database Architecture: Products, Customers, Orders, Inventory, Sales, DB
  • 31. The future? Products, Customers, Orders, Inventory, Sales
  • 32. GoDataDriven: We’re hiring / Questions? / Thank you! Joris Bontje, Big Data Hacker. @mids106, jorisbontje@godatadriven.com
