Content and talk by Friso van Vollenhoven (GoDataDriven)
Let's face it: data warehousing in the traditional sense has been tedious, slow, and lacking in agility. Often designed and built around a static set of up-front questions, traditional data warehouses are basically big databases tailored to populating dashboards. The fact that the volume of data we need to deal with has recently exploded by several orders of magnitude isn't improving the situation. It's no surprise that we see a new class of data warehousing setups emerge, using big data technologies and NoSQL stores. A nice side effect is that these solutions are usually not only the domain of the BI crowd, but can also be developer friendly and allow development of more data-driven apps.
In this talk I will present experiences with using Hadoop and other tools from the Hadoop ecosystem, such as Hive, Pig and bare MapReduce, to handle data that grows by tens of GBs per day. We create a system where data is captured, stored and made available to different users and use cases, ranging from end users who write SQL queries to software developers who access the underlying data to create data-driven products. I will cover topics like ETL, querying, development, deployment and reporting, all using a fully open source stack, of course.