Adobe Systems uses “SaasBase Analytics” to incrementally process large heterogeneous data sets into pre-aggregated, indexed views, stored in HBase to be queried in real- time. Our goal was to process new data in real- time (currently minutes) and have it ready for a large number of concurrent queries that execute in milliseconds. This set our problem apart from what is traditionally solved with Hive or PIG. In this talk I’ll describe the design and the strategies (and hacks) we used to achieve low latency and scalability, from theoretical model to the entire process of ETL to warehousing and queries.
Clipping is a handy way to collect important slides you want to go back to later.