Be the first to like this
Near real-time, big data analytics is a reality via a new data pattern that avoids the latency and overhead of legacy ETL–the 3 T’s of Hadoop: Transfer, Transform, and Translate. Transfer: Once a Hadoop infrastructure is in place, a mandate is needed to immediately and continuously transfer all enterprise data, from external and internal sources and through different existing systems, into Hadoop. Previously, enterprise data was isolated, disconnected and monolithically segmented. Through this T, various source data are consolidated and centralized in Hadoop almost as they are generated in near real-time. Transform: Most of the enterprise data, when flowing into Hadoop, is transactional in nature. Analytics requires data be transformed from record-based OLTP form to column-based OLAP. This T is not the same T in ETL as we need to retain the granularity in the data feeds. The key is to transform in-place within Hadoop, without further data movement from Hadoop to other legacy systems. Translate: We pre-compute or provide on-the-fly views of analytical data, exposed for consumption. We facilitate analysis and reporting, for both scheduled and ad hoc needs, to be interactive with the data for analysts and end users, integrated in and on top of Hadoop.