With the advent of big data, the enterprise analytics landscape has changed dramatically. HDFS has become an important data repository for business analytics, and enterprises are using various big data technologies to process data and derive actionable insights. HDFS serves as the storage layer on which distributed processing frameworks, such as Hadoop MapReduce and Spark, access and operate on large volumes of data. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. EDWs are usually shared-nothing parallel databases that support complex SQL processing, updates, and transactions. As a result, they manage up-to-date data and support various business analytics tools, such as reporting and dashboards. A new generation of applications has emerged that requires accessing and correlating data stored in both HDFS and EDWs. This has created the need for a special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid warehouse. In this talk, we identify the best hybrid warehouse architecture by studying various algorithms for joining database tables with HDFS tables.