Hadoop Integration into Data Warehousing Architectures

This presentation explains research on integrating Hadoop into data warehouse architectures. It shows where Hadoop fits into a data warehouse architecture. It also proposes a BI assessment model for determining the capability of a current BI program and for defining a roadmap toward its maturity.

  1. Integrating Hadoop into Data Warehousing Architecture. "Where is the Wisdom? Lost in the Knowledge. Where is the Knowledge? Lost in the Information." (T.S. Eliot) © Humza Naseer, University of Melbourne 2014
  2. Outline. Introduction; Link Between Hadoop and Data Warehouse; Related Work: Trends in Data Warehouse Architecture; Current Work: Hadoop Integration into Data Warehouse Environment; Findings, Conclusion & Future Work.
  3. Intro: The Data Warehouse Mission. Identify all possible enterprise data assets; select those assets that have actionable content and can be accessed; bring the data assets into a logically centralized “enterprise data warehouse”; expose those data assets most effectively for decision making (Kimball & Ross, 2013).
  4. Intro: Overview of Hadoop. Hadoop is an ecosystem of products: open source; vendor distributions; additional tools for development and administration. Hadoop benefits: enables big data analytics; supports advanced forms of analytics; scales cost-effectively; extends a data warehouse environment. Hadoop limitations: low-latency queries; ease of access; data integration and integrity; fine-grained security. Diagram labels: Unstructured Data; HDFS Data Nodes; MapReduce; Query Results.
  5. Link Between Hadoop and Data Warehouse. A data warehouse system fetches and unifies data from heterogeneous source systems into a centralized dimensional or normalized data repository (Rainardi, 2008). A data warehouse is not a tool or technology; it is a business process that unifies an enterprise through data (Eckerson, 2012). Is Hadoop a problem or an opportunity? Where does Hadoop fit into data warehouse architecture?
  6. Why is Integration Happening? Traditional RDBMSs cannot handle: the new data types; extended analytic processing; terabytes/hour loading with immediate query access. We want to use SQL, but we don’t want the RDBMS storage constraints. The disruptive solution: Hadoop (Kimball & Ross, 2013). Diagram labels: DB1, DB2, DB3; Transformation and Load; Central DW; BI App-1, BI App-2, BI App-3; Decision Making.
  7. Trends in Data Warehouse Architectures. Ponniah (2011) notes that the selection of a DW architecture is based on enterprise requirements. A DW architecture has multiple architectural layers and components: logical architecture; physical architecture (Moss and Atre, 2013). DW architecture overlaps with data integration, business intelligence and enterprise data (Russom, 2014). Inmon vs Kimball dichotomy (Ariyachandra and Watson, 2010).
  8. Hadoop Integration into Data Warehousing Environment. Eckerson (2012) notes that reporting and analytics have different workload requirements. Reporting is based on entities and facts that are well known. Advanced analytics enables the discovery of new facts that are not well known. Multi-platform unified data architecture: includes the enterprise data warehouse (EDW) and several other new data platforms that augment the EDW (Russom, 2013).
  9. Uses of Hadoop that Extend DW Architectures. Data staging; data archiving; advanced analytics; multi-structured data. Diagram labels: DB1, DB2, DB3; Transformation and Load; EDW; BI App-1, BI App-2, BI App-3; Decision Making.
  10. Findings. Analytics and reporting have different requirements for DW architectures. A DW architecture can be characterized by counting the number and types of workloads it supports. The logical DW architecture must integrate multiple physical platforms. The design of the logical DW architecture must be compartmentalized. Proposed: a logical architecture for the new DW ecosystem (an extension of the Eckerson (2012) BI architecture).
  11. Logical Architecture of New DW Ecosystem. Diagram labels: Online Transaction Processing Systems (Relational Data); Operational Systems; Operational Data Store; Enterprise Data Warehouse; Subject Area Data Marts; BI Server; Event-driven alerting environment; Reporting/analysis environment; DW-Centric Sandbox; Replicated Sandbox; In-memory BI Sandbox; Web Data; Machine Data; Log files; Legacy/External Data; Hadoop Ecosystem Cluster (Non-relational Data); Exploration/discovery environment; Non-relational; Extract, Transform and Load (batch, real time or near real time); Query; ETL; Streaming; Power User; Casual User; Top-down architecture; Bottom-up architecture.
  12. BI Assessment Model. Diagram labels: Data Warehouse Ecosystem; Data Marts; Enterprise Data Warehouse; Workload-Specific Data Platforms. Axes: Workload Capacity (Low, High); Degree of Integration (Low, High); Degree of Standardization (Low, High).
  13. Conclusion. Hadoop enables new types of applications within the DW environment: big data analytics, advanced analytics and discovery analytics; information exploration and augmenting a data warehouse. Hadoop should be implemented in a multi-platform DW environment. Future work: conformed dimensions; BI maturity roadmap.
  14. Questions
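
The finding on slide 10, that a DW architecture can be characterized by counting the number and types of workloads it supports, could be sketched roughly as follows. This is only an illustration: the platform and workload names echo labels from the slides, but the `PLATFORM_WORKLOADS` inventory, the assignment of workloads to platforms, and the `characterize` function are hypothetical and do not come from the presentation.

```python
from collections import Counter

# Hypothetical inventory: platform name -> workload types it serves.
# The names echo the slides; the assignments are illustrative
# assumptions, not results from the research.
PLATFORM_WORKLOADS = {
    "enterprise_data_warehouse": ["reporting", "analysis"],
    "operational_data_store": ["event_driven_alerting"],
    "hadoop_cluster": [
        "data_staging",
        "data_archiving",
        "advanced_analytics",
        "exploration",
    ],
    "in_memory_bi_sandbox": ["exploration"],
}


def characterize(platform_workloads):
    """Summarize an architecture by the number and types of workloads it supports."""
    # Count how many platforms serve each workload type.
    counts = Counter(
        workload
        for workloads in platform_workloads.values()
        for workload in workloads
    )
    return {
        "platform_count": len(platform_workloads),
        "workload_type_count": len(counts),
        "platforms_per_workload": dict(counts),
    }


summary = characterize(PLATFORM_WORKLOADS)
print(summary)
```

A summary like this makes it easy to see which workload types are served by more than one physical platform, which is where the logical architecture has to do the most integration work.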
