Data warehouse 101 fundamentals


Published in: Technology, Business
  • Data Integration Layer – ODS: The ODS is a non-queryable, centralized staging area for storing extracted, cleansed, and transformed data and for gathering centralized metadata to implement an Enterprise Data Mart Architecture (EDMA), eliminating the need for another non-queryable staging area called a data warehouse. What is needed is a dimensionally modeled data warehouse for enterprise DSS, prepared to provide the best query response performance and to support the most advanced OLAP functionality.

    1. 1. Enterprise Data Warehouse Fundamentals 101 KIDS Phase II Project Mojo Nwokoma Director, Enterprise Data Systems Architecture Office of Assessment & Information Services Oregon Department of Education 503-378-3600 x2242 [email_address]
    2. 2. What Are Enterprise DW/BI Solutions? <ul><li>A Data Warehouse (DW) is a collection of integrated, subject-oriented, time-variant, and non-volatile data drawn from various sources into a single, consistent store that supports reporting, analysis, and decision making within the enterprise. </li></ul><ul><ul><li>Integrates operational data through consistent naming conventions, measurements, physical attributes, and semantics. </li></ul></ul><ul><li>Business Intelligence (BI) solutions use a blend of technologies, including relational and multi-dimensional databases, client/server architecture, and graphical user interfaces, to integrate disparate data sources into a single coherent framework for real-time reporting, drill-through analysis, and decision support. </li></ul>
    3. 3. KIDS Phase I Project Report: The Business Case for Change <ul><li>NCLB & Federal Accountability: </li></ul><ul><ul><li>The reporting and performance requirements of NCLB demand a fundamental change in statewide data collection and reporting. </li></ul></ul><ul><ul><li>The need to report individual student achievement and to aggregate data by subgroups. </li></ul></ul><ul><li>Statewide Accountability & Efficiency: </li></ul><ul><ul><li>Major gaps and inefficiencies exist in information collection, reporting, and analysis at both district and state levels that need to be addressed. </li></ul></ul><ul><ul><li>Funding formula compliance and equity, and the ability to evaluate the relative effectiveness or ineffectiveness of education programs. </li></ul></ul><ul><li>Student & Community Service: </li></ul><ul><ul><li>Growing stakeholder demand for a significant improvement in student records availability, accessibility, portability, and accuracy. </li></ul></ul><ul><li>Cost Efficiency Gains: </li></ul><ul><ul><li>Evidence of economies of scale resulting in cost reductions in the management of administrative systems at the larger school districts and ESD structure. </li></ul></ul><ul><ul><li>Source: KIDS Phase 1 Final report: ODE & IBM Confidential (10/20/05) </li></ul></ul>
    4. 4. KIDS Phase II Project Planning: 5 key questions for successful project planning & implementation What? When? Who? Why? How? SUCCESS <ul><li>Deliverables/Scope </li></ul><ul><li>Stakeholders </li></ul><ul><li>Resources </li></ul><ul><li>Business Case </li></ul><ul><li>Time Frame </li></ul>
    5. 5. The Essential Building Blocks for a Successful Enterprise Information Management Project
    6. 6. Enterprise Data Warehouse Architecture (diagram labels) <ul><li>Applications: HR, SIS, Curriculum & Instruction, Finance, Nutrition </li></ul><ul><li>Phases: Extraction Phase, Transformation Phase, Load Phase </li></ul><ul><li>District Data Warehouse STAGING: Integration, Metadata, Cleansed, Profiled, Biz Rules </li></ul><ul><li>Data Marts: SIS, FINANCE, Transportation, Instruction </li></ul><ul><li>Layer labels: Data Management, H/W Server Platform, Applications </li></ul>
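The extraction, transformation, and load phases named on this slide can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the source rows, the field names (`student_id`, `grade`, `district`), and the rule that a numeric grade is required are all assumptions, and in-memory lists stand in for the source system and the staging area.

```python
from typing import Iterable

# Hypothetical rows as they might arrive from a source application such as an SIS.
SIS_ROWS = [
    {"student_id": " 1001 ", "grade": "07", "district": "hillsboro"},
    {"student_id": "1002", "grade": "", "district": "Beaverton"},
    {"student_id": "1001", "grade": "7", "district": "Hillsboro"},  # duplicate student
]


def extract(rows: Iterable[dict]) -> list:
    """Extraction phase: copy rows out of the source unchanged."""
    return [dict(r) for r in rows]


def transform(rows: list) -> tuple:
    """Transformation phase: cleanse, standardize, and de-duplicate.

    Returns (clean_rows, rejected_rows) so rejected rows can be reviewed.
    """
    clean, rejected, seen = [], [], set()
    for r in rows:
        sid = r["student_id"].strip()
        grade = r["grade"].strip()
        if not grade.isdigit():  # assumed business rule: a numeric grade is required
            rejected.append(r)
            continue
        if sid in seen:  # keep only the first row seen per student
            continue
        seen.add(sid)
        clean.append(
            {"student_id": sid, "grade": int(grade), "district": r["district"].title()}
        )
    return clean, rejected


def load(rows: list, target: list) -> int:
    """Load phase: append cleansed rows to the target and report the count."""
    target.extend(rows)
    return len(rows)


staging = []  # stands in for the staging area
clean_rows, rejected_rows = transform(extract(SIS_ROWS))
loaded = load(clean_rows, staging)
print(f"loaded={loaded} rejected={len(rejected_rows)}")
```

Returning the rejected rows separately, rather than dropping them silently, keeps data quality problems visible to whoever owns the source data.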
    7. 7. KIDS Phase II Project District/ESD Server Deployment & Data Warehouse Integration Architecture (diagram labels) <ul><li>Transaction System (shown five times) </li></ul><ul><li>ODS (shown five times) </li></ul><ul><li>District warehouses: Hillsboro District DW, Beaverton District DW, Portland District DW, Eugene District DW, ESDs DW </li></ul><ul><li>ODE (State) DW, Physical & Virtual </li></ul><ul><li>LEGENDS: ODS = Operational Data Store; DW = Data Warehouse; KIDS Work = ODE Districts Record Exchange </li></ul>
    8. 8. Project Planning Methodology – “The How?” For the Project Team <ul><li>Step 1: Define the Work Breakdown Structure. The first step is to create a comprehensive Work Breakdown Structure (WBS). The WBS lists all the phases, activities, and tasks required to undertake the project. Identify and describe each phase, activity, and task required to complete the project successfully. Depict the order in which the tasks must be undertaken and identify any key internal and external project dependencies. Also list the critical project milestones, such as the completion of key project deliverables. </li></ul><ul><li>Step 2: Identify the Required Resources. Having listed all of the tasks required to undertake the project, you now need to identify the generic resources required to complete each task. Examples of resource types include full-time and part-time staff, contractors, equipment, and materials. For each resource type, identify the quantity required, the delivery dates, and the project tasks in the WBS that the resource will help complete. </li></ul><ul><li>Step 3: Construct a Project Schedule. To construct your schedule, you need to: </li></ul><ul><ul><li>List the phases, activities, and tasks </li></ul></ul><ul><ul><li>Sequence the phases, activities, and tasks </li></ul></ul><ul><ul><li>Add key internal and external dependencies </li></ul></ul><ul><ul><li>Allocate relevant completion timeframes </li></ul></ul><ul><ul><li>Add contingency to mitigate risk </li></ul></ul><ul><ul><li>Assign resources required to complete tasks </li></ul></ul><ul><ul><li>List critical delivery milestones </li></ul></ul><ul><ul><li>Specify any assumptions and constraints </li></ul></ul>
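The sequencing part of Step 3 (order the tasks, honor their dependencies, allocate timeframes, add contingency) can be sketched with a topological sort. The task names, durations, and the 10% contingency figure below are hypothetical, chosen only to illustrate the idea; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical WBS tasks: name -> (duration in days, prerequisite tasks).
TASKS = {
    "validate_requirements": (10, []),
    "design_data_model": (20, ["validate_requirements"]),
    "configure_etl": (15, ["design_data_model"]),
    "build_reports": (10, ["design_data_model"]),
    "train_users": (5, ["configure_etl", "build_reports"]),
}


def schedule(tasks: dict, contingency: float = 0.0) -> dict:
    """Return the earliest finish day for each task.

    Tasks are visited in dependency order; each task starts when its
    slowest prerequisite finishes, and every duration is padded by the
    contingency fraction.
    """
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    finish = {}
    for name in TopologicalSorter(graph).static_order():
        duration, deps = tasks[name]
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + duration * (1 + contingency)
    return finish


plan = schedule(TASKS)                       # no contingency
padded = schedule(TASKS, contingency=0.10)   # assumed 10% buffer on every task
for task, day in plan.items():
    print(f"{task}: finishes on day {day:g}")
```

`TopologicalSorter` raises `CycleError` if the dependencies are circular, which is a useful early check on a draft WBS.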
    9. 9. Current Data Environment <ul><li>Lack of granular, integrated, accurate, standardized, and timely data regarding student performance and achievement, both for individual students and for specific student subgroups. </li></ul><ul><li>Lack of data and information integration between instructional management sources and assessment sources aligned to the state’s specific educational standards. Other factors affected include attendance records, discipline, teacher qualification, classroom size, and instructional hours. </li></ul><ul><li>Lack of close collaboration between the districts and ODE in tracking students as they move through the educational system, both vertically and horizontally, in order to improve performance by identifying actionable indicators. </li></ul><ul><li>Lack of specific multi-year student performance information to support longitudinal analysis, accessible at the State, District, and school levels with appropriate controls to assure confidentiality. </li></ul><ul><li>Need for data and tool standardization between all reporting districts to ensure accurate, consistent, and useful analytical input for decision-making purposes. </li></ul><ul><li>No single version of the truth exists for business rules and data definitions among various data sources. </li></ul><ul><li>Lack of easily validated financial information that accurately reports budgeted vs. actual expenditures by program and allows correlation of these expenditures to student performance. </li></ul><ul><li>Lack of appropriately controlled online access to information for all stakeholders regarding student progress and school quality. </li></ul>
    10. 10. Problems with Current Decision Support <ul><ul><ul><li>Data redundancy and process redundancy </li></ul></ul></ul><ul><ul><ul><li>Data is not integrated and cannot be shared </li></ul></ul></ul><ul><ul><ul><li>Data is not understood or is misunderstood </li></ul></ul></ul><ul><ul><ul><li>Inconsistent data definitions & business rules </li></ul></ul></ul><ul><ul><ul><li>Data retrieval is difficult and time consuming </li></ul></ul></ul><ul><ul><ul><li>Operational files may not contain history </li></ul></ul></ul><ul><ul><ul><li>Reports are inconsistent in content & format </li></ul></ul></ul><ul><ul><ul><li>Data is too dirty for business analysis </li></ul></ul></ul><ul><ul><ul><li>Multiple versions of the truth </li></ul></ul></ul><ul><ul><ul><li>Reports and associated BI tools are not standardized </li></ul></ul></ul>
    11. 11. Recommended Model for Enterprise Data Warehouse System KIDS Phase II Project <ul><li>E-Portal: Educational Stakeholder Communication </li></ul><ul><li>Data Warehouse: Benchmarking/Decision Support </li></ul><ul><li>Operational Data Store (ODS): District & State Reporting </li></ul><ul><li>Transactional Database: Day to Day Operations </li></ul>PK-12 Data Model for Information Management (ODE & IBM Confidential) 10/20/05
    12. 12. Standards <ul><ul><ul><li>Governance (setting priorities) </li></ul></ul></ul><ul><ul><ul><li>Data naming, aliases, abbreviations list </li></ul></ul></ul><ul><ul><ul><li>Meta data capture and maintenance </li></ul></ul></ul><ul><ul><ul><li>Data quality and data management </li></ul></ul></ul><ul><ul><ul><li>Testing standards </li></ul></ul></ul><ul><ul><ul><li>Security standards </li></ul></ul></ul><ul><ul><ul><li>Measuring results (benefits, costs, usage) </li></ul></ul></ul><ul><ul><ul><li>Service level agreements </li></ul></ul></ul>Rules and protocols to be followed by all users and developers for all applications
    13. 13. DW roles & responsibilities <ul><li>Business User (Client) </li></ul><ul><li>Business User Support </li></ul><ul><li>Data Administrator </li></ul><ul><li>Data Analysts </li></ul><ul><li>Meta Data Administrator </li></ul><ul><li>Database Administrator </li></ul><ul><li>Developers </li></ul><ul><ul><li>ETL </li></ul></ul><ul><ul><li>BI – Reports, Queries </li></ul></ul>
    14. 14. DW roles & responsibilities (continued) <ul><li>Security Officer </li></ul><ul><li>Auditor </li></ul><ul><li>Data Warehouse Project Manager </li></ul><ul><li>Technical Services </li></ul><ul><li>DW Architect </li></ul><ul><li>Technical Advisory Board </li></ul><ul><li>Steering Committee </li></ul>
    15. 15. Information quality <ul><li>Data is accurate </li></ul><ul><li>Data is consistent </li></ul><ul><li>Data is timely </li></ul><ul><li>Data is integrated </li></ul><ul><li>Data is complete </li></ul><ul><li>Data values follow the business rules </li></ul><ul><li>Data corresponds to valid values </li></ul><ul><li>Data is well understood </li></ul>
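Several of these quality criteria can be measured directly. The sketch below scores a small set of records for completeness, validity against a domain, and timeliness. The record layout, the grade domain (0 through 12), and the 365-day freshness threshold are assumptions made for illustration only.

```python
from datetime import date

VALID_GRADES = set(range(0, 13))  # assumed domain: kindergarten (0) through grade 12

# Hypothetical student records with a last-updated date.
RECORDS = [
    {"student_id": "1001", "grade": 7, "updated": date(2006, 1, 5)},
    {"student_id": "1002", "grade": 15, "updated": date(2006, 1, 6)},   # invalid grade
    {"student_id": None, "grade": 3, "updated": date(2004, 9, 1)},      # incomplete, stale
]


def profile(records: list, as_of: date, max_age_days: int = 365) -> dict:
    """Return the share of records that pass each quality check."""
    total = len(records)
    if total == 0:
        raise ValueError("cannot profile an empty record set")
    complete = sum(1 for r in records if all(v is not None for v in r.values()))
    valid = sum(1 for r in records if r["grade"] in VALID_GRADES)
    timely = sum(1 for r in records if (as_of - r["updated"]).days <= max_age_days)
    return {
        "completeness": complete / total,
        "validity": valid / total,
        "timeliness": timely / total,
    }


report = profile(RECORDS, as_of=date(2006, 1, 31))
for metric, share in report.items():
    print(f"{metric}: {share:.0%}")
```

Accuracy and consistency across sources need a reference to compare against, so they are left out of this sketch.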
    16. 16. Data Warehouse/ODS & BI Layers <ul><li>Data Integration Layer & Operational Data Store (ODS): </li></ul><ul><li>The ODS is a non-queryable, centralized staging area for storing extracted, cleansed, and transformed data and for gathering centralized metadata to implement an Enterprise Data Mart Architecture (EDMA), eliminating the need for another non-queryable staging area called a data warehouse. </li></ul><ul><li>What is needed is a dimensionally modeled Data Warehouse for enterprise DSS, prepared to provide the best query response performance and to support the most advanced OLAP functionality. </li></ul>
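To make "dimensionally modeled" concrete, here is a minimal star schema sketch: one fact table surrounded by dimension tables, queried with joins and a group-by. The table names, column names, school names, and scores are hypothetical; an in-memory SQLite database from the Python standard library stands in for the warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_school (
        school_key INTEGER PRIMARY KEY, school_name TEXT, district TEXT);
    CREATE TABLE dim_subject (
        subject_key INTEGER PRIMARY KEY, subject_name TEXT);

    -- The fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_assessment (
        school_key INTEGER REFERENCES dim_school(school_key),
        subject_key INTEGER REFERENCES dim_subject(subject_key),
        school_year INTEGER,
        score REAL);

    INSERT INTO dim_school VALUES
        (1, 'School A', 'Hillsboro'), (2, 'School B', 'Beaverton');
    INSERT INTO dim_subject VALUES (1, 'Math'), (2, 'Reading');
    INSERT INTO fact_assessment VALUES
        (1, 1, 2005, 70.0), (1, 1, 2005, 80.0),
        (2, 1, 2005, 90.0), (2, 2, 2005, 60.0);
    """
)

# A typical decision-support query: average score by district and subject.
rows = conn.execute(
    """
    SELECT s.district, sub.subject_name, AVG(f.score)
    FROM fact_assessment AS f
    JOIN dim_school AS s ON s.school_key = f.school_key
    JOIN dim_subject AS sub ON sub.subject_key = f.subject_key
    GROUP BY s.district, sub.subject_name
    ORDER BY s.district, sub.subject_name
    """
).fetchall()

for district, subject, avg_score in rows:
    print(district, subject, avg_score)
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table is what lets the same fact rows be aggregated along any dimension.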
    17. 17. Meta data components <ul><li>Data name (entity, attribute, table, column, field, etc.) </li></ul><ul><li>Business description of data (e.g., Project Start Date) </li></ul><ul><li>Source of data (file, field) </li></ul><ul><li>Business Owner of data </li></ul><ul><li>Business rules </li></ul><ul><li>Transformation rules </li></ul><ul><li>Domains (allowable values) </li></ul><ul><li>Data relationships </li></ul>
    18. 18. Meta data components (continued) <ul><li>Data quality (measure of reliability) </li></ul><ul><li>Timeliness (e.g., current as of a certain date) </li></ul><ul><li>Historical information </li></ul><ul><li>Aggregation rules </li></ul><ul><li>Security (who has access) </li></ul>
    19. 19. Meta data management <ul><li>Meta data administration </li></ul><ul><li>Business meta data </li></ul><ul><li>Technical meta data </li></ul><ul><li>ETL reconciliation </li></ul><ul><li>Data quality metrics </li></ul><ul><li>Standardization </li></ul><ul><li>Data ownership </li></ul><ul><li>Enterprise integration </li></ul>Meta Data = [Descriptive] Data About Data [of the Business]. Information = Data within context. Context = Meta data. Information = Data + Meta data.
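The meta data components listed on the preceding slides map naturally onto a record type, and the "Information = Data + Meta data" idea becomes a function that pairs a raw value with its context. The class name, field names, and the enrollment-status example are hypothetical and cover only some of the listed components.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MetadataEntry:
    """One meta data record describing a single data element."""

    data_name: str                     # technical name (table, column, field)
    business_description: str          # what the element means to the business
    source: str                        # originating file or field
    business_owner: str                # who is accountable for the data
    business_rules: list = field(default_factory=list)
    transformation_rules: list = field(default_factory=list)
    domain: tuple = ()                 # allowable values; empty means unrestricted
    current_as_of: Optional[date] = None   # timeliness
    access_roles: tuple = ()           # security: who has access

    def allows(self, value) -> bool:
        """Check a value against the domain of allowable values."""
        return not self.domain or value in self.domain


def in_context(value, meta: MetadataEntry) -> str:
    """Information = Data + Meta data: describe a raw value in its context."""
    status = "valid" if meta.allows(value) else "outside domain"
    return f"{meta.business_description}: {value!r} ({status}; source {meta.source})"


entry = MetadataEntry(
    data_name="enrl_stat_cd",
    business_description="Enrollment status",
    source="SIS enrollment file",
    business_owner="District registrar",
    domain=("enrolled", "withdrawn", "transferred"),
    current_as_of=date(2006, 1, 10),
    access_roles=("data_admin",),
)
print(in_context("enrolled", entry))
```

On its own the string "enrolled" is just data; combined with the entry it reads as a valid enrollment status from a named source.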
    20. 20. KIDS Phase II Project 05-07 Workflow <ul><li>1. Review and validate business case for Phase II project </li></ul><ul><ul><li>Identify key stakeholders. </li></ul></ul><ul><ul><li>Validate requirements, deliverables, and expectations. </li></ul></ul><ul><ul><li>Identify two or three key districts with viable data warehouse infrastructure as test sites. </li></ul></ul><ul><li>2. Review enterprise architecture and identify infrastructure components </li></ul><ul><ul><li>Transaction Level Applications </li></ul></ul><ul><ul><li>Operating & Database systems </li></ul></ul><ul><ul><li>Hardware Platforms </li></ul></ul><ul><li>3. Design Data Warehouse & Operational Data Store (ODS) Data Model </li></ul><ul><ul><li>Design Dimensional Modeling Schemas </li></ul></ul><ul><ul><li>Configure Extraction, Transformation, Load (ETL) </li></ul></ul><ul><ul><li>Metadata Capture (Repository or Data labels) </li></ul></ul><ul><ul><li>Data Quality Profiling </li></ul></ul><ul><ul><li>Vendor selection will be based on the results of a competitive “bake-off” among three top vendors. </li></ul></ul><ul><li>4. Data Reporting & Information Delivery Mechanism </li></ul><ul><ul><li>Leverage existing reporting On-line Analytical Processing (OLAP) infrastructure </li></ul></ul><ul><ul><li>Design and implement subject-area data marts for effective horizontal reporting integration </li></ul></ul><ul><ul><li>Develop OLAP cubes for report aggregation and slice/dice querying investigations </li></ul></ul><ul><ul><li>Deploy enterprise portal with built-in security and user access authentication. </li></ul></ul><ul><li>5. End-user Training </li></ul><ul><ul><li>Design and schedule end-user training at all levels of data and reporting needs. </li></ul></ul><ul><ul><li>Identify “Train-the-Trainer” candidates from each school district for more detailed training. </li></ul></ul>
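The OLAP cube work in step 4 (report aggregation and slice/dice querying) can be illustrated without any OLAP product. In the sketch below a cube is a dictionary keyed by dimension values: rolling up aggregates a measure over the dimensions that are dropped, and slicing fixes one or more dimensions. The fact rows, the dimension names, and the `tested` measure are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: counts of students tested.
FACTS = [
    {"district": "Hillsboro", "year": 2005, "subject": "Math", "tested": 120},
    {"district": "Hillsboro", "year": 2006, "subject": "Math", "tested": 130},
    {"district": "Beaverton", "year": 2005, "subject": "Math", "tested": 200},
    {"district": "Beaverton", "year": 2005, "subject": "Reading", "tested": 180},
]
DIMS = ("district", "year", "subject")


def build_cube(facts: list, dims: tuple, measure: str) -> dict:
    """Sum the measure into one cell per combination of dimension values."""
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)


def roll_up(cube: dict, dims: tuple, keep: tuple) -> dict:
    """Aggregate away every dimension not listed in `keep`."""
    rolled = defaultdict(int)
    for key, value in cube.items():
        cell = dict(zip(dims, key))
        rolled[tuple(cell[d] for d in keep)] += value
    return dict(rolled)


def slice_cube(cube: dict, dims: tuple, **fixed) -> dict:
    """Keep only the cells whose dimensions match the fixed values."""
    return {
        key: value
        for key, value in cube.items()
        if all(dict(zip(dims, key))[d] == v for d, v in fixed.items())
    }


cube = build_cube(FACTS, DIMS, "tested")
by_district = roll_up(cube, DIMS, keep=("district",))
math_2005 = slice_cube(cube, DIMS, year=2005, subject="Math")
print(by_district)
print(math_2005)
```

A real OLAP engine precomputes and indexes these aggregates; the sketch shows only what the operations mean.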
    21. 21. KIDS Phase II Project Key System Deliverables <ul><li>1. Transactional Systems/District Data Warehouse </li></ul><ul><ul><li>Integrating systems that support instructional management, empowering teachers to combine student performance and instructional data to make informed classroom decisions. </li></ul></ul><ul><ul><li>Districts will be better able to meet NCLB and state standards for having “highly qualified” teachers in the classroom. </li></ul></ul><ul><li>2. Integrated & Interoperable Operational Data Store (ODS) </li></ul><ul><ul><li>Provides the ability to evaluate student performance within selected programs across various schools and districts in the State, and highlights ways to achieve AYP consistently over time. </li></ul></ul><ul><ul><li>Allows for quick turnaround in transferring student records when students move between districts. </li></ul></ul><ul><ul><li>Eases the reporting burden on districts, eliminates redundant and possibly inaccurate reporting of data, and provides a better foundation for integrated data analysis. </li></ul></ul><ul><li>3. Data Warehouse & Decision Support System /Tools </li></ul><ul><ul><li>A repository for State and district reporting and analysis, down to student-level data. </li></ul></ul><ul><ul><li>Allows for more meaningful system-level questions and answers by legislators or policy makers. </li></ul></ul><ul><ul><li>Greater system accessibility to all users with relevant security access privileges will mean greater acceptance and use, and ultimately better decisions. </li></ul></ul><ul><li>4. System Wide Communication Portal </li></ul><ul><ul><li>Provides a focused location for access to information, analytical/reporting tools, and the necessary training and support. Also maximizes effective system-wide use of the state’s data warehouse. </li></ul></ul><ul><ul><li>Rapid and wide dissemination of integrated and proven instructional and administrative practices. </li></ul></ul>
    22. 22. KIDS Phase II Project High-level Project Work Plan, Time-line, & Resource Requirements <ul><li>1. Requirements Validation. Time-line: January 10, 2006. Resource Requirements: Mojo & Gary; scheduled trips to all Districts & ESDs </li></ul><ul><li>2. Inauguration of Governance & Project Team committee members. Time-line: January 30, 2006. Resource Requirements: Doug Kosty & Mojo Nwokoma </li></ul><ul><li>3. “Test Site” DW/ODS Modeling, integration, ETL, Data Quality, Meta Data Repository, and Vendor “Bake-off” contracting. Time-line: June 30, 2006. Resource Requirements: Database Administrator, Data Modeler/Analysts, Data Quality Analysts, Business User (Client), Meta Data Administrator, ETL & BI developers, Data Warehouse Project Manager </li></ul><ul><li>4. OLAP & Portal Development, including Training, and vendor “Bake-off” Contracting. Time-line: October 30, 2006. Resource Requirements: End-user Business Analyst, Web Developer, Portal Dashboard developer, Business User (Client), BI OLAP Report Developer, “Train-the-trainer”, Data Security Officer </li></ul>
    23. 23. BI - OLAP Data Warehouse Architecture
    24. 24. User expectations <ul><li>Expectations must be managed in terms of: </li></ul><ul><ul><li>Schedule </li></ul></ul><ul><ul><li>Budget </li></ul></ul><ul><ul><li>Scope </li></ul></ul><ul><ul><li>Performance </li></ul></ul><ul><ul><li>Availability </li></ul></ul><ul><ul><li>Simplicity (ease of use) </li></ul></ul><ul><ul><li>Tool functionality </li></ul></ul><ul><ul><li>Data cleanliness </li></ul></ul><ul><ul><li>Users’ roles and responsibilities </li></ul></ul>
    25. 25. User responsibilities <ul><li>Be a full-time member of the Core Team </li></ul><ul><li>Participate in all data modeling sessions </li></ul><ul><li>Co-manage the DW project </li></ul><ul><li>Make decisions and escalate disputes to the Steering Committee </li></ul><ul><li>Provide meta data for business objects </li></ul><ul><li>Identify data security requirements </li></ul><ul><li>Participate in BI tool selection </li></ul><ul><li>Participate in all review sessions </li></ul><ul><li>Participate in all testing activities </li></ul>
    26. 26. IT Staffing <ul><ul><ul><li>DW roles & responsibilities </li></ul></ul></ul><ul><ul><ul><li>Dedicated IT team </li></ul></ul></ul><ul><ul><ul><li>New skill set – beyond tools, discipline </li></ul></ul></ul><ul><ul><ul><li>Contractors & consultants </li></ul></ul></ul><ul><ul><ul><ul><li>Knowledge transfer </li></ul></ul></ul></ul><ul><ul><ul><li>Training </li></ul></ul></ul><ul><ul><ul><ul><li>Just in time </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Just enough </li></ul></ul></ul></ul>Knowledge transfer through collaboration
    27. 27. Risks to be mitigated <ul><ul><ul><li> Low management commitment </li></ul></ul></ul><ul><ul><ul><li> Low user commitment </li></ul></ul></ul><ul><ul><ul><li> Unrealistic schedule </li></ul></ul></ul><ul><ul><ul><li> Unrealistic user expectations </li></ul></ul></ul><ul><ul><ul><li> Budget too small </li></ul></ul></ul><ul><ul><ul><li> Untrained or unavailable staff </li></ul></ul></ul>
    28. 28. Risks to be mitigated (continued) <ul><ul><ul><li>Unclear or changing requirements </li></ul></ul></ul><ul><ul><ul><li>Poor project management </li></ul></ul></ul><ul><ul><ul><li>Creeping scope </li></ul></ul></ul><ul><ul><ul><li>Initial project too large </li></ul></ul></ul><ul><ul><ul><li>Wrong project </li></ul></ul></ul><ul><ul><ul><li>Changing priorities </li></ul></ul></ul><ul><ul><ul><li>Data cleansing not addressed early </li></ul></ul></ul><ul><ul><ul><li>Vendors out of control </li></ul></ul></ul>
    29. 29. Risks to be mitigated (continued) <ul><ul><ul><li>Not architected properly (wrong design) </li></ul></ul></ul><ul><ul><ul><li>Inappropriate organization structure </li></ul></ul></ul><ul><ul><ul><li>Lost or changed sponsor </li></ul></ul></ul><ul><ul><ul><li>New technology not understood </li></ul></ul></ul><ul><ul><ul><li>No procedure to resolve disputes </li></ul></ul></ul><ul><ul><ul><li>Exceeding platform capabilities </li></ul></ul></ul>