Warehouse components


  1. The Basic Structure
     - Source data extracts are loaded into the Data Staging Area
       - Storage: flat files (fastest); RDBMS; other
       - Processing: clean; prune; combine; remove duplication; standardize; conform dimensions; store awaiting replication; export to data marts
       - No user query services
     - The staging area populates, replicates to, and recovers Data Marts #1, #2, #3, which are connected by the DW Bus to form the corporate view
       - Data Mart #1: OLAP (ROLAP, MOLAP, HOLAP); dimensional access; subject oriented; refresh frequency driven by the user group; conforms to the Bus
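The staging-area processing steps listed above (combine, clean, standardize, remove duplication) can be illustrated with a minimal Python sketch. It assumes two hypothetical source extracts of customer records held as dictionaries; the field names, the `STATE_CODES` lookup, and the rule of keeping the first record per `customer_id` are illustrative assumptions, not part of the original design.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical lookup used to standardize state values that arrive in
# different spellings from different source systems.
STATE_CODES = {"california": "CA", "calif.": "CA", "ca": "CA",
               "new york": "NY", "ny": "NY"}


def clean(record: Record) -> Record:
    """Trim leading and trailing whitespace from every field."""
    return {field: value.strip() for field, value in record.items()}


def standardize(record: Record) -> Record:
    """Map state spellings onto one code and put names in title case."""
    out = dict(record)
    out["state"] = STATE_CODES.get(out["state"].lower(), out["state"].upper())
    out["name"] = out["name"].title()
    return out


def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each natural key (customer_id)."""
    seen = set()
    unique = []
    for rec in records:
        if rec["customer_id"] not in seen:
            seen.add(rec["customer_id"])
            unique.append(rec)
    return unique


def stage(*extracts: List[Record]) -> List[Record]:
    """Combine all extracts, then clean, standardize, and de-duplicate."""
    combined = [rec for extract in extracts for rec in extract]
    return remove_duplicates([standardize(clean(rec)) for rec in combined])


crm_extract = [{"customer_id": "17", "name": " ada lovelace ", "state": "Calif."}]
billing_extract = [
    {"customer_id": "17", "name": "ADA LOVELACE", "state": "ca"},
    {"customer_id": "42", "name": "alan turing", "state": "new york "},
]
staged = stage(crm_extract, billing_extract)
```

Both sources describe customer 17 with different spellings; after standardizing, the duplicate collapses to a single staged record.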
  2. The Basic Structure (cont)
     - The Data Staging Area feeds Data Marts #1, #2, #3, which are connected by the DW Bus (corporate view)
       - Data Mart #1: OLAP (ROLAP, MOLAP, HOLAP); dimensional access; subject oriented; refresh frequency driven by the user group; conforms to the Bus
     - User access: the data marts supply data feeds to
       - Ad hoc query tools
       - Reporting tools and writers
       - Customized applications
       - Models: forecasting; scoring; allocating; data mining; scenario analysis; etc.
  3. The Business Dimensional Lifecycle
     - Project planning: early critical tasks
       - definition
       - scope
       - readiness assessment
       - business justification
     - Remaining tasks
       - Resource requirements and identification
       - Schedule construction and integration
  4. The Business Dimensional Lifecycle
     - Business requirements definition
       - Critical to success
       - Designers must understand the business needs
       - A plan must be developed to elicit the users' needs and to understand them
  5. The Business Dimensional Lifecycle
     - Three project tracks follow the business requirements definition process:
       - Data track
       - Technology track
       - Application track
  6. The Business Dimensional Lifecycle
     - Data track
       - Dimensional modeling
       - Physical design
       - Data staging design and development
  7. The Business Dimensional Lifecycle
     - Technology track
       - Technical architecture design
       - Things to consider:
         - business requirements
         - current technical environment
         - planned strategic technical directions
  8. The Business Dimensional Lifecycle
     - Application track
       - Product identification, selection, and installation
       - End-user application development
         - Configuring access to the metadata repository
         - Building specialized applications
  9. The Business Dimensional Lifecycle
     - Deployment
       - The integration of all the pieces of the puzzle
       - The best warehouse will fail if deployment is not properly planned
       - Plans required before deployment:
         - education
         - user support
         - feedback
         - enhancement/maintenance
  10. The Business Dimensional Lifecycle
     - Maintenance and growth
       - Work never stops!
       - It is critical to support the users and stay connected with them to ensure the warehouse meets their needs
       - Watch performance and plan ahead (the back room)
       - Collect and analyze metrics on use and operation
  11. The Business Dimensional Lifecycle
     - Maintenance and growth (cont)
       - If you are successful, change is inevitable. Plan and prioritize future initiatives with user buy-in.
       - Always plan for expansion and growth with each new increment or change.
  12. The Business Dimensional Lifecycle
     - Project management
       - Monitor project status
       - Track issues
       - Control change
       - Project communication
       - Project marketing
       - Project politician
       - Project visionary
  13. The Business Dimensional Lifecycle (overview)
     - Project Planning -> Business Requirements Definition, followed by three parallel tracks:
       - Technology: Technical Architecture Design -> Product Selection & Installation
       - Data: Dimensional Modeling -> Physical Design -> Data Staging Design & Development
       - Application: End-User Application Specification -> End-User Application Development
     - The tracks converge in Deployment -> Maintenance and Growth
     - Project Management spans the entire lifecycle
  14. Project Planning & Management
     - Who wants the warehouse?
       - A single visionary user
         - desirable because the focus remains manageable
         - requires political leverage to make it work
         - the need must have broad and definable impacts to show its worth
       - Multiple demands
         - Many organizations want a data mart or warehouse
         - Focus is spread, so politics and planning play a vital role
  15. Project Planning & Management
     - Who wants the warehouse? (cont)
       - No identified need
         - The organization wants to get into the “warehouse” game
         - The warehouse team must put more effort into identifying the need
         - It is highly likely that there is one
  16. Project Planning & Management
     - Determine warehouse readiness
       - Do you have a strong business sponsor?
         - Vision
         - Politically savvy
         - Connected
         - Influential
         - History of success
         - Respected
         - Realistic
         - Understands the need and the process and can communicate it
  17. Project Planning & Management
     - Determine warehouse readiness (cont)
       - Without this person you will fail
       - Try to recruit multiple sponsors
       - Is there a real and identifiable business need?
       - Does a strong partnership exist between IT and the business groups?
       - What is the current analytical environment?
         - How are things done now?
         - What culture shock will be created?
  18. Project Planning & Management
     - Determine warehouse readiness (cont)
       - What is the feasibility?
         - Is the data “dirty” beyond recovery?
         - Are the target sources too dispersed and dynamic to achieve early and significant results?
  19. Project Planning & Management
     - Take the readiness “litmus test”
       - The test looks at:
         - Sponsor
         - Business needs
         - IT/business partnership
         - Current analytical environment
         - Feasibility
       - A strong sponsor is the most important factor in getting a high rating on the test
       - Business needs and the IT/business partnership are secondary in importance
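One way to make the litmus test concrete is a weighted checklist. The sketch below is an assumption-laden illustration: the 0-5 rating scale and the specific weights in `WEIGHTS` are hypothetical, chosen only so that the sponsor counts most and business needs and the IT/business partnership come next, as the slide states.

```python
from typing import Dict

# Hypothetical weights: the slide says only that the sponsor matters most and
# that business needs and the IT/business partnership come second.
WEIGHTS: Dict[str, int] = {
    "sponsor": 30,
    "business_needs": 20,
    "it_business_partnership": 20,
    "current_analytical_environment": 15,
    "feasibility": 15,
}


def readiness_score(ratings: Dict[str, int]) -> float:
    """Weighted readiness score from 0 to 100, given ratings on a 0-5 scale."""
    total = 0.0
    for factor, weight in WEIGHTS.items():
        rating = ratings[factor]
        if not 0 <= rating <= 5:
            raise ValueError(f"{factor} rating must be between 0 and 5")
        total += weight * rating / 5
    return total


strong_sponsor = {"sponsor": 5, "business_needs": 4, "it_business_partnership": 3,
                  "current_analytical_environment": 3, "feasibility": 4}
weak_sponsor = {**strong_sponsor, "sponsor": 1}
```

With identical ratings apart from the sponsor, the score drops from 79 to 55, reflecting the emphasis the slide places on sponsorship.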
  20. Project Planning & Management
     - Addressing readiness issues
       - High-level business requirements analysis
         - Identify the strategic initiatives
         - Identify the business metrics
         - Identify the high-impact and high-ROI areas
       - Business requirements prioritization
         - Look for high impact, ROI, and feasibility
       - Proof of concept
  21. Project Planning & Management
     - Develop the initial scope
       - Keep the scope narrow and short to retain clarity
       - The bigger the scope, the more difficult it becomes to retain focus
       - Always define the scope based on business requirements. Try not to let deadlines or budget cycles drive the scope.
  22. Project Planning & Management
     - Develop the initial scope (cont)
       - Scope definition involves both IT and business representatives
       - Make the scope significant, but ensure it is achievable and timely
       - Start with one or a few data sources and a single business process
       - Limit your initial user base (typically 25-35 people)
       - Determine what management expects so that success can be identified
  23. Project Planning & Management
     - Develop the initial scope (cont)
       - Document the scope definition and success indicators
       - Acknowledge that the scope will likely change
       - Develop a plan to manage the change
  24. Project Planning & Management
     - Build the business justification
       - Determine the costs
         - Identify hardware and software costs (start-up and ongoing)
         - Identify maintenance costs
         - Internal staff needs
         - External resources (consultants, etc.)
         - Operational support
         - Support for growing pains
  25. Project Planning & Management
     - Build the business justification (cont)
       - Determine the benefits (financial and other)
         - Increased revenue
         - Increased profit
         - Increased customer satisfaction
         - Expansion of a market or capability
         - Increased employee productivity
         - Reduction of capital investments (storage requirements, etc.)
         - Protection against fraud and attack
  26. Project Planning & Management
     - Build the business justification (cont)
       - It is important to monitor and track the business in order to identify, and market, the impacts the warehouse has made
       - Look for both the tangibles and the intangibles
  27. Project Planning & Management
     - Plan the project
       - Establish a project identity
         - Create a name
         - Create documentation describing your project
         - Make T-shirts, mugs, etc.
         - Market, market, market!
  28. Project Planning & Management
     - Plan the project (cont)
       - Staff up:
         - Project Manager
         - Business Lead
         - Business Analyst
         - Data Modeler
         - DW DBA
         - Data Staging System Designer
         - End-User Application Developer
         - DW Educator
         - Technical/Security Architect
         - Technical Support Specialists
         - Data Staging Programmers
         - Data Steward
         - DW QA Analyst
  29. Project Planning & Management
     - Develop the project plan
       - Update your plan frequently (this is key)
       - A DW project is cyclic by nature and resembles a spiral approach
       - Identify key milestones
       - Develop both a high-level and a detailed plan
  30. Project Planning & Management
     - Manage the project
       - Matrix management is often used because of the numerous interlaced roles
       - Data issues may lay waste to the best-laid plans (plan for the unexpected)
       - The project will likely increase in visibility (manage expectations)
       - Iterative/sliding-window development requires multiple teams to work in sync (communication)
  31. Project Planning & Management
     - Manage the project (cont)
       - Conduct a project kickoff meeting
         - Identify the team, roles, and responsibilities
         - Identify the scope
         - Identify goals
         - Identify the schedule
         - Review the preliminary project management plan (PMP)
         - Conduct preliminary education
  32. Project Planning & Management
     - Monitor the project status
       - Frequent communication
       - Project status meetings
       - Team meetings
       - Project status reports
       - Customer reporting
  33. Collecting the Requirements
     - The old theory was not to include the users in the early stages: “Build it and they will come.”
     - This proved to be the demise of many early warehouse initiatives
     - A formal (but flexible) requirements document is needed to capture the users' needs for the warehouse
  34. Collecting the Requirements
     - This is a difficult process for many reasons:
       - Key people may feel threatened and be unwilling to cooperate
       - The informal decision process is typically poorly documented and dispersed
       - People have a difficult time thinking “out of the box”
       - Terminology associated with warehousing often creates confusion and/or misinformation
  35. Collecting the Requirements
     - Talk with the business users first
       - Strive to understand how they do business
       - Identify how decisions are made today
       - Determine how they would like to make decisions today and tomorrow
       - Do not just ask “What data do you need?”
  36. Collecting the Requirements
     - Talk with the IT community second
       - Wait until the business users have identified some common sources and themes before approaching IT
       - Look for feasibility issues
       - Start identifying technical issues such as platforms, formats, access, and politics
       - Talk with DBAs, data administrators (DAs), application developers, and designers
  37. Collecting the Requirements
     - Getting the requirements (interviews vs. facilitation)
       - Interviews tend to stay focused and work well with small groups
       - Facilitated sessions work with larger groups and encourage “brainstorming” and cross-pollination of ideas
  38. Collecting the Requirements
     - Roles of the requirements team
       - Lead interviewer
       - Secondary interviewers
       - Scribe
       - Observers
       - Facilitator
  39. Collecting the Requirements
     - Preparation for the interview
       - Look at strategic plans that relate to the company or group you will talk with
       - Look at the annual report. The goals and initiatives it highlights are the ones the company takes seriously.
       - Review marketing material
       - Search the Internet for information
       - Identify past attempts at similar projects
  40. Collecting the Requirements
     - Identify who will be interviewed
       - Business
         - Look horizontally across the organization to see the big picture
         - Get as much detail as possible in the current area of focus (vertical)
         - Ask your sponsor to identify who should be interviewed
  41. Collecting the Requirements
     - Identify who will be interviewed (cont)
       - Technology
         - The data gurus (people who have been around a long time and know the details)
         - Application programmers
         - Semi-technical people within a business area
         - DBAs
         - Data modelers
         - System administrators
         - IT management, to identify future directions
  42. Collecting the Requirements
     - Develop an interview questionnaire
     - Build an agenda for the interview sessions
     - Prepare the interviewees
       - Hold a single meeting with all interviewees to discuss the project, intentions, etc.
       - Set the tone for all interviews
       - Encourage questions
       - This lets you identify good and bad candidates early, so you can plan for each person
  43. Collecting the Requirements
     - Conduct the interview
       - Remain within the roles established for the interview team
       - Validate what you have collected with the user as soon as possible
       - Define terms with the users (profit, revenue, sales)
       - Talk on their level and avoid confusing technology terms (use their business lingo when possible)
  44. Collecting the Requirements
     - Conduct the interview (cont)
       - Try to remain flexible during the interview process; you may:
         - Meet with unexpected people
         - Run past the allotted time
         - Discuss topics somewhat outside the focus of the interview
       - Schedule breaks and limit the number of interview sessions to about five per day
       - Continue to manage expectations
  45. Collecting the Requirements
     - Potential interview questions for an executive:
       - What are the objectives of your organization? What are you trying to accomplish?
       - How do you measure success? How do you know you are doing well? How often do you measure yourself?
       - What are the key business issues you face today? What could prevent you from meeting these objectives? What would be the impact?
  46. Collecting the Requirements
     - Potential interview questions for an analyst:
       - What are your group's objectives? How do you accomplish them?
       - What are your success metrics? How do you know you are doing well? How often do you measure?
       - What issues do you currently face?
       - Describe your products, vendors, etc. Is there a natural hierarchy?
  47. Collecting the Requirements
     - Potential interview questions for an analyst (cont):
       - What type of analysis do you perform? What data is used? How do you get it? What do you do with it?
       - What analysis would you like to perform?
       - What dynamic analysis needs do you have? Who drives these needs? How long does the analysis take? Are you able to conduct deeper levels of analysis?
       - What analytical capabilities would you like?
  48. Collecting the Requirements
     - Potential interview questions for an analyst (cont):
       - Where are the bottlenecks in obtaining information?
       - How much historical information is needed?
       - How will improved information access affect you and your organization? What is the financial impact?
       - What reports do you currently use? Which data elements on the reports are important? How is this information used? Is it combined with anything else?
  49. Collecting the Requirements
     - What to discuss with IT:
       - Request an overview of the operational systems
       - What are the current tools and technologies used to share information?
       - What types of analyses are performed?
       - How are detailed analyses supported and conducted?
       - What are the data quality issues?
       - Where do bottlenecks exist?
  50. Collecting the Requirements
     - What to discuss with IT (cont):
       - What concerns do you have about data warehousing in the organization? What roadblocks do you see?
       - What expectations do you have of the warehouse?
       - How do you expect the warehouse to affect you?
  51. Collecting the Requirements
     - Types of users you will interview
       - Abused user
         - Involved in earlier attempts
         - Unwilling to cooperate
       - Overbooked user (too busy to meet)
       - Comatose user
       - Overzealous user
       - Nonexistent user (use technology to drive the needs)
  52. Collecting the Requirements
     - Wrap up
       - Review the interview results with the team
       - Prepare and publish the results
       - Establish what will be done next
  53. Dimensional Modeling (Jeffrey T. Edgell)
  54. The Dimensional Model
     - A more intuitive structure for presentation and reporting
     - Likely predates the E/R approach
       - General Mills and Dartmouth University developed a fact and dimension structure
       - Nielsen Marketing Research used this approach on grocery and drug store auditing and scanner data in the 1970s and 1980s
  55. The Dimensional Model
     - Dimensions are descriptive
     - Facts are typically numeric and measurement based
     - Additive facts are vital because they allow many records to be aggregated during a retrieval
     - See page 145 for a typical dimensional model
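A minimal star schema, sketched with Python's built-in `sqlite3` module, shows descriptive dimensions around a fact table of numeric, additive measurements. The table and column names (`sales_fact`, `product_dim`, `store_dim`, `date_dim`) and the sample rows are hypothetical; the point is that additive facts can be summed across any combination of dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
    CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

    -- The fact table holds numeric, additive measurements at a declared grain:
    -- one row per product, per store, per day.
    CREATE TABLE sales_fact (
        date_key INTEGER REFERENCES date_dim(date_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        store_key INTEGER REFERENCES store_dim(store_key),
        units_sold INTEGER,
        sales_dollars REAL
    );
    INSERT INTO date_dim VALUES (1, '2024-01-01', '2024-01'), (2, '2024-01-02', '2024-01');
    INSERT INTO product_dim VALUES (1, 'Cola 12oz', 'Beverages'), (2, 'Chips', 'Snacks');
    INSERT INTO store_dim VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 10, 12.50), (1, 2, 1, 4, 8.00),
        (2, 1, 2, 6, 7.50),   (2, 2, 2, 3, 6.00);
    """
)

# Because units_sold and sales_dollars are additive, many fact rows can be
# summed along any combination of dimension attributes.
rows = conn.execute(
    """
    SELECT p.category, s.region, SUM(f.units_sold), SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim s ON s.store_key = f.store_key
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
    """
).fetchall()
```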
  56. The Argument for the Dimensional Model
     - Tools can rely on a standardized framework
     - Query tools can leverage this framework for performance optimization
     - High-performance browsing of dimension entries is possible
     - Every query can first be constrained on the dimensions, significantly increasing performance
  57. The Argument for the Dimensional Model
     - Easily adapts to unpredictable queries
     - Extends to allow the addition of new tables or data elements; this:
       - does not require rebuilding the database from scratch
       - does not require reloading data
       - does not require existing reports and query tools to be redesigned or reimplemented
  58. The Argument for the Dimensional Model
     - The model can be altered as follows without interruption:
       - Adding new facts (consistent with the defined grain)
       - Adding new dimensions
       - Widening a dimension table
       - Changing a dimension's detail to a lower (more granular) level
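The claim that the model can grow without disturbing existing reports can be checked with a small `sqlite3` sketch. The tables and the report query are hypothetical; the sketch widens a dimension, adds a new fact column, and adds a new dimension (existing fact rows point at a hypothetical "No promotion" member), then confirms the existing report still returns the same answer. Changing a dimension to a lower level of detail is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
    INSERT INTO product_dim VALUES (1, 'Cola'), (2, 'Chips');
    INSERT INTO sales_fact VALUES (1, 10), (2, 4);
    """
)

# An existing report query, written before any schema change.
EXISTING_REPORT = """
    SELECT p.description, SUM(f.units_sold)
    FROM sales_fact f JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.description ORDER BY p.description
"""
before = conn.execute(EXISTING_REPORT).fetchall()

conn.executescript(
    """
    -- Widen a dimension with a new descriptive attribute.
    ALTER TABLE product_dim ADD COLUMN package_type TEXT;
    -- Add a new fact consistent with the existing grain.
    ALTER TABLE sales_fact ADD COLUMN discount_dollars REAL DEFAULT 0;
    -- Add a new dimension; existing fact rows default to key 0 ('No promotion').
    CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO promotion_dim VALUES (0, 'No promotion');
    ALTER TABLE sales_fact ADD COLUMN promotion_key INTEGER DEFAULT 0;
    """
)

after = conn.execute(EXISTING_REPORT).fetchall()
```

No data was reloaded and the report text did not change; `before` and `after` are identical.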
  59. The Argument for the Dimensional Model
     - The dimensional model has a predefined set of approaches for dealing with common issues:
       - Slowly changing dimensions
       - Heterogeneous products (tracking different lines of business, e.g., checking and savings)
       - Pay-in-advance databases (look at individual components as well as the total)
       - Event handling (no facts)
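Event handling is commonly modeled with a factless fact table: a fact table that records only the dimension keys of an event. The sketch below assumes a hypothetical course-attendance example built with `sqlite3`; the analysis counts rows instead of summing a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course_dim (course_key INTEGER PRIMARY KEY, title TEXT);
    -- Factless fact table: each row records that an event occurred (a student
    -- attended a course on a date). It has keys but no numeric measures.
    CREATE TABLE attendance_fact (date_key INTEGER, student_key INTEGER, course_key INTEGER);
    INSERT INTO student_dim VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course_dim VALUES (10, 'Statistics'), (20, 'Databases');
    INSERT INTO attendance_fact VALUES
        (20240101, 1, 10), (20240101, 2, 10), (20240102, 1, 20);
    """
)

# Events are analyzed by counting rows rather than summing a numeric fact.
attendance = conn.execute(
    """
    SELECT c.title, COUNT(*)
    FROM attendance_fact a JOIN course_dim c ON c.course_key = a.course_key
    GROUP BY c.title ORDER BY c.title
    """
).fetchall()
```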
  60. The Argument for the Dimensional Model
     - Aggregates in the warehouse deliver query performance that would otherwise have to be bought with more hardware (at much greater cost)
     - A standard set of schemas exists for different business types and applications
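The aggregation point can be sketched as a precomputed summary table. This `sqlite3` example assumes a hypothetical daily `sales_fact` table and builds a monthly `sales_by_month` aggregate from it; a month-level question can then be answered from the summary rows, and the answer matches the detail-level sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (day TEXT, product TEXT, units_sold INTEGER);
    INSERT INTO sales_fact VALUES
        ('2024-01-01', 'Cola', 10), ('2024-01-02', 'Cola', 6),
        ('2024-02-01', 'Cola', 7),  ('2024-01-15', 'Chips', 4);

    -- Precompute a monthly aggregate once, during the load, so month-level
    -- queries read a few summary rows instead of every detail row.
    CREATE TABLE sales_by_month AS
        SELECT substr(day, 1, 7) AS month, product, SUM(units_sold) AS units_sold
        FROM sales_fact
        GROUP BY month, product;
    """
)

detail = conn.execute(
    "SELECT SUM(units_sold) FROM sales_fact "
    "WHERE product = 'Cola' AND day LIKE '2024-01%'"
).fetchone()[0]
aggregate = conn.execute(
    "SELECT units_sold FROM sales_by_month "
    "WHERE product = 'Cola' AND month = '2024-01'"
).fetchone()[0]
```

In practice the aggregate must be rebuilt or incrementally updated whenever the detail table is loaded, and query tools need some way to choose the aggregate; neither step is shown here.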
  61. The Bus
     - Supports the incremental approach
     - The data mart approach has often led to warehouses developed without a corporate framework
     - Stovepipe decision structures result
     - The Bus produces a uniform global structure, eliminating pocket or stovepipe data marts
  62. The Bus
     - Look at the entire enterprise as you design and build the data marts
     - A high-level architecture must be defined that explains the entire structure
     - A detailed architecture must be developed to support each data mart as it is tackled
  63. Conformed Dimensions
     - Dimensions used to represent concepts across the enterprise must be standardized and agreed upon, e.g.:
       - customer
       - product
       - time
       - potentially not region (sales and management may define it differently)
  64. Conformed Dimensions
     - Conformed dimensions must be carefully managed, maintained, and published to ensure consistency
     - A conformed dimension is the central description that everyone agrees upon
     - If the conformed dimension approach is not observed, the Bus will not function properly
  65. Conformed Dimensions
     - With conformed dimensions:
       - One dimension table relates to multiple fact tables
       - Browsing the dimension gives consistent results, providing a unified view
       - Rollups and meanings remain consistent across facts
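The value of one conformed dimension shared by several fact tables shows up when results are combined across facts (often called drilling across). In this hypothetical `sqlite3` sketch, `sales_fact` and `shipments_fact` both reference the same `product_dim`, so each fact is summarized by brand on its own and the two result sets are then aligned on the shared attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One conformed product dimension shared by two fact tables.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
    CREATE TABLE shipments_fact (product_key INTEGER, units_shipped INTEGER);
    INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Zenith');
    INSERT INTO sales_fact VALUES (1, 5), (2, 7), (3, 2);
    INSERT INTO shipments_fact VALUES (1, 8), (3, 4), (3, 1);
    """
)


def totals_by_brand(fact: str, measure: str) -> dict:
    """Summarize one fact table by the conformed brand attribute.

    fact and measure are fixed table/column names, never user input.
    """
    sql = (
        f"SELECT d.brand, SUM(f.{measure}) FROM {fact} f "
        "JOIN product_dim d ON d.product_key = f.product_key GROUP BY d.brand"
    )
    return dict(conn.execute(sql).fetchall())


# Drill across: query each fact separately, then align the results on the
# shared attribute. 'Acme' means the same thing in both result sets only
# because both facts use the same product_dim.
sales = totals_by_brand("sales_fact", "units_sold")
shipped = totals_by_brand("shipments_fact", "units_shipped")
report = {brand: (sales.get(brand, 0), shipped.get(brand, 0))
          for brand in sorted(set(sales) | set(shipped))}
```

Summarizing each fact separately before combining avoids the double counting that a direct join between two fact tables would produce when a product has several rows in each.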
  66. Conformed Dimensions
     - Design
       - Use the lowest level of granularity possible (based on the lowest level defined)
       - Use a sequential numeric key (surrogate key)
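Surrogate-key assignment for a conformed dimension can be sketched as a lookup from natural (source) keys to sequential integers. The `SurrogateKeyMap` class and the (source system, source ID) natural key are hypothetical; matching records that describe the same customer across systems is a separate staging step that this sketch does not attempt.

```python
from typing import Dict, Tuple


class SurrogateKeyMap:
    """Assigns sequential integer surrogate keys to natural (source) keys."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, str], int] = {}
        self._next = 1

    def key_for(self, source_system: str, source_id: str) -> int:
        """Return the existing surrogate key, or assign the next one in sequence."""
        natural = (source_system, source_id)
        if natural not in self._keys:
            self._keys[natural] = self._next
            self._next += 1
        return self._keys[natural]


keys = SurrogateKeyMap()
a = keys.key_for("crm", "C-17")
b = keys.key_for("billing", "C-17")  # same ID, different system: new key
c = keys.key_for("crm", "C-17")      # seen before: same key as a
```

Including the source system in the natural key keeps identical IDs from different systems from colliding, and the surrogate key insulates the warehouse from changes to source identifiers.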
  67. Conformed Facts
     - Defined during the definition of conformed dimensions
     - Relate common measurements accurately, e.g.:
       - Cost
       - Profit
       - Unit price
     - If facts differ in definition, give them different names (marketing profit vs. sales profit)
     - As much political as technical
  68. When the Bus Is Not Required
     - The business you are dealing with is intentionally segmented
       - Components operate autonomously, and no unified corporate view is required
       - Products or business areas are disjoint
       - For example, a company sells music and repairs train engines (no business or product synergy except at the very top)
  69. The Components of the Dimensional Model
     - Facts
     - Dimensions
     - Attributes
     - The Bus (optional but highly recommended)
  70. Operations
     - Drill down and roll up
       - Example on page 168
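Drill down and roll up amount to summarizing the same facts at different levels of a dimension hierarchy. The sketch assumes a hypothetical `product_dim` with a category > brand > product hierarchy and a helper, `sales_by`, that groups by whichever levels it is given; the level names must be trusted column names because they are inserted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The product dimension carries a hierarchy: category > brand > product.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                              product TEXT, brand TEXT, category TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, sales_dollars REAL);
    INSERT INTO product_dim VALUES
        (1, 'Cola 12oz', 'Fizz',   'Beverages'),
        (2, 'Cola 2L',   'Fizz',   'Beverages'),
        (3, 'Lemonade',  'Sunny',  'Beverages'),
        (4, 'Chips',     'Crunch', 'Snacks');
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 15.0), (3, 5.0), (1, 2.5), (4, 4.0);
    """
)


def sales_by(*levels: str) -> list:
    """Sum sales grouped by the given hierarchy levels (trusted column names)."""
    cols = ", ".join(f"d.{level}" for level in levels)
    sql = (f"SELECT {cols}, SUM(f.sales_dollars) FROM sales_fact f "
           f"JOIN product_dim d ON d.product_key = f.product_key "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


rolled_up = sales_by("category")                   # roll up to the top level
drilled = sales_by("category", "brand")            # drill down one level
detail = sales_by("category", "brand", "product")  # drill down to products
```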
  71. Snowflakes
     - What is it?
       - Removing low-cardinality fields from a dimension, placing them in a new table, and linking them back with keys
     - Complicates design detail
     - Decreases performance
     - Saves some space, but normally not a significant amount
     - Bitmap indexes cannot be used effectively
  72. When a Snowflake Is OK
     - When used as a subdimension
       - The data in the subdimension and the dimension are at different levels of granularity
       - The two are loaded at different times
       - Examples:
         - County and state
         - District and region
         - Ship and battle group
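The cost of a snowflake shows up as an extra join in every query that needs the split-off attributes. This hypothetical `sqlite3` sketch snowflakes state and region out of a county dimension (the county and state example above), then builds a flattened `county_flat` alternative; both queries return the same totals, but the snowflaked version needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Snowflaked: state attributes moved to their own table (a subdimension).
    CREATE TABLE state_dim (state_key INTEGER PRIMARY KEY, state TEXT, region TEXT);
    CREATE TABLE county_dim (county_key INTEGER PRIMARY KEY, county TEXT,
                             state_key INTEGER REFERENCES state_dim(state_key));
    INSERT INTO state_dim VALUES (1, 'Ohio', 'Midwest'), (2, 'Texas', 'South');
    INSERT INTO county_dim VALUES (1, 'Franklin', 1), (2, 'Travis', 2), (3, 'Summit', 1);
    CREATE TABLE sales_fact (county_key INTEGER, sales_dollars REAL);
    INSERT INTO sales_fact VALUES (1, 100.0), (2, 50.0), (3, 25.0);

    -- Flattened (star) alternative: the same attributes in one denormalized table.
    CREATE TABLE county_flat AS
        SELECT c.county_key, c.county, s.state, s.region
        FROM county_dim c JOIN state_dim s ON s.state_key = c.state_key;
    """
)

snowflake_query = """
    SELECT s.region, SUM(f.sales_dollars) FROM sales_fact f
    JOIN county_dim c ON c.county_key = f.county_key
    JOIN state_dim s ON s.state_key = c.state_key
    GROUP BY s.region ORDER BY s.region
"""
star_query = """
    SELECT d.region, SUM(f.sales_dollars) FROM sales_fact f
    JOIN county_flat d ON d.county_key = f.county_key
    GROUP BY d.region ORDER BY d.region
"""
snow = conn.execute(snowflake_query).fetchall()
star = conn.execute(star_query).fetchall()
```

The flattened table repeats state and region on every county row, which is the modest space cost the slide weighs against the extra join.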
  73. Good Descriptive Dimensions
     - Large dimension tables
     - Highly descriptive
     - Without good descriptive dimensions, the warehouse is not useful
     - Use:
       - full words, no missing (null) values, QA, and metadata
  74. Common Dimension Techniques
     - Time
       - Example: figure 5.7, page 176
     - Address
       - Example: page 178
     - Commercial address
       - Example: page 179
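A date (time) dimension is usually generated rather than extracted from a source. The sketch below builds one row per day between two dates; the attribute list is a hypothetical subset (real designs often add fiscal periods, holidays, and company-specific flags), and the day and month names assume Python's default C locale.

```python
from datetime import date, timedelta
from typing import Dict, List, Union

Row = Dict[str, Union[int, str, bool]]


def build_date_dimension(start: date, end: date) -> List[Row]:
    """Return one row per calendar day from start to end, inclusive."""
    rows: List[Row] = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,               # sequential surrogate key
            "full_date": day.isoformat(),
            "day_of_week": day.strftime("%A"),    # English names in the C locale
            "month_name": day.strftime("%B"),
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "weekday_flag": day.weekday() < 5,    # Monday=0 through Friday=4
        })
    return rows


dates = build_date_dimension(date(2024, 3, 29), date(2024, 4, 1))
```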
  75. Slowly Changing Dimensions
     - What to do:
       - Type 0: Ignore the change
       - Type 1: Overwrite the changed attribute
       - Type 2: Add a new dimension record with a new surrogate key value
       - Type 3: Add an “old value” field
  76. Slowly Changing Dimensions
     - Ignore the change
       - Not usually a good solution, but it is done
     - Overwrite the changed attribute
       - Valid when correcting a value from the source
     - Add a new dimension record with a new (generalized) key
       - Retains the history of a changed product
  77. Slowly Changing Dimensions
     - Add an “old value” field
       - Valid when only the previous value is needed for decision making
  78. Slowly Changing Dimensions
     - Type 2 example: a product change (the bottle changes from plastic to glass)

       Key   Type     SKU
       001   Plastic  1234
       002   Glass    1234
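Type 1 and Type 2 handling can be sketched against the bottle example above. The `ProductRow` class, the `is_current` flag, and computing the next surrogate key as the current maximum plus one are simplifying assumptions; production designs typically draw keys from a sequence and often add effective and expiration dates.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class ProductRow:
    product_key: int       # surrogate key
    sku: str               # natural key from the source system
    package_type: str
    is_current: bool = True


def current_row(dim: List[ProductRow], sku: str) -> Optional[ProductRow]:
    """Return the current row for a natural key, if any."""
    return next((row for row in dim if row.sku == sku and row.is_current), None)


def apply_type1(dim: List[ProductRow], sku: str, new_type: str) -> None:
    """Type 1: overwrite the attribute in place; the old value is lost."""
    row = current_row(dim, sku)
    if row is not None:
        row.package_type = new_type


def apply_type2(dim: List[ProductRow], sku: str, new_type: str) -> None:
    """Type 2: expire the current row and append a row with a new surrogate key."""
    row = current_row(dim, sku)
    if row is None or row.package_type == new_type:
        return
    row.is_current = False
    next_key = max(r.product_key for r in dim) + 1  # simplification: real loads use a sequence
    dim.append(replace(row, product_key=next_key, package_type=new_type,
                       is_current=True))


product_dim = [ProductRow(product_key=1, sku="1234", package_type="Plastic")]
apply_type2(product_dim, "1234", "Glass")
```

After the Type 2 change, the dimension holds two rows for SKU 1234 (keys 1 and 2), so facts recorded before the change keep pointing at the plastic version. Calling `apply_type1` instead would leave a single row with the value overwritten.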
  79. Slowly Changing Dimensions
     - Type 3 example: a company's regional divisions change (only one historical change is supported)

       Region    Old Region
       Gold      North
       Silver    South
       Platinum  East
       Bronze    West
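A Type 3 change can be sketched as shifting the current value into the "old value" column before overwriting it. The `SalesOfficeRow` class and its fields are hypothetical; the second call in the usage shows why only one historical change is supported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SalesOfficeRow:
    office_key: int
    name: str
    region: str
    old_region: Optional[str] = None  # Type 3 "old value" column


def apply_type3(row: SalesOfficeRow, new_region: str) -> None:
    """Type 3: move the current value into old_region, then overwrite region."""
    if row.region != new_region:
        row.old_region = row.region
        row.region = new_region


office = SalesOfficeRow(office_key=1, name="Office A", region="North")
apply_type3(office, "Gold")
after_first = (office.region, office.old_region)   # ("Gold", "North")
apply_type3(office, "Platinum")
after_second = (office.region, office.old_region)  # ("Platinum", "Gold"); "North" is gone
```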
  80. 80. The Monster Dimension <ul><li>It is a compromise </li></ul><ul><li>Avoids creating copies of dimension records in a significantly large dimension </li></ul><ul><li>Done to manage space and changes efficiently </li></ul>
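  81. 81. The Monster Dimension <ul><li>Example 1: original customer dimension </li></ul><ul><ul><li>Customer_Key, name, address, city, state, birth_date, date_first_purchase, income, number_children, education, total_purchases, credit_score </li></ul></ul><ul><li>Split into a customer dimension (basically constant) </li></ul><ul><ul><li>Customer_Key, name, address, city, state, birth_date, date_first_purchase </li></ul></ul><ul><li>and a demographics dimension (may change with each purchase; bands used to minimize possibilities) </li></ul><ul><ul><li>Demographics_Key, income_band, number_children, education_level, total_purchases_band, credit_group </li></ul></ul>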
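Example 1 can be sketched in code: the basically constant attributes stay in the customer row, the volatile ones are banded, and each distinct banded combination receives its own Demographics_Key. The band boundaries, field values, and helper names below are hypothetical; the slides do not specify them.

```python
# Hypothetical band boundaries; the source does not specify any.
def income_band(income: float) -> str:
    if income < 30_000:
        return "Under 30K"
    return "30K-75K" if income < 75_000 else "75K and over"


def purchases_band(total: float) -> str:
    return "Low" if total < 1_000 else "Medium" if total < 10_000 else "High"


def credit_group(score: int) -> str:
    return "Poor" if score < 580 else "Fair" if score < 670 else "Good"


STABLE = ("customer_key", "name", "address", "city", "state",
          "birth_date", "date_first_purchase")

# Demographics dimension: banded combination -> Demographics_Key.
demographics: dict = {}


def split_customer(raw: dict) -> tuple:
    """Split a wide customer record into a stable row and a demographics key."""
    customer = {k: raw[k] for k in STABLE}  # basically constant attributes
    combo = (income_band(raw["income"]), raw["number_children"],
             raw["education"], purchases_band(raw["total_purchases"]),
             credit_group(raw["credit_score"]))
    # Reuse the existing key if this banded combination was seen before.
    demo_key = demographics.setdefault(combo, len(demographics) + 1)
    return customer, demo_key


base = {"name": "A. Smith", "address": "1 Main St", "city": "Dayton",
        "state": "OH", "birth_date": "1970-05-01",
        "date_first_purchase": "2019-03-15", "number_children": 2,
        "education": "Bachelor's", "total_purchases": 2_500, "credit_score": 700}
_, k1 = split_customer({**base, "customer_key": 1, "income": 52_000})
_, k2 = split_customer({**base, "customer_key": 2, "income": 61_000})
_, k3 = split_customer({**base, "customer_key": 3, "income": 90_000})
print(k1, k2, k3)  # 1 1 2: 52K and 61K fall in the same income band
```

In this layout a fact row would carry both Customer_Key and Demographics_Key, so a customer's demographic profile at the time of each purchase is preserved without copying the large customer row.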
  82. 82. The Monster Dimension <ul><li>Case 1 (Rapid change) </li></ul><ul><ul><li>Large dimensions tend to change often simply because they contain so much information </li></ul></ul><ul><ul><li>Certain attributes must be tracked in the dimension over time to understand their impact: </li></ul></ul><ul><ul><ul><li>demographics </li></ul></ul></ul><ul><ul><ul><li>customer data </li></ul></ul></ul><ul><ul><ul><li>product lines (for companies in acquisition) </li></ul></ul></ul>
  83. 83. The Monster Dimension <ul><li>The solution to very dynamic large dimensions </li></ul><ul><ul><li>identify the dynamic areas of the dimension </li></ul></ul><ul><ul><li>segment the hot areas into their own independent dimensions </li></ul></ul><ul><ul><li>the relatively static information remains in the original dimension </li></ul></ul>
  84. 84. The Monster Dimension <ul><li>The trade-off (plus) </li></ul><ul><ul><li>the warehouse can accurately retain significant changes in a dimension over time </li></ul></ul><ul><ul><li>extremely dynamic attributes should be banded to slow their rate of change </li></ul></ul><ul><ul><li>all possible combinations in the dimension become finite (discrete) and are thus manageable </li></ul></ul>
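Because every banded attribute has a small, fixed set of values, one option is to pre-populate the demographics dimension with every combination up front. The band lists below are hypothetical; the point is that the row count is simply the product of the band counts, whereas exact incomes or purchase totals would make the combinations effectively unbounded.

```python
from itertools import product
from math import prod

# Hypothetical band values for each attribute of the demographics dimension.
BANDS = {
    "income_band": ["Under 30K", "30K-75K", "75K and over"],
    "number_children": ["0", "1", "2", "3+"],
    "education_level": ["High school", "Some college", "Bachelor's", "Graduate"],
    "total_purchases_band": ["Low", "Medium", "High"],
    "credit_group": ["Poor", "Fair", "Good"],
}

# One row per combination, each with its own Demographics_Key.
demographics_dim = [
    {"demographics_key": key, **dict(zip(BANDS, combo))}
    for key, combo in enumerate(product(*BANDS.values()), start=1)
]

expected_rows = prod(len(values) for values in BANDS.values())
print(len(demographics_dim), expected_rows)  # 432 432
```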
  85. 85. The Monster Dimension <ul><li>The trade-off (minus) </li></ul><ul><ul><li>Loss of detail in the bands (values are no longer exact) </li></ul></ul><ul><ul><li>Once bands are defined, they must be enforced from that point on </li></ul></ul><ul><ul><li>Browse performance is slower when the segmented table must be combined with the original table </li></ul></ul><ul><ul><li>The two dimensions cannot be combined without at least one fact row (nothing else relates them) </li></ul></ul>
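The last point can be illustrated with a toy fact table: the customer and demographics dimensions share no key with each other, so the only way to learn which demographic profile a customer had is to look through the fact rows that reference both. The keys and amounts are hypothetical.

```python
# Hypothetical fact rows: each purchase records both dimension keys.
sales_facts = [
    {"customer_key": 1, "demographics_key": 1, "date_key": 10, "amount": 120.0},
    {"customer_key": 1, "demographics_key": 2, "date_key": 40, "amount": 80.0},
    {"customer_key": 2, "demographics_key": 1, "date_key": 15, "amount": 45.0},
]


def profiles_for_customer(customer_key: int) -> list:
    """Return (date_key, demographics_key) pairs seen in facts for a customer."""
    return sorted({(f["date_key"], f["demographics_key"])
                   for f in sales_facts if f["customer_key"] == customer_key})


print(profiles_for_customer(1))  # [(10, 1), (40, 2)]
print(profiles_for_customer(3))  # [] (no fact rows, so nothing links the two dimensions)
```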
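  86. 86. The Monster Dimension <ul><li>Example 2: original employee table </li></ul><ul><ul><li>name, address, date_of_birth, social_security_num, …, title, years_with_company, income, division, purchase_level </li></ul></ul><ul><li>Split into an employee table </li></ul><ul><ul><li>name, address, date_of_birth, social_security_num </li></ul></ul><ul><li>and a corporate demographics table </li></ul><ul><ul><li>position_grade, income_band, division, service_years_band </li></ul></ul>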
  87. 87. Degenerate Dimensions/Key <ul><li>Definition: critical data from the legacy environment that normally remains independent; typically the original operational key of the transaction behind the fact (e.g., an order number), with no supporting data </li></ul>
  88. 88. Degenerate Dimensions/Key <ul><li>Likely found in the header of a file </li></ul><ul><li>The other header items have been absorbed into other dimensions </li></ul><ul><ul><li>customer, date, vendor, item </li></ul></ul><ul><li>The remaining item has no supporting attributes but is important </li></ul><ul><ul><li>CLIN (contract line item number), Requisition #, Order # </li></ul></ul><ul><li>It is useful information and should be absorbed into the fact table </li></ul>
  89. 89. Degenerate Dimensions/Key <ul><li>It is useful information and should be absorbed into the fact table </li></ul><ul><li>If there are other supporting attributes, it becomes a typical dimension </li></ul>
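A degenerate dimension can be sketched as a fact row that carries the order number directly, next to ordinary dimension foreign keys, with no dimension table behind it. The field names and values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLineFact:
    date_key: int        # foreign key to the date dimension
    customer_key: int    # foreign key to the customer dimension
    product_key: int     # foreign key to the product dimension
    order_number: str    # degenerate dimension: no table, kept in the fact row
    quantity: int
    amount: float


facts = [
    OrderLineFact(1, 7, 101, "PO-5001", 2, 40.0),
    OrderLineFact(1, 7, 102, "PO-5001", 1, 15.0),
    OrderLineFact(2, 9, 101, "PO-5002", 5, 100.0),
]

# The order number still groups line items back into their original order.
order_totals = defaultdict(float)
for f in facts:
    order_totals[f.order_number] += f.amount

print(dict(order_totals))  # {'PO-5001': 55.0, 'PO-5002': 100.0}
```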
  90. 90. Junk Dimensions <ul><li>Residual flags, status codes, and miscellaneous information often remain after the dimensional design is nearly complete </li></ul><ul><li>Alternatives: </li></ul><ul><ul><li>Place the flags in the fact tables </li></ul></ul><ul><ul><li>Make each attribute a dimension </li></ul></ul><ul><ul><li>Remove the attributes completely </li></ul></ul>
  91. 91. Junk Dimensions <ul><li>Leave the flags in the fact tables </li></ul><ul><ul><li>likely sparse data </li></ul></ul><ul><ul><li>no real entry point for browsing </li></ul></ul><ul><ul><li>can significantly increase the size of the fact table </li></ul></ul><ul><li>Remove the attributes from the design </li></ul><ul><ul><li>potentially critical information will be lost </li></ul></ul><ul><ul><li>if they are not relevant, remove them </li></ul></ul>
  92. 92. Junk Dimensions <ul><li>Make each flag its own dimension </li></ul><ul><ul><li>may greatly increase the number of dimensions, increasing the size of the fact table (one more foreign key per dimension) </li></ul></ul><ul><ul><li>can clutter and confuse the design </li></ul></ul><ul><li>Combine all relevant flags, etc. into a single dimension </li></ul><ul><ul><li>the number of possibilities remains finite </li></ul></ul><ul><ul><li>information is retained </li></ul></ul>
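Combining the flags into a single junk dimension can be sketched by enumerating every combination of the (hypothetical) flag values once and giving each combination a surrogate key; a fact row then carries one junk key instead of several flags or several extra dimension keys.

```python
from itertools import product

# Hypothetical low-cardinality flags left over after the main design.
FLAGS = {
    "payment_type": ["Cash", "Credit", "Check"],
    "order_status": ["Open", "Shipped", "Cancelled"],
    "gift_wrap": ["Yes", "No"],
}

# Junk dimension: one row per combination of flag values, keyed by a surrogate.
junk_dim = {combo: key
            for key, combo in enumerate(product(*FLAGS.values()), start=1)}


def junk_key(payment_type: str, order_status: str, gift_wrap: str) -> int:
    """Look up the single junk key that replaces three flags in a fact row."""
    return junk_dim[(payment_type, order_status, gift_wrap)]


print(len(junk_dim))                        # 18
print(junk_key("Credit", "Shipped", "No"))  # 10
```

If the full cross product were large, a common alternative is to add only the combinations that actually occur in the source data as they are encountered.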
  93. 93. Keys, Keys, Keys <ul><li>Surrogate keys (always use) </li></ul><ul><ul><li>4-byte integer (2^32, about 4.3 billion values; over two billion positive values if signed) </li></ul></ul><ul><li>Date keys should use surrogates as well </li></ul><ul><ul><li>dates are typically stored in 8 bytes, so a 4-byte surrogate saves 4 bytes per fact row </li></ul></ul><ul><li>Do not use smart keys with embedded meanings </li></ul><ul><li>Do not use legacy or production keys </li></ul>
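A minimal sketch of surrogate key assignment, assuming an in-memory map from each production (natural) key to a meaningless sequential integer; a real warehouse would persist this map, for example in the staging area. The smart key strings and the fact-row count used for the space estimate are hypothetical.

```python
class SurrogateKeyMap:
    """Assign meaningless sequential integer keys to natural (production) keys."""

    MAX_KEY = 2**31 - 1  # largest value of a signed 4-byte integer

    def __init__(self) -> None:
        self._keys: dict = {}

    def key_for(self, natural_key: str) -> int:
        if natural_key not in self._keys:
            new_key = len(self._keys) + 1
            if new_key > self.MAX_KEY:
                raise OverflowError("surrogate key space exhausted")
            self._keys[natural_key] = new_key
        return self._keys[natural_key]


customers = SurrogateKeyMap()
# Hypothetical "smart" production keys; their embedded meaning is not reused.
print(customers.key_for("CUST-ACME-OH-0042"))  # 1
print(customers.key_for("CUST-BETA-TX-0007"))  # 2
print(customers.key_for("CUST-ACME-OH-0042"))  # 1 (same natural key, same surrogate)

# Space saved by a 4-byte date surrogate instead of an 8-byte date value.
fact_rows = 1_000_000_000  # hypothetical fact table size
print((8 - 4) * fact_rows / 1e9, "GB saved")  # 4.0 GB saved
```

This simple map gives one surrogate per natural key; under Type 2 changes, a natural key would map to several surrogates, one per version.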
  94. 94. Just the Facts <ul><li>Attempt to make all facts additive </li></ul><ul><ul><li>simplifies calculations across dimensions </li></ul></ul><ul><ul><li>not all numbers are additive facts </li></ul></ul><ul><li>Semi-additive facts can be used, but be aware of where they occur </li></ul><ul><ul><li>e.g., balances cannot be summed over time; use averages, max, or min instead </li></ul></ul><ul><li>Non-additive facts are often avoided but may have value </li></ul><ul><ul><li>weather conditions (non-discrete), non-discrete descriptions </li></ul></ul>
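The difference between additive and semi-additive behavior can be shown with hypothetical inventory snapshots: quantity on hand can be summed across products on one day, but summing it across days is meaningless, so an average (or max/min) is used over time instead.

```python
# Hypothetical daily inventory snapshot: (day, product, quantity_on_hand)
snapshots = [
    ("Mon", "Widget", 10), ("Mon", "Gadget", 5),
    ("Tue", "Widget", 8),  ("Tue", "Gadget", 7),
    ("Wed", "Widget", 6),  ("Wed", "Gadget", 9),
]

# Across products (one day): summing is valid.
total_mon = sum(q for day, _, q in snapshots if day == "Mon")

# Across time (one product): a sum overstates stock, so average instead.
widget_levels = [q for _, product, q in snapshots if product == "Widget"]
avg_widget = sum(widget_levels) / len(widget_levels)
wrong_sum = sum(widget_levels)

print(total_mon, avg_widget, max(widget_levels), wrong_sum)  # 15 8.0 10 24
```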
  95. 95. Steps to Designing a Fact Table <ul><li>Time to choose: </li></ul><ul><ul><li>the data mart (functional business area) </li></ul></ul><ul><ul><li>the grain of the fact table (what level of detail) </li></ul></ul><ul><ul><li>the dimensions associated with the data mart </li></ul></ul><ul><ul><li>the facts relevant to the data mart </li></ul></ul>
  96. 96. Data Mart <ul><li>Single operational source data marts carry the least risk </li></ul><ul><li>Multiple operational source data marts typically provide more cross-functional value </li></ul><ul><li>Examples (remember: these are processes you measure): </li></ul><ul><ul><li>Marketing </li></ul></ul><ul><ul><li>Sales </li></ul></ul><ul><ul><li>Inventory </li></ul></ul><ul><ul><li>Productivity </li></ul></ul>
  97. 97. Fact Table Grain <ul><li>Without a declared grain, dimensions cannot be accurately defined </li></ul><ul><li>Select as low (atomic) a grain as possible </li></ul><ul><ul><li>handles unexpected queries </li></ul></ul><ul><ul><li>adapts readily to additional facts and dimensions </li></ul></ul><ul><ul><li>delivers the most comprehensive solution </li></ul></ul><ul><li>Trade-offs of a low grain </li></ul><ul><ul><li>consumes more space </li></ul></ul><ul><ul><li>performance can be an issue </li></ul></ul>
  98. 98. Fact Loads <ul><li>By record </li></ul><ul><ul><li>account for every transaction or activity recorded (e.g., ATM transactions) </li></ul></ul><ul><li>Snapshot </li></ul><ul><ul><li>a picture of the related facts at a specific point in time (e.g., monthly reporting) </li></ul></ul><ul><li>Line item </li></ul><ul><ul><li>track and reflect the status of line item activity (e.g., purchase order lines) </li></ul></ul>
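To contrast the first two load types, the sketch below keeps hypothetical ATM-style transaction facts (one row per activity) and derives a month-end snapshot (one row per account per month) from them. In practice the snapshot would usually be loaded as its own fact table on a schedule; deriving it here only shows how the two grains differ.

```python
from collections import defaultdict

# Transaction grain: one row per ATM activity (month, account, signed amount).
transactions = [
    ("2024-01", "ACC-1", 500.0), ("2024-01", "ACC-1", -120.0),
    ("2024-01", "ACC-2", 300.0), ("2024-02", "ACC-1", -80.0),
    ("2024-02", "ACC-2", 50.0),  ("2024-02", "ACC-2", -25.0),
]


def month_end_snapshot(rows: list) -> dict:
    """Snapshot grain: running balance per account at the end of each month."""
    balance = defaultdict(float)
    snapshot = {}
    for month in sorted({m for m, _, _ in rows}):
        for m, account, amount in rows:
            if m == month:
                balance[account] += amount
        # Record every known account, even one with no activity this month.
        for account, value in balance.items():
            snapshot[(month, account)] = value
    return snapshot


for (month, account), bal in sorted(month_end_snapshot(transactions).items()):
    print(month, account, bal)
# 2024-01 ACC-1 380.0
# 2024-01 ACC-2 300.0
# 2024-02 ACC-1 300.0
# 2024-02 ACC-2 325.0
```

Note that the snapshot balance is a semi-additive fact: it can be summed across accounts for one month but not across months.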
  99. 99. Dimensions <ul><li>Once the grain is defined, the basic dimensions will be evident from it (customer, time, etc.) </li></ul><ul><li>Then add other dimensions and junk dimensions </li></ul><ul><li>No dimension can be at a lower level of granularity than the fact table grain </li></ul>
  100. 100. Identifying Facts <ul><li>The grain of the fact table dictates the facts </li></ul><ul><li>All facts must be at the same level of grain </li></ul><ul><li>Individual transaction tables typically have one fact (the numeric value of the transaction) </li></ul><ul><li>Snapshot and line item fact tables will likely contain several facts, since multiple additive measures are captured in each row </li></ul><ul><li>Keep the three types (transaction, snapshot, line item) in separate fact tables </li></ul>
  101. 101. Fact Table Families <ul><li>Process chain (supply chain, linear) </li></ul><ul><ul><li>a fact table represents each step in the process </li></ul></ul><ul><ul><li>RFI-RFP-RFQ-Contract-Delivery </li></ul></ul><ul><ul><li>supply chain process example: page 200 </li></ul></ul><ul><ul><li>each fact table is connected through the bus </li></ul></ul><ul><li>Value Circle (parallel measurement) </li></ul><ul><ul><li>health care (example page 202) </li></ul></ul><ul><ul><li>retail </li></ul></ul>
  102. 102. Fact Table Families <ul><li>Heterogeneous Product Schemas </li></ul><ul><ul><li>Services offered by the business are distinct and separate </li></ul></ul><ul><ul><li>banking (checking, savings, loans, etc.) </li></ul></ul><ul><ul><li>insurance (life, home, auto, etc.) </li></ul></ul><ul><li>Transaction and Snapshot Schemas </li></ul><ul><ul><li>Snapshot (periodic picture) example page 210 </li></ul></ul><ul><ul><li>Transaction (activity detail) example page 207 </li></ul></ul>
  103. 103. Aggregate Families <ul><li>Used to improve query performance </li></ul><ul><li>Typically roll-ups of facts along a dimension for anticipated reporting and querying </li></ul><ul><li>Aggregate tables can also be used to combine details from two fact tables of varying granularity </li></ul>
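An aggregate can be sketched as a roll-up of a base fact table along one dimension: here, hypothetical daily sales are rolled up from day to month by product, so monthly queries read fewer rows. Only additive facts can be rolled up safely by summing like this.

```python
from collections import defaultdict

# Base fact table at daily grain: (date, product, sales_amount); hypothetical data.
daily_sales = [
    ("2024-03-01", "Widget", 100.0), ("2024-03-02", "Widget", 150.0),
    ("2024-03-02", "Gadget", 80.0),  ("2024-04-01", "Widget", 120.0),
]

# Aggregate fact table: roll the date dimension up from day to month.
monthly_sales = defaultdict(float)
for day, product, amount in daily_sales:
    month = day[:7]  # "YYYY-MM"
    monthly_sales[(month, product)] += amount

for (month, product), total in sorted(monthly_sales.items()):
    print(month, product, total)
# 2024-03 Gadget 80.0
# 2024-03 Widget 250.0
# 2024-04 Widget 120.0
```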
  104. 104. Factless Fact Tables <ul><li>Used for two reasons: </li></ul><ul><ul><li>record an activity (student attendance page 213) </li></ul></ul><ul><ul><ul><li>answers which classes were the most popular </li></ul></ul></ul><ul><ul><ul><li>which days are frequently missed </li></ul></ul></ul><ul><ul><li>Coverage (account for activity that may not have happened) (example page 215) </li></ul></ul><ul><ul><ul><li>An entry is placed in the fact table for every item of interest </li></ul></ul></ul><ul><ul><ul><li>answers questions regarding what did and did not have activity </li></ul></ul></ul>
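Both uses can be sketched with sets of key tuples, since a factless fact row consists only of dimension keys. The students, classes, stores, and the promotion-coverage scenario are hypothetical. For coverage, the question "what had no activity?" is answered by subtracting the activity rows from the coverage rows.

```python
from collections import Counter

# Activity: one row per (student, class, day) attendance event; no numeric fact.
attendance = {
    ("S1", "Math", "Mon"), ("S2", "Math", "Mon"), ("S1", "Art", "Tue"),
    ("S2", "Math", "Wed"), ("S3", "Math", "Wed"),
}
popularity = Counter(cls for _, cls, _ in attendance)
print(popularity.most_common(1))  # [('Math', 4)]

# Coverage: one row per (store, product) on promotion, whether or not it sold.
promotion_coverage = {("Store1", "Widget"), ("Store1", "Gadget"), ("Store2", "Widget")}
sales_activity = {("Store1", "Widget"), ("Store2", "Widget")}

# Promoted but not sold: coverage minus activity.
print(sorted(promotion_coverage - sales_activity))  # [('Store1', 'Gadget')]
```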