Planning & Project Management for DWH



  1. Orderly Approach for DWH Construction (2/23/2012, 3. Planning & Project Management, D.S. Jagli)
  2. Topics to be covered
      1. How is it different?
      2. Life-cycle approach
      3. The development phases
      4. Dimensional analysis
      5. Dimensional modeling
         i. Star schema
         ii. Snowflake schema
  3. 3. Planning & Project Management
      • Reasons DWH projects fail
        1. Improper planning
        2. Inadequate project management
      • Planning for a data warehouse is necessary.
        I. Key issues that need to be planned
           1. Value and expectations
           2. Risk assessment
           3. Top-down or bottom-up
           4. Build or buy
           5. Single vendor or best of breed
        II. Business requirements, not technology
        III. Top management support
        IV. Justification
  4. 3. Planning & Project Management
      • Example DWH project: outline for the overall plan
        1. Introduction
        2. Mission statement
        3. Scope
        4. Goals & objectives
        5. Key issues & options
        6. Value & expectations
        7. Justification
        8. Executive sponsorship
        9. Implementation strategy
        10. Tentative schedule
        11. Project authorization
  5. 3.1 How is it different?
      • A DWH project differs from an OLTP system project.
      • Distinguishing features of a DWH and challenges for project management:
        1. Data acquisition
        2. Data storage
        3. Information delivery
  7. 3.2 The Life-Cycle Approach (Fig.: DW functional components and SDLC)
  8. DWH Project Plan: Sample Outline
  9. 3.3 DWH Development Phases
  10. 3.3 DWH Development Phases
      1) Project plan
      2) Requirements definition
      3) Design
      4) Construction
      5) Deployment
      6) Growth and maintenance
      • Interleaved within the design and construction phases are the three tracks (data acquisition, data storage, and information delivery), along with the definition of the architecture and the establishment of the infrastructure.
  11. 3.4 Dimensional Analysis
      • A data warehouse is an information delivery system.
      • It is not about technology, but about solving users' problems.
      • It provides strategic information to the user.
      • In the requirements definition phase, concentrate on what information the users need, not on how the required information will be provided.
  12. Dimensional Nature of DWH
      1. Usage of information is unpredictable
         • When describing requirements for an operational system, users can give precise details of the required functions, information content, and usage patterns. For a data warehouse they usually cannot, because they cannot predict in advance how they will use the information.
      2. Dimensional nature of business data
         • Even though users cannot fully describe what they want in a data warehouse, they can provide very important insights into how they think about the business.
  13. Managers think in business dimensions: example
  14. Dimensional Nature of Business Data
  15. Dimensional Nature of Business Data
  16. Examples of Business Dimensions
  17. Examples of Business Dimensions
  18. INFORMATION PACKAGES: A NEW CONCEPT
      • Information packages are a novel way of determining and recording information requirements for a data warehouse.
      • The concept gives a concrete form to the various insights, nebulous thoughts, and opinions expressed while collecting requirements.
      • The information packages, put together while collecting requirements, are very useful for taking the development of the data warehouse to the next phases.
  19. Requirements Not Fully Determinate
      Information packages enable us to:
      1. Define the common subject areas
      2. Design key business metrics
      3. Decide how data must be presented
      4. Determine how users will aggregate or roll up
      5. Decide the data quantity for user analysis or query
      6. Decide how data will be accessed
      7. Establish data granularity
      8. Estimate data warehouse size
      9. Determine the frequency for data refreshing
      10. Determine how information must be packaged
  20. An information package.
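
One way to picture an information package is as a small record holding a subject, its business dimensions with their hierarchy levels, and the facts to be analyzed. The sketch below is only an illustration: the class name, field names, and the top-down ordering of the levels are assumptions, while the dimension and fact names come from the automaker sales example used in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """One subject area: business dimensions (with hierarchy levels) plus facts."""
    subject: str
    # dimension name -> hierarchy levels, listed top-down (ordering is an assumption)
    dimensions: Dict[str, List[str]] = field(default_factory=dict)
    facts: List[str] = field(default_factory=list)

    def describe(self) -> str:
        """Render the package as indented text, one line per dimension."""
        lines = [f"Subject: {self.subject}"]
        for name, levels in self.dimensions.items():
            lines.append(f"  Dimension {name}: " + " > ".join(levels))
        lines.append("  Facts: " + ", ".join(self.facts))
        return "\n".join(lines)


automaker_sales = InformationPackage(
    subject="Automaker sales",
    dimensions={
        "Time": ["Year", "Quarter", "Month", "Date"],
        "Product": ["Product category", "Product line", "Model name"],
        "Dealer": ["State", "City", "Dealer name"],
    },
    facts=["Actual sale price", "MSRP sale price", "Options price"],
)
print(automaker_sales.describe())
```

Keeping dimensions and facts in one record mirrors the idea that the package, once collected, feeds the later design phases directly.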
  21. Business Dimensions
      • Business dimensions form the underlying basis of the new methodology for requirements definition.
      • Data must be stored to provide for the business dimensions.
      • The business dimensions and their hierarchical levels form the basis for all further phases.
  22. Dimension Hierarchies/Categories
      Examples:
      1) Product: model name, model year, package styling, product line, product category, exterior color, interior color, first model year
      2) Dealer: dealer name, city, state, single brand flag, date of first operation
      3) Customer demographics: age, gender, income range, marital status, household size, vehicles owned, home value, own or rent
      4) Payment method: finance type, term in months, interest rate, agent
      5) Time: date, month, quarter, year, day of week, day of month, season, holiday flag
  23. Key Business Metrics or Facts
      • The numbers that users analyze are the measurements or metrics that gauge the success of their departments.
      • These are the facts that indicate to the users how well their departments are fulfilling their departmental objectives.
  24. Example: automobile sales
      The set of meaningful and useful metrics for analyzing automobile sales is as follows:
      • Actual sale price
      • MSRP sale price
      • Options price
      • Full price
      • Dealer add-ons
      • Dealer credits
      • Dealer invoice
      • Amount of down payment
      • Manufacturer proceeds
      • Amount financed
  25. Star Schema; Snowflake Schema
  26. FROM REQUIREMENTS TO DATA DESIGN
      1. The requirements definition completely drives the data design for the data warehouse.
      2. A group of data elements forms a data structure.
      3. Logical data design includes determining the various data elements and data structures and establishing the relationships among the data structures.
      4. The information package diagrams form the basis for the logical data design for the data warehouse.
      5. The data design process results in a dimensional data model.
  27. FROM REQUIREMENTS TO DATA DESIGN
  28. Dimensional Modeling Basics: Formation of the automaker sales fact table.
  29. Formation of the automaker dimension tables.
  30. Concept of Keys for Dimension Table: Surrogate Keys
      1. A surrogate key is the primary key for a dimension table and is independent of any keys provided by source data systems.
      2. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records.
      3. Automatically increasing integers make good surrogate keys.
      4. The original key for each record is carried in the dimension table but is not used as the primary key.
      5. Surrogate keys provide the means to maintain data warehouse information when dimensions change.
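
The surrogate-key points above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a prescribed design: the table name `dealer_dim`, its columns, the helper `add_dealer_version`, and the sample values are all hypothetical. It shows the warehouse generating an increasing integer key, carrying the source key as an ordinary column, and keeping an older row when a dimension record changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dealer_dim (
        dealer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no built-in meaning
        dealer_id   TEXT NOT NULL,                      -- original key from the source system
        dealer_name TEXT NOT NULL,
        city        TEXT NOT NULL
    )
    """
)


def add_dealer_version(dealer_id, dealer_name, city):
    """Insert a dimension row and return the surrogate key generated for it."""
    cur = conn.execute(
        "INSERT INTO dealer_dim (dealer_id, dealer_name, city) VALUES (?, ?, ?)",
        (dealer_id, dealer_name, city),
    )
    return cur.lastrowid


first = add_dealer_version("D-100", "Main Street Motors", "Austin")
# The dealer moves: the source key is unchanged, but the new row gets a new surrogate key,
# so facts recorded against the old row still point at the old city.
second = add_dealer_version("D-100", "Main Street Motors", "Dallas")

rows = conn.execute(
    "SELECT dealer_key, dealer_id, city FROM dealer_dim ORDER BY dealer_key"
).fetchall()
print(rows)
```

Because the source key `dealer_id` is not the primary key, two versions of the same dealer can coexist in the dimension table.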
  31. Concept of Keys for Dimension Table: Business Keys
      • Also called natural keys.
      • They carry a meaning and can either be generated from source system data or be used as is from a source system field.
  32. The criteria for combining the tables into a dimensional model.
      1. The model should provide the best data access.
      2. The whole model must be query-centric.
      3. It must be optimized for queries and analyses.
      4. The model must show that the dimension tables interact with the fact table.
      5. It should also be structured in such a way that every dimension can interact equally with the fact table.
      6. The model should allow drilling down or rolling up along dimension hierarchies.
  33. The Dimensional Model: a STAR Schema
      • Given these requirements, a dimensional model with the fact table in the middle and the dimension tables arranged around it satisfies the conditions.
  34. Case study: STAR schema for automaker sales.
  35. E-R Modeling Versus Dimensional Modeling
      • E-R modeling (OLTP systems)
        1. OLTP systems capture details of events or transactions
        2. OLTP systems focus on individual events
        3. An OLTP system is a window into micro-level transactions
        4. Picture at detail level necessary to run the business
        5. Suitable only for questions at transaction level
        6. Data consistency, non-redundancy, and efficient data storage critical
      • Dimensional modeling (DW)
        1. DW meant to answer questions on overall process
        2. DW focus is on how managers view the business
        3. DW focus is on business trends
        4. Information is centered around a business process
        5. Answers show how the business measures the process
        6. The measures are to be studied in many ways along several business dimensions
  36. E-R Modeling Versus Dimensional Modeling (figure labels: E-R modeling for OLTP systems; dimensional modeling for the data warehouse)
  38. Star Schemas
      • A data modeling technique for mapping multidimensional decision support data into a relational database.
      • Current relational modeling techniques do not serve the needs of advanced data requirements.
      • 4 components
        • Facts
        • Dimensions
        • Attributes
        • Attribute hierarchies
  39. Facts
      1. Numeric measurements (values) that represent a specific business aspect or activity.
      2. Stored in a fact table at the center of the star schema.
      3. The fact table contains facts that are linked through their dimensions.
      4. Updated periodically with data from operational databases.
  40. Dimensions
      1. Qualifying characteristics that provide additional perspectives to a given fact.
         • DSS data is almost always viewed in relation to other data.
      2. Dimensions are normally stored in dimension tables.
  41. Attributes
      1. Dimension tables contain attributes.
      2. Attributes are used to search, filter, or classify facts.
      3. Dimensions provide descriptive characteristics about the facts through their attributes.
      4. Common business attributes must be defined that will be used to narrow a search, group information, or describe dimensions (e.g., time, location, product).
      5. There is no mathematical limit to the number of dimensions (3-D makes it easy to model).
  42. Attribute Hierarchies
      1. Provide a top-down data organization
         • Aggregation
         • Drill-down / roll-up data analysis
      2. Attributes from different dimensions can be grouped to form a hierarchy
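
Roll-up and drill-down along an attribute hierarchy amount to aggregating the same facts at a coarser or finer level. The following sketch assumes a hypothetical time hierarchy (year, quarter, month) and made-up sale amounts; the function name `aggregate` is illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, month, sale_amount)
sales = [
    (2011, "Q1", "Jan", 100.0),
    (2011, "Q1", "Feb", 150.0),
    (2011, "Q2", "Apr", 200.0),
    (2012, "Q1", "Jan", 120.0),
]

LEVELS = ("year", "quarter", "month")  # top-down order of the time hierarchy


def aggregate(rows, level):
    """Sum sale amounts grouped by the hierarchy path from the top down to `level`."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)


by_year = aggregate(sales, "year")        # roll up to the top of the hierarchy
by_quarter = aggregate(sales, "quarter")  # drill down one level
print(by_year)
print(by_quarter)
```

Grouping by the full path (year, then quarter) rather than by the quarter label alone keeps Q1 of 2011 separate from Q1 of 2012.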
  43. Concept of Keys for Star Schema
      • Surrogate keys
        • Surrogate keys are simply system-generated sequence numbers, independent of any keys provided by source data systems. They do not have any built-in meaning.
        • Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records; automatically increasing integers make good surrogate keys.
        • The original key for each record is carried in the dimension table but is not used as the primary key.
      • Business keys
      • Primary keys
        • Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension.
      • Foreign keys
        • Each dimension table is in a one-to-many relationship with the central fact table, so the primary key of each dimension table must be a foreign key in the fact table.
  44. Star Schema for Sales (figure labels: Dimension Tables, Fact Table)
  45. Star Schema Representation
      • Facts and dimensions are represented by physical tables in the data warehouse database.
      • The fact table is related to each dimension table in a many-to-one relationship (primary/foreign key relationships).
      • The fact table is related to many dimension tables.
      • The primary key of the fact table is a composite key made up of the primary keys of the dimension tables.
      • Each fact table is designed to answer a specific DSS question.
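
The key relationships described above can be written down as table definitions. The sketch below uses Python's built-in `sqlite3` module; all table names, column names, and sample rows are hypothetical, loosely following the automaker sales example. Each dimension key appears in the fact table as a foreign key, and together those keys form the fact table's composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, model_name TEXT, product_line TEXT);
    CREATE TABLE dealer_dim  (dealer_key  INTEGER PRIMARY KEY, dealer_name TEXT, state TEXT);

    CREATE TABLE sales_fact (
        time_key          INTEGER NOT NULL REFERENCES time_dim(time_key),
        product_key       INTEGER NOT NULL REFERENCES product_dim(product_key),
        dealer_key        INTEGER NOT NULL REFERENCES dealer_dim(dealer_key),
        actual_sale_price REAL,
        PRIMARY KEY (time_key, product_key, dealer_key)  -- composite key from the dimensions
    );
    """
)

conn.execute("INSERT INTO time_dim VALUES (1, 2011, 'Q1')")
conn.execute("INSERT INTO product_dim VALUES (1, 'Roadster', 'Sports')")
conn.execute("INSERT INTO dealer_dim VALUES (1, 'Main Street Motors', 'TX')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 25000.0)")

# A fact row that points at a dealer missing from dealer_dim should be rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 99, 18000.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Many fact rows may share one dimension row, which is the many-to-one relationship the slide refers to.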
  46. Star Schema
      • The fact table is typically the largest table in the star schema.
      • Each dimension record can be related to thousands of fact records.
      • The star schema facilitates data retrieval functions.
      • The DBMS first searches the dimension tables before the larger fact table.
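
A typical retrieval against a star schema filters on dimension attributes and then aggregates the matching fact rows. The sketch below uses `sqlite3` with hypothetical tables and data; how the joins are actually ordered is decided by the DBMS query planner, so this shows only the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_line TEXT);
    CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE sales_fact  (product_key INTEGER, time_key INTEGER, sale_amount REAL);

    INSERT INTO product_dim VALUES (1, 'Sedan'), (2, 'Truck');
    INSERT INTO time_dim    VALUES (1, 2011), (2, 2012);
    INSERT INTO sales_fact  VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 300.0), (2, 2, 50.0);
    """
)

# Restrict by a dimension attribute, join to the fact table, and total by year.
query = """
    SELECT p.product_line, t.year, SUM(f.sale_amount) AS total_sales
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN time_dim    AS t ON t.time_key    = f.time_key
    WHERE p.product_line = 'Sedan'
    GROUP BY p.product_line, t.year
    ORDER BY t.year
"""
result = conn.execute(query).fetchall()
print(result)
```

Every dimension joins to the fact table in the same way, which is why adding another dimension to the query only adds one more join clause.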
  47. Star Schema: Advantages
      1. Easy to understand
      2. Optimizes navigation
      3. Most suitable for query processing
  49. THE SNOWFLAKE SCHEMA
      • “Snowflaking” is a method of normalizing the dimension tables in a STAR schema.
  50. Sales: a simple STAR schema.
  51. Product dimension: partially normalized
  52. When to Snowflake
      • The principle behind snowflaking is normalization of the dimension tables by removing low-cardinality attributes and placing them in separate tables.
      • In a similar manner, some situations provide opportunities to separate out a set of attributes and form a subdimension.
  53. Advantages and Disadvantages
      • Advantages
        • Small savings in storage space
        • Normalized structures are easier to update and maintain
      • Disadvantages
        • The schema is less intuitive, and end users are put off by the complexity
        • Browsing through the contents becomes difficult
        • Query performance degrades because of additional joins
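
The trade-off above can be seen in a small sketch that snowflakes one dimension. It uses `sqlite3`; the table names (`product_star`, `product_snow`, `category_dim`) and the data are hypothetical. The low-cardinality category attribute is moved into its own table, which removes the repetition but means one more join when browsing the dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- STAR form: the category name is repeated on every product row.
    CREATE TABLE product_star (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT
    );
    INSERT INTO product_star VALUES
        (1, 'Sedan LX', 'Cars'), (2, 'Sedan EX', 'Cars'), (3, 'Pickup XL', 'Trucks');

    -- Snowflaked form: the low-cardinality attribute moves to a separate table.
    CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
    CREATE TABLE product_snow (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES category_dim(category_key)
    );
    INSERT INTO category_dim (category_name)
        SELECT DISTINCT category_name FROM product_star ORDER BY category_name;
    INSERT INTO product_snow
        SELECT s.product_key, s.product_name, c.category_key
        FROM product_star AS s JOIN category_dim AS c USING (category_name);
    """
)

# Browsing the product dimension now needs a join that the STAR form did not.
rows = conn.execute(
    """
    SELECT p.product_name, c.category_name
    FROM product_snow AS p
    JOIN category_dim AS c ON c.category_key = p.category_key
    ORDER BY p.product_key
    """
).fetchall()
category_count = conn.execute("SELECT COUNT(*) FROM category_dim").fetchone()[0]
print(rows, category_count)
```

With three products and two categories the storage saving is negligible, which matches the slide's point that the gain is small relative to the added query complexity.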
  54. ??? Thank you