Data Warehouse Presentation


Data Warehousing, what is it and what are the requirements?

  • What are some business facts that you need, or would like, to be able to report on? At the end of this presentation we will list some of these.
  • The data warehouse is there to help you answer business questions, questions like: [Slide] And to help you answer questions like these, we will be providing you with what are called Reporting Cubes.
  • Here is an example of how to identify Facts and Dimensions on an existing report. The Facts are Count of Cases, Sum of Aid Payments, and Average of Pay per Case. The Dimensions are By Program, By Aid Type, By Calendar Month, and For Fiscal Year.
  • Here we see another report. Again, the Fact is a count of Clients; the Dimensions are By Race Group, By Department, and For Active Year.
  • These cubes can provide for drilling down into a greater level of detail. From the previous report we have “drilled” into the Social Services Division, “down” to the program level. Can you tell me what the dimensions are here? By Race, By Department, By Program, By Active Year.
  • We say that these cubes are multi-dimensional. This report shows that we can combine dimensions to find even more interesting information. Notice that the Fact is Unique Client, the Dimensions are By Race Group, By Gender, By Marital Status, and By Department, and the Filter, or Selection, is For Active Year.
  • These reports are easily converted into visual graphs. Here we see the prior grid report in a graph format. This allows us to quickly notice interesting information.
  • And the way to ensure acceptance is to ensure we have YOUR requirements, so that it meets your needs. We are now going to get your requirements. We are going to identify the facts (numbers) you need, and how you would like them grouped.

    1. 1. Data Warehousing: What is it and what are the requirements?
    2. 2. What is a Data Warehouse? <ul><li>The conglomeration of an organization’s data warehouse staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease-of-use. </li></ul><ul><li>Ralph Kimball (2002), The Data Warehouse Toolkit. </li></ul>
    3. 3. Now in English <ul><li>A data warehouse is a database organized in a way to allow for fast queries of information. </li></ul><ul><li>It contains data from a variety of different database systems. </li></ul>
    4. 4. Measures “Facts” not Activities <ul><li>Facts are business performance measurements </li></ul><ul><ul><li>Meals provided </li></ul></ul><ul><ul><li>Dollars expended </li></ul></ul><ul><ul><li>Hours worked </li></ul></ul><ul><li>Facts are numerical and additive </li></ul><ul><ul><li>Sum of dollars spent </li></ul></ul><ul><ul><li>Count of clients served </li></ul></ul><ul><li>Facts are stored to represent a measurement at a particular “grain” </li></ul>
    5. 5. What is a Grain? <ul><li>A grain is the level of detail at which a business measurement is stored </li></ul><ul><li>Different businesses have different Fact needs </li></ul><ul><ul><li>A Social Services grain </li></ul></ul><ul><ul><ul><li>The number of food stamp dollars given to a case each month </li></ul></ul></ul><ul><ul><li>In-Home Support Services grain </li></ul></ul><ul><ul><ul><li>The number of hours of service a client received in a provider’s pay period </li></ul></ul></ul><ul><ul><ul><li>The number of dollars paid to a provider for a client during a pay period </li></ul></ul></ul>
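As a rough illustration of grain, the sketch below stores one fact row per case per month (the Social Services grain described above) and rolls the additive measure up to coarser levels. The case IDs, months, amounts, and the `roll_up` helper are all made up for this example.

```python
from collections import defaultdict

# Each row is one fact at the grain "food stamp dollars per case per month".
# Case IDs, months, and amounts are invented for illustration.
fact_rows = [
    {"case_id": "C1", "month": "2003-01", "dollars": 200},
    {"case_id": "C1", "month": "2003-02", "dollars": 210},
    {"case_id": "C2", "month": "2003-01", "dollars": 150},
]

def roll_up(rows, key, measure):
    """Sum an additive measure over all rows that share the same key value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

dollars_by_month = roll_up(fact_rows, "month", "dollars")
dollars_by_case = roll_up(fact_rows, "case_id", "dollars")
```

Because the facts are stored at the finest grain, the same rows can be summed by month or by case; a table stored only at the monthly total could not be broken back down by case.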
    6. 6. What is a Dimension? <ul><li>A Dimension is a textual description that groups and describes a fact, for example: </li></ul><ul><ul><li>Ethnicity (White, Black, Japanese) </li></ul></ul><ul><ul><li>Language (English, Spanish, Tagalog) </li></ul></ul><ul><ul><li>Gender (Male, Female) </li></ul></ul><ul><ul><li>Country (USA, Mexico, Canada) </li></ul></ul><ul><ul><ul><li>State (California, Arizona, New Mexico) </li></ul></ul></ul>
    7. 7. Used in Queries <ul><li>Dimensions are used to restrict and frame queries on Facts, for example: </li></ul><ul><ul><li>“Give me a count of all Spanish-speaking white males in California” </li></ul></ul><ul><li>The Fact </li></ul><ul><ul><li>Count of (a number) </li></ul></ul><ul><li>The Dimensions are: </li></ul><ul><ul><li>By Language (Spanish), </li></ul></ul><ul><ul><li>By Race (white), </li></ul></ul><ul><ul><li>By Gender (male), </li></ul></ul><ul><ul><li>By Location (California) </li></ul></ul>
    8. 8. Answers Business Questions <ul><li>How many Spanish-speaking clients did we serve in each department for each of the past 3 years? </li></ul><ul><li>Which cities currently have the highest concentration of Asian clients? What has the trend been? </li></ul><ul><li>How many people who receive Medi-Cal received a service in 2003 from health services, by service? </li></ul>
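The example query on this slide can be sketched in a few lines of Python. This is only an illustration: the client rows, the column names, and the `count_where` helper are assumptions, not part of any real system described in the presentation.

```python
from collections import Counter

# Hypothetical client rows; the column names are assumptions for illustration.
clients = [
    {"language": "Spanish", "race": "White", "gender": "Male", "state": "California"},
    {"language": "Spanish", "race": "White", "gender": "Female", "state": "California"},
    {"language": "English", "race": "White", "gender": "Male", "state": "California"},
    {"language": "Spanish", "race": "White", "gender": "Male", "state": "Arizona"},
    {"language": "Spanish", "race": "White", "gender": "Male", "state": "California"},
]

def count_where(rows, **filters):
    """Count rows (the fact) whose dimension values match every filter."""
    return sum(
        all(row[dim] == value for dim, value in filters.items()) for row in rows
    )

# The fact is a count; the four dimensions restrict which rows are counted.
spanish_white_males_ca = count_where(
    clients, language="Spanish", race="White", gender="Male", state="California"
)

# Grouping by a dimension instead of filtering on it gives one count per value.
spanish_by_state = Counter(
    row["state"] for row in clients if row["language"] == "Spanish"
)
```

Filtering on a dimension narrows the count to one value, while grouping by it frames the same fact across every value of that dimension.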
    9. 9. Finding Facts and Dimensions
    10. 10. Documenting Facts and Dimensions <ul><li>Dimension Example </li></ul><ul><li>Fact Example </li></ul>
    11. 11. How do we begin? (ETL) <ul><li>Extract </li></ul><ul><ul><li>Identify the sources of data </li></ul></ul><ul><li>Transform </li></ul><ul><ul><li>Identify the standards and rules </li></ul></ul><ul><li>Load </li></ul><ul><ul><li>Make it available </li></ul></ul>
    12. 12. Identify with a Bus Matrix <ul><li>A matrix which is used to identify and document the intersections between business processes (systems) and common dimensions (attributes) </li></ul><ul><li>Each business process represents a row in the matrix and each dimension a column </li></ul>
    13. 13. Building a Bus Matrix
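A bus matrix can be sketched as a mapping from each business process (a row) to the set of dimensions it uses (the columns). The process and dimension names below are hypothetical, chosen only to resemble the examples in this presentation.

```python
# Hypothetical business processes (rows) and the dimensions each one uses.
bus_matrix = {
    "Food Stamps": {"Client", "Program", "Calendar Month"},
    "In-Home Support Services": {"Client", "Provider", "Pay Period"},
    "Mental Health": {"Client", "Program", "Department"},
}

# The matrix columns are every dimension used by at least one process.
dimensions = sorted(set().union(*bus_matrix.values()))

def render(matrix, columns):
    """Return one text line per process, with X marking each dimension it uses."""
    lines = []
    for process, used in matrix.items():
        marks = ["X" if dim in used else "-" for dim in columns]
        lines.append(f"{process}: " + " ".join(marks))
    return lines

# Dimensions used by more than one process are the "common" ones.
shared = [
    dim for dim in dimensions
    if sum(dim in used for used in bus_matrix.values()) > 1
]
```

The shared dimensions are the ones that need a single agreed definition, because they are what allows facts from different systems to be compared.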
    14. 14. Code Translations <ul><li>Values from source systems are standardized </li></ul><ul><li>One value from a source system can become one or more values in the warehouse </li></ul><ul><li>Translation documents contain translation methodology and values. </li></ul>
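One way to picture a translation document is as a lookup table keyed by source system, field, and source value. In this sketch the system names and source values ("W", "WV", "1") are invented; only the warehouse race codes 100 and 450 appear elsewhere in the slides. Returning an empty list for an untranslatable value is also an assumption.

```python
# Hypothetical translation table: (source system, field, source value) -> warehouse codes.
TRANSLATIONS = {
    ("system_a", "race", "W"): [100],
    ("system_a", "race", "WV"): [100, 450],  # one source value becomes two warehouse values
    ("system_b", "race", "1"): [100],
}

def translate(system, field, value):
    """Return the standardized warehouse codes for a source value (empty if unmapped)."""
    return list(TRANSLATIONS.get((system, field, value), []))
```

For example, `translate("system_a", "race", "WV")` yields two warehouse codes, which is the "one value can become one or more values" case from the slide.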
    15. 15. Objective of Code Standards <ul><li>To provide a consistent and common way to express dimensions for comparisons against well-known and established values. </li></ul><ul><ul><li>United States Census Bureau </li></ul></ul><ul><ul><li>United States Postal Service </li></ul></ul><ul><ul><li>International Organization for Standardization (ISO) </li></ul></ul>
    16. 16. Demographic Values <ul><li>Based on DP-1. Profile of General Demographic Characteristics </li></ul>
    17. 17. Code Driven Data <ul><li>Ethnicity/Race </li></ul><ul><li>Hispanic Origin </li></ul><ul><li>Country </li></ul><ul><li>Language </li></ul><ul><li>State </li></ul><ul><li>Disability </li></ul><ul><li>Marital Status </li></ul>
    18. 18. Ethnicity/Race <ul><li>The system will use the ethnicity codes as defined by the US Census Bureau. </li></ul><ul><ul><li> </li></ul></ul><ul><ul><li>These categories are socio-political constructs and should not be interpreted as being scientific or anthropological in nature. Furthermore, the race categories include both racial and national origin groups. </li></ul></ul>
    19. 19. Ethnicity/Race The OMB race categories The Census question: (100) White, (200) Black or African American, (300) American Indian or Alaska Native, (400) Asian, (500) Native Hawaiian or Other Pacific Islander, (600) Some Other Race
    20. 20. Hispanic Origin <ul><li>The Federal government considers race and Hispanic origin to be two separate and distinct concepts. </li></ul>Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the United States. Persons of Hispanic origin may be of any race.
    21. 21. Country and Language Values <ul><li>Country code values come from the International Organization for Standardization (ISO) </li></ul><ul><ul><li> </li></ul></ul><ul><ul><li>3-character coding system </li></ul></ul><ul><ul><li>Plus general codes for values that have no ISO code (Asia General) </li></ul></ul><ul><li>Language code values come from the ISO </li></ul><ul><ul><li> </li></ul></ul><ul><ul><li>System’s values will be mapped appropriately </li></ul></ul>
    22. 22. US States Values <ul><li>State code values come from the </li></ul><ul><li>United States Postal Service </li></ul><ul><ul><li> </li></ul></ul>
    23. 23. Disability Values <ul><li>Many times there are no “standards” for setting up values, for example, disability </li></ul><ul><li>The best approach is to adopt the values used by the client’s existing systems. </li></ul>
    24. 24. Marital Status Values <ul><li>This is the code defining the marital status of a person. </li></ul><ul><li>The US Census has five classifications </li></ul><ul><ul><ul><li>Now Married, Widowed, Divorced, Separated, Never Married. </li></ul></ul></ul><ul><li>The systems used contain a greater number of classifications. </li></ul><ul><li>There are no “standard” formats </li></ul>
    25. 25. Marital Status Values <ul><li>A superset of marital classifications based on the systems can be used. </li></ul>
    26. 26. Translation Rules <ul><li>Invalid Values </li></ul><ul><li>Values that change </li></ul><ul><li>Values that are cumulative </li></ul><ul><li>Dependent Values </li></ul>
    27. 27. Invalid Values Rules <ul><li>Invalid values will be replaced </li></ul><ul><ul><li>If no value is present, an invalid one can be used. </li></ul></ul><ul><ul><ul><li>SSN: 000-00-1234 → Nothing </li></ul></ul></ul><ul><ul><li>A valid new value will replace an invalid one. </li></ul></ul><ul><ul><ul><li>SSN: 987-12-4321 → 000-00-1234 </li></ul></ul></ul><ul><ul><li>An invalid value cannot replace a valid one. </li></ul></ul><ul><ul><ul><li>SSN: 000-00-1234 –x 987-12-3421 </li></ul></ul></ul>
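The three rules on this slide can be sketched as a small merge function. The validity check is a toy assumption (well-formed and not starting with 000), chosen only so that the slide's examples behave as described. The slide does not say what happens when a valid value meets another valid value, or an invalid value meets another invalid one; the choices below for those two cases are assumptions.

```python
import re

def is_valid_ssn(ssn):
    """Toy validity check (an assumption): well-formed and not starting with 000."""
    return (
        bool(ssn)
        and re.fullmatch(r"\d{3}-\d{2}-\d{4}", ssn) is not None
        and not ssn.startswith("000")
    )

def merge_ssn(current, incoming):
    """Decide which SSN to keep when a new value arrives for a stored one."""
    if current is None:
        return incoming  # nothing stored yet: even an invalid value is accepted
    if is_valid_ssn(incoming):
        return incoming  # a valid new value replaces what is stored (assumed even if stored is valid)
    return current       # an invalid value never replaces a stored value (assumed even if stored is invalid)
```

With these rules, `merge_ssn("987-12-4321", "000-00-1234")` keeps the stored value, while reversing the arguments replaces the invalid stored value with the valid one.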
    28. 28. Values that Change Rules <ul><li>Different systems have different depths of data </li></ul><ul><li>Specific values should override generalized values </li></ul><ul><li>Specific values are considered equals </li></ul>
    29. 29. Values that Change Rules <ul><li>GENDER </li></ul><ul><ul><li>Male/Female → Unknown </li></ul></ul><ul><ul><li>Unknown –x Male/Female </li></ul></ul><ul><ul><li>Male → Female </li></ul></ul><ul><li>COUNTRY OF BIRTH </li></ul><ul><ul><li>Germany (deu) → Europe (eux) </li></ul></ul><ul><ul><li>Europe (eux) –x Germany (deu) </li></ul></ul><ul><ul><li>Mexico (mex) → United States (usa) </li></ul></ul>
    30. 30. Values that Change Rules <ul><li>HISPANIC ORIGIN </li></ul><ul><ul><li>Mexican → Other Hispanic </li></ul></ul><ul><ul><li>Other Hispanic –x Mexican </li></ul></ul><ul><ul><li>Mexican → Not Hispanic Origin </li></ul></ul><ul><ul><li>Not Hispanic → Mexican/Other Hispanic </li></ul></ul><ul><li>MARITAL STATUS </li></ul><ul><li>EDUCATION / EDU DEGREE </li></ul><ul><li>PREFERRED LANGUAGE </li></ul><ul><li>STATE OF BIRTH </li></ul>
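The "values that change" rules (specific values override generalized ones, and specific values are treated as equals) can be sketched with a specificity rank per value. The numeric ranks and the `apply_change` helper are assumptions for illustration; only the gender and country-of-birth values from the slides are included.

```python
# Specificity ranks are assumptions: 0 = unknown, 1 = generalized, 2 = specific.
SPECIFICITY = {
    "Unknown": 0,
    "Europe (eux)": 1,
    "Germany (deu)": 2,
    "Mexico (mex)": 2,
    "United States (usa)": 2,
    "Male": 2,
    "Female": 2,
}

def apply_change(current, incoming):
    """Keep the incoming value unless it is less specific than the stored one."""
    if SPECIFICITY[incoming] >= SPECIFICITY[current]:
        return incoming  # equally or more specific: the newer value wins
    return current       # a more general value cannot overwrite a specific one
```

Under this sketch, `apply_change("Europe (eux)", "Germany (deu)")` moves to the specific country, while `apply_change("Germany (deu)", "Europe (eux)")` leaves Germany in place.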
    31. 31. Values that are Cumulative <ul><li>RACE </li></ul><ul><ul><li>Vietnamese (450) → Amerasian (649) </li></ul></ul><ul><ul><li>Amerasian (649) –x Vietnamese (450) </li></ul></ul><ul><ul><li>White (100) AND Vietnamese (450) </li></ul></ul><ul><ul><ul><li>White/Vietnamese </li></ul></ul></ul><ul><li>DISABILITIES </li></ul><ul><ul><li>Blind → Unknown </li></ul></ul><ul><ul><li>Unknown –x Blind </li></ul></ul><ul><ul><li>Blind AND Deaf </li></ul></ul><ul><ul><ul><li>( Blind/Deaf ) </li></ul></ul></ul>
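Cumulative values can be sketched as sets that are combined rather than overwritten, using the disability example from this slide. The `accumulate` helper is hypothetical, and it does not attempt the race case where a specific value supersedes a generalized one.

```python
UNKNOWN = "Unknown"

def accumulate(current, incoming):
    """Combine two sets of values; 'Unknown' survives only when nothing specific is known."""
    combined = (set(current) | set(incoming)) - {UNKNOWN}
    return combined or {UNKNOWN}
```

So a client recorded as Blind in one system and Deaf in another ends up with both values, and an incoming Unknown never erases what is already known.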
    32. 32. Dependent Values <ul><li>State Of Birth </li></ul><ul><ul><li>If country of birth is not USA then state of birth must be ‘unknown’ </li></ul></ul><ul><li>Education and Education Degree </li></ul><ul><ul><li>If an Education Degree is set then Education must be at least at the corresponding level </li></ul></ul><ul><ul><ul><li>High School degree → Education level 12 </li></ul></ul></ul><ul><ul><ul><li>Bachelors Degree → Education level 16 </li></ul></ul></ul>
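The dependent-value rules can be sketched as a function that normalizes one record. The field names and the "usa" country code format are assumptions, and raising the education level to the minimum (rather than, say, rejecting the record) is one possible reading of "must be at least".

```python
# Minimum education level implied by each degree; the two entries come from the slide.
MIN_LEVEL_FOR_DEGREE = {"High School": 12, "Bachelors": 16}

def apply_dependent_rules(record):
    """Return a copy of the record with the dependent-value rules enforced."""
    fixed = dict(record)
    # State of birth only makes sense when the country of birth is the USA.
    if fixed.get("country_of_birth") != "usa":
        fixed["state_of_birth"] = "unknown"
    # A degree implies a minimum education level (assumption: raise the level to match).
    minimum = MIN_LEVEL_FOR_DEGREE.get(fixed.get("education_degree"))
    if minimum is not None:
        fixed["education"] = max(fixed.get("education") or 0, minimum)
    return fixed
```

A record with country of birth "mex", a stored state of birth, and a Bachelors degree at education level 12 would come out with state "unknown" and education level 16.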
    33. 33. Example <ul><li>  What is the racial makeup of clients that receive services from the Division of Mental Health and how does this compare in relation to the total County makeup? </li></ul>
    34. 34. Valid Comparisons
    35. 35. Reporting Cubes Using Pivot Tables helps you in reporting and analyzing data. In this report, the Fact is Client count, and the Dimensions are By Department, By Race, and By Active Year.
    36. 36. Drill Down Capable From the previous report we have “drilled” into the Social Services Division, “down” to the program level. Cubes provide for drilling down into a greater level of detail.
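Drilling down amounts to re-counting the same fact with one more dimension, restricted to the member that was selected. The rows, program names, and `count_by` helper in this sketch are hypothetical.

```python
from collections import Counter

# Hypothetical rows, one per client served by a program.
rows = [
    {"department": "Social Services", "program": "Food Stamps", "race": "White"},
    {"department": "Social Services", "program": "Food Stamps", "race": "Asian"},
    {"department": "Social Services", "program": "Medi-Cal", "race": "White"},
    {"department": "Health Services", "program": "Mental Health", "race": "White"},
]

def count_by(rows, *dims):
    """Count rows grouped by the given dimensions; keys are tuples of dimension values."""
    return Counter(tuple(row[d] for d in dims) for row in rows)

# Top-level report: client count by department.
by_department = count_by(rows, "department")

# Drill into Social Services, down to the program level.
social_services = [r for r in rows if r["department"] == "Social Services"]
by_program = count_by(social_services, "department", "program")
```

The drilled-down counts always add back up to the department total, which is what makes an additive fact safe to drill into.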
    37. 37. Multi-Dimensional We say that these cubes are multi-dimensional. This report shows that we can combine dimensions to find even more interesting information. Notice that the Fact is Unique Client, the Dimensions are By Race Group, By Gender, By Marital Status, and By Department, and the Filter, or Selection, is For Active Year.
    38. 38. Visual Graphs Reports are easily converted into visual graphs. Here is the prior grid report in a graph format. “Interesting” information is quickly noticed.
    39. 39. Putting it all together <ul><li>Choose the systems to include </li></ul><ul><li>Identify the exact grain of the business process </li></ul><ul><li>Identify the dimensions available for use with each fact table row </li></ul><ul><li>Choose the numeric facts of what is being measured </li></ul><ul><li>Define the code translation standards </li></ul><ul><li>Establish the translation rules </li></ul>
    40. 40. What is a Data Warehouse?
    41. 41. Key to Success <ul><li>To ensure success, end user involvement is required: </li></ul><ul><li>Data warehouse success is tied directly to user acceptance. If the users haven’t accepted the data warehouse … then your efforts have been exercises in futility. (Kimball, 2002) </li></ul>