The document discusses key concepts in data warehousing including:
- A data warehouse contains aggregated data from multiple sources, organized for fast querying and analysis. It lets users analyze data through multi-dimensional reporting and drill-downs.
- Facts are numeric measurements like dollars spent or clients served, while dimensions provide context like location, date, or client characteristics. Facts and dimensions together answer business questions.
- Designing an effective data warehouse means identifying the business grain, key facts, and dimensions from user needs, which requires end-user involvement.
2. What is a Data Warehouse?
The conglomeration of an organization’s data
warehouse staging and presentation areas,
where operational data is specifically
structured for query and analysis
performance and ease-of-use.
Ralph Kimball (2002), The Data Warehouse Toolkit.
3. Now in English
A data warehouse is a database organized to
allow fast queries of information.
It brings data from different database systems
together into a single view.
4. So what’s the difference?
Transactional Sources
• Centers around transactions
• Two-dimensional reports
– Age by System
• Individual data
• Slow
• “Cut-n-paste” into other applications
Data Warehouse
• Centers around business facts
• Multi-dimensional reports
– Age by Race by Program
• Aggregated data
• Fast
• Third-party reporting tools can be used
5. Measures Facts, not Activities
Facts are business performance measurements
– Meals provided
– Dollars expended
– Hours worked
Facts are numerical and additive
– Sum of dollars spent
– Count of clients served
Facts are stored to represent a measurement at a
particular “grain”
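As a minimal sketch of what "numerical and additive" means in practice, the Python below sums a dollar fact and counts clients served over a few fact rows. The rows, the field names `client_id` and `dollars_spent`, and the values are made up for illustration and do not come from any real system.

```python
# Hypothetical fact rows: each row is one measurement of dollars spent.
fact_rows = [
    {"client_id": 1, "dollars_spent": 120.0},
    {"client_id": 2, "dollars_spent": 80.0},
    {"client_id": 1, "dollars_spent": 50.0},
]

# Additive fact: dollar amounts can simply be summed across rows.
total_dollars = sum(row["dollars_spent"] for row in fact_rows)

# Count of clients served: count distinct clients, not rows.
clients_served = len({row["client_id"] for row in fact_rows})

print(total_dollars)   # 250.0
print(clients_served)  # 2
```

Note that the client count is taken over distinct clients, so a client with two payments is still counted once.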
6. What is a Grain?
A grain is the level of detail at which a business
measurement is stored
Different businesses have different fact needs
– A Social Services grain
• The number of food stamp dollars given to a case each month
– In-Home Support Services grain
• The number of hours of service a client received in a
provider’s pay period
• The number of dollars paid to a provider for a client during a
pay period
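To make the idea of a grain concrete, here is a small Python sketch that stores the Social Services example at the grain "one row per case per month" and rolls it up to a yearly total. The case IDs, months, and dollar amounts are invented, and the field names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows at the grain "one row per case per month".
food_stamp_facts = [
    {"case_id": "A", "month": "2003-01", "dollars": 200},
    {"case_id": "A", "month": "2003-02", "dollars": 210},
    {"case_id": "B", "month": "2003-01", "dollars": 150},
]

# Rolling up from the monthly grain to a yearly total per case is a sum.
dollars_per_case_year = defaultdict(int)
for row in food_stamp_facts:
    year = row["month"][:4]  # "2003-01" -> "2003"
    dollars_per_case_year[(row["case_id"], year)] += row["dollars"]

print(dict(dollars_per_case_year))
# {('A', '2003'): 410, ('B', '2003'): 150}
```

Data stored at a monthly grain can be summarized upward to quarters or years, but it cannot be broken down into weekly amounts, because that detail was never stored.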
7. What is a Dimension?
A dimension is a textual description that
relates to a fact, for example:
– Ethnicity (White, Black, Japanese)
– Language (English, Spanish, Tagalog)
– Gender (Male, Female)
– Date (05/31/2003, 04/15/2003)
– Location (California, Arizona, New Mexico)
8. Used in Queries
Dimensions are used to restrict and frame queries on
facts, for example:
“Give me a count of all Spanish-speaking white males in
California”
• The fact is the count (a number)
• The dimensions are:
– Spanish (language),
– white (race),
– male (gender),
– and California (location)
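The query above can be sketched in Python as a count restricted by dimension values. The client rows are invented sample data, and `count_matching` is a hypothetical helper written for this example rather than part of any reporting tool.

```python
# Hypothetical client rows, each described by four dimensions.
clients = [
    {"language": "Spanish", "race": "White", "gender": "Male", "location": "California"},
    {"language": "Spanish", "race": "White", "gender": "Male", "location": "Arizona"},
    {"language": "English", "race": "White", "gender": "Male", "location": "California"},
    {"language": "Spanish", "race": "White", "gender": "Male", "location": "California"},
]

# The dimension values that restrict and frame the query.
restrictions = {
    "language": "Spanish",
    "race": "White",
    "gender": "Male",
    "location": "California",
}

def count_matching(rows, restrictions):
    """Count rows whose dimension values match every restriction."""
    return sum(
        1 for row in rows
        if all(row[dim] == value for dim, value in restrictions.items())
    )

count = count_matching(clients, restrictions)
print(count)  # 2
```

The count is the fact; everything in `restrictions` is a dimension.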
9. Identifying Facts and Dimensions
[Example report]
Dimensions: By Aid Type, By Program, By Month, For (By) Year
Facts: Count of Cases, Sum of Aid Payments, Average per Case
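A report like this one can be sketched as a group-by: the dimensions become the grouping key and the facts become the computed numbers. The payment rows, the aid type values, and the field names below are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical payment rows; aid type values are illustrative only.
payments = [
    {"program": "Food Stamps", "aid_type": "Regular", "case_id": 1, "aid_payment": 100.0},
    {"program": "Food Stamps", "aid_type": "Regular", "case_id": 2, "aid_payment": 300.0},
    {"program": "Food Stamps", "aid_type": "Emergency", "case_id": 3, "aid_payment": 50.0},
]

# Group by the dimensions (Program, Aid Type).
groups = defaultdict(lambda: {"cases": set(), "sum_aid": 0.0})
for row in payments:
    key = (row["program"], row["aid_type"])
    groups[key]["cases"].add(row["case_id"])
    groups[key]["sum_aid"] += row["aid_payment"]

# Compute the facts for each group.
report = {}
for key, group in groups.items():
    case_count = len(group["cases"])
    report[key] = {
        "count_of_cases": case_count,
        "sum_of_aid_payments": group["sum_aid"],
        "average_per_case": group["sum_aid"] / case_count,
    }

for key, facts in report.items():
    print(key, facts)
```

The "by" words end up in the grouping key, and the "math" words end up in the computed columns.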
10. What makes a Data Warehouse?
11. Cubes Answer Business Questions
How many Spanish speaking clients did H&HS
serve in each department for each of the past 3
years?
Which cities currently have the highest concentration
of Asian clients? What has the trend been?
How many people who receive Medi-Cal received a
service in 2003 from health services, by service?
17. Where do we start?
• Choose the systems to include
• Identify the exact grain of the business
process
• Identify the dimensions available for use
with each fact table row
• Choose the numeric facts of what is being
measured
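The steps above can be sketched with Python's built-in `sqlite3` module: pick a grain, define the dimension tables, and store the numeric fact in a fact table. All table names, column names, and values here are hypothetical, and a real warehouse would likely use a different database and many more dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions available for each fact table row (hypothetical names).
CREATE TABLE dim_program (program_key INTEGER PRIMARY KEY, program_name TEXT);
CREATE TABLE dim_month (month_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Grain: one row per case per month. The numeric fact is aid_dollars.
CREATE TABLE fact_aid_payment (
    case_id INTEGER,
    program_key INTEGER REFERENCES dim_program(program_key),
    month_key INTEGER REFERENCES dim_month(month_key),
    aid_dollars REAL
);
""")

conn.execute("INSERT INTO dim_program VALUES (1, 'Food Stamps')")
conn.executemany(
    "INSERT INTO dim_month VALUES (?, ?, ?)",
    [(200301, 2003, 1), (200302, 2003, 2)],
)
conn.executemany(
    "INSERT INTO fact_aid_payment VALUES (?, ?, ?, ?)",
    [(10, 1, 200301, 200.0), (10, 1, 200302, 210.0), (11, 1, 200301, 150.0)],
)

# Report: sum of aid dollars by program, for each year.
rows = conn.execute("""
    SELECT p.program_name, m.year, SUM(f.aid_dollars)
    FROM fact_aid_payment AS f
    JOIN dim_program AS p ON p.program_key = f.program_key
    JOIN dim_month AS m ON m.month_key = f.month_key
    GROUP BY p.program_name, m.year
""").fetchall()

print(rows)  # [('Food Stamps', 2003, 560.0)]
```

The fact table holds only keys and numbers; the descriptive text lives in the dimension tables and is joined in at query time.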
18. Key to Success
To ensure success, end-user involvement is
required:
Data warehouse success is tied directly to
user acceptance. If the users haven’t
accepted the data warehouse …then your
efforts have been exercises in futility. (Kimball,
2002)
Editor's Notes
What are some business facts that you need, or would like, to be able to report on?
Here is an example of how to identify Facts and Dimensions on an existing report
The Facts are “math” words - Count of Cases, Sum of Aid Payments, Average of Pay per Case
The Dimensions are “grouping” words - (By) Program, (By) Aid Type, (By) Calendar Month, (For) Fiscal Year
We start with data from operational sources
We move this data into a staging area where business rules are applied
Code values are translated to a common set; e.g., M vs. male
Formats are changed to fit a standard; e.g., 5.1 vs. 5.1000
These rules make the data from different sources comparable (apples to apples)
Once the data is made “standard” it is loaded into the warehouse’s fact and dimension tables
We create the reporting cubes
Users access the cubes to analyze and report on the data
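As a rough sketch of the staging step described above, the Python below applies two business rules so that records from two sources become comparable. The `GENDER_CODES` mapping, the `standardize` function, and the `hours` field are assumptions made up for this example.

```python
# Hypothetical mapping of source code values to one common set.
GENDER_CODES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

def standardize(record):
    """Apply staging rules: common code values and a standard number format."""
    gender = GENDER_CODES[record["gender"].strip().lower()]
    hours = float(record["hours"])  # "5.1" and "5.1000" become the same number
    return {"gender": gender, "hours": hours}

# Two operational sources that record the same thing differently.
source_a = [{"gender": "M", "hours": "5.1"}]
source_b = [{"gender": "male", "hours": "5.1000"}]

staged = [standardize(record) for record in source_a + source_b]
print(staged)
# [{'gender': 'Male', 'hours': 5.1}, {'gender': 'Male', 'hours': 5.1}]
```

Once both records look the same, they can be loaded into the same fact and dimension tables and compared apples to apples.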
The data warehouse is there to help you answer business questions.
Reporting Cubes are the tool for answering them.
Now I’m going to show some examples of how this comes together to help you in reporting and analyzing data.
While going through these, think of what YOU would like to see.
In this report, the facts are Client counts, the dimensions are By Department, By Gender, and By Active Year.
Here we see another report.
Again the fact is a count of Clients, the dimensions are By Race group, By Department, For Active Year
These cubes allow drilling down into a greater level of detail.
From the previous report we have “drilled” into the Social Services Division, “down” to the program level.
Can you tell what the dimensions are here?
By Race
By Department
By Program
By Active Year
We say that these cubes are multi-dimensional.
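A drill-down can be sketched as the same count taken at two levels: first by department, then restricted to one department and broken out by program. The client rows and the assignment of programs to departments below are invented for illustration, with one row per client.

```python
from collections import Counter

# Hypothetical rows, one per client; department/program pairs are made up.
clients = [
    {"department": "Social Services", "program": "Food Stamps", "race": "White"},
    {"department": "Social Services", "program": "In-Home Support Services", "race": "Asian"},
    {"department": "Social Services", "program": "Food Stamps", "race": "Asian"},
    {"department": "Health Services", "program": "Health Program", "race": "White"},
]

# Top level: client count by department.
by_department = Counter(row["department"] for row in clients)

# Drill down: restrict to Social Services, then count by program.
social_services_by_program = Counter(
    row["program"] for row in clients
    if row["department"] == "Social Services"
)

print(by_department["Social Services"])           # 3
print(social_services_by_program["Food Stamps"])  # 2
```

The program-level counts add back up to the department total, which is what makes drilling down and rolling up consistent.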
This report shows that we can combine dimensions to find even more interesting information.
Notice that the Fact is Unique Client,
the Dimensions are By Race Group, By Gender, By Marital Status, By Department, and the Filter, or Selection, is For Active Year
Depending on the reporting tool, these reports can easily be converted into visual graphs.
Here we see the prior grid report in a graph format.
This allows the user to quickly notice interesting information.
Now that the value of the data warehouse can be seen, how do we begin?