Star schema
2. A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization, which is why many data warehouses are considered to be DSSs (decision-support systems).
3. Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data.
6. This is a standard, normalized database structure, as used in OLTP (online transaction processing) systems.
9. With each transaction, every index on a table must be updated along with the table itself. This overhead can significantly decrease performance.
10. There are some disadvantages to an OLTP structure, especially when we retrieve the data for analysis.
11. For one, we must use joins and query multiple tables to get all the data we want. Joins tend to be slower than reading from a single table, so we want to minimize the number of tables in any single query.
12. One of the advantages of OLTP is also a disadvantage: fewer indexes per table.
13. In general terms, the fewer indexes we have, the faster inserts, updates, and deletes will be.
15. Since one of our design goals is to speed up transactions by minimizing the number of indexes, we limit ourselves when it comes to data retrieval.
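To make the join cost concrete, here is a minimal sketch using Python's built-in `sqlite3` module and a small normalized schema. The tables `categories`, `products`, `orders`, and `order_lines` and their data are hypothetical, chosen only for illustration. Even a simple "total sales by category and month" question has to join four tables.

```python
import sqlite3

# Hypothetical normalized (OLTP-style) schema: each fact is stored once,
# so analytical questions must join the tables back together.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories  (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products    (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          category_id INTEGER NOT NULL REFERENCES categories(category_id));
CREATE TABLE orders      (order_id    INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_lines (order_id    INTEGER NOT NULL REFERENCES orders(order_id),
                          product_id  INTEGER NOT NULL REFERENCES products(product_id),
                          quantity    INTEGER NOT NULL,
                          unit_price  REAL    NOT NULL,
                          PRIMARY KEY (order_id, product_id));

INSERT INTO categories  VALUES (1, 'Dairy Products'), (2, 'Beverages');
INSERT INTO products    VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (20, 'Tea', 2);
INSERT INTO orders      VALUES (100, '2024-01-15'), (101, '2024-02-03');
INSERT INTO order_lines VALUES (100, 10, 2, 1.50), (100, 20, 1, 4.00),
                               (101, 11, 1, 6.00), (101, 10, 4, 1.50);
""")

# "Total sales by category and month" needs all four tables: three joins.
rows = conn.execute("""
    SELECT c.name,
           strftime('%Y-%m', o.order_date) AS month,
           SUM(ol.quantity * ol.unit_price) AS total_sales
    FROM order_lines AS ol
    JOIN orders      AS o ON o.order_id    = ol.order_id
    JOIN products    AS p ON p.product_id  = ol.product_id
    JOIN categories  AS c ON c.category_id = p.category_id
    GROUP BY c.name, month
    ORDER BY c.name, month
""").fetchall()
print(rows)
```

Every additional descriptive attribute (a supplier, a region, a fiscal calendar) would typically add another join in a schema normalized this way.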
17. A star schema is so called because its entity-relationship diagram resembles a star: one central fact table is connected to multiple dimension tables.
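A minimal sketch of that shape in SQLite, assuming hypothetical table names `fact_sales`, `dim_date`, `dim_product`, and `dim_store` and an illustrative set of columns. The fact table declares a foreign key to each dimension, and SQLite's `PRAGMA foreign_key_list` can be used to confirm that every edge points from the central fact table out to a dimension, while the dimensions reference nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimensions: descriptive attributes, no foreign keys.
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, full_date TEXT,
                          month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          subcategory TEXT, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, region TEXT);

-- Hypothetical fact table at the center of the star.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    sales_amount REAL    NOT NULL,
    units_sold   INTEGER NOT NULL,
    PRIMARY KEY (date_key, product_key, store_key)
);
""")

def referenced_tables(table):
    """Return the set of tables that `table` points to via foreign keys."""
    # Each row of PRAGMA foreign_key_list describes one FK column;
    # index 2 holds the name of the referenced (parent) table.
    return {row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}

for name in ["fact_sales", "dim_date", "dim_product", "dim_store"]:
    print(name, "->", sorted(referenced_tables(name)))
```

The table names are interpolated directly into the PRAGMA only because they are fixed literals here; user-supplied names would need validation first.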
20. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
21. List the columns that describe each dimension (region name, branch name).
23. In a star schema, a dimension table will not have any parent table.
24. In a snowflake schema, by contrast, a dimension table can have one or more parent tables.
25. In a star schema, the hierarchies for each dimension are stored in the dimension table itself.
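The difference shows up directly in the queries. The following is a minimal sketch with hypothetical tables: `dim_product_star` keeps the Category > Subcategory > Product hierarchy in one table, while the snowflake variant (`dim_product_sf`, `dim_subcategory`, `dim_category`) normalizes it into parent tables. Both produce the same rollup, but the snowflake query needs three joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole hierarchy lives in one dimension table.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product_name TEXT,
                               subcategory TEXT, category TEXT);

-- Snowflake: the same hierarchy split into parent tables.
CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory TEXT,
                              category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE dim_product_sf  (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key));

CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Milk', 'Fresh', 'Dairy Products'),
                                    (2, 'Cheddar', 'Cheese', 'Dairy Products'),
                                    (3, 'Green Tea', 'Tea', 'Beverages');
INSERT INTO dim_category    VALUES (1, 'Dairy Products'), (2, 'Beverages');
INSERT INTO dim_subcategory VALUES (1, 'Fresh', 1), (2, 'Cheese', 1), (3, 'Tea', 2);
INSERT INTO dim_product_sf  VALUES (1, 'Milk', 1), (2, 'Cheddar', 2), (3, 'Green Tea', 3);
INSERT INTO fact_sales      VALUES (1, 10.0), (2, 25.0), (3, 7.5), (1, 5.0);
""")

# Star: one join from the fact table to the dimension.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product_star AS p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: three joins to climb the same hierarchy.
snow_rows = conn.execute("""
    SELECT c.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product_sf  AS p ON p.product_key     = f.product_key
    JOIN dim_subcategory AS s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category    AS c ON c.category_key    = s.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star_rows)
print(snow_rows)
```

The trade-off is that the star version repeats "Dairy Products" on every dairy row, which is the redundancy a snowflake schema removes.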
27. When I talk about “by” conditions, I am referring to looking at data broken down by certain criteria, such as by month or by product category.
28. For example, if we take the question “On a quarterly and then monthly basis, are Dairy Product sales cyclical?” we can break it down into this: “We want to see total sales by category (just Dairy Products in this case), by quarter or by month.”
29. Here we are looking at an aggregated value, the sum of sales, by specific criteria.
30. When we talk about the way we want to look at data, we usually want to see some sort of aggregated data. These aggregated values are called measures.
32. We need to look at our measures using those “by” conditions. These “by” conditions are called dimensions.
33. When we say we want to know our sales dollars, we almost always mean by day, or by quarter, or by year.
34. These “by” conditions map to dimensions: there is almost always a time dimension, and product and geographic dimensions are very common as well.
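In SQL terms, the measure becomes an aggregate such as `SUM(sales_amount)`, and each “by” condition becomes a `GROUP BY` column drawn from a dimension table. Below is a minimal sketch of the Dairy Products question against a hypothetical star schema (`fact_sales`, `dim_date`, `dim_product`, with made-up data); the helper `dairy_sales_by` is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, sales_amount REAL);

INSERT INTO dim_date    VALUES (20240115, 2024, 1, 1), (20240220, 2024, 1, 2),
                               (20240410, 2024, 2, 4), (20240505, 2024, 2, 5);
INSERT INTO dim_product VALUES (1, 'Milk', 'Dairy Products'), (2, 'Tea', 'Beverages');
INSERT INTO fact_sales  VALUES (20240115, 1, 100.0), (20240220, 1, 80.0),
                               (20240410, 1, 150.0), (20240505, 1, 60.0),
                               (20240115, 2, 999.0);  -- Beverages: filtered out below
""")

def dairy_sales_by(*date_columns):
    """Total Dairy Products sales, grouped by the given dim_date columns.

    Column names are interpolated into the SQL, so only pass trusted,
    fixed names such as "year", "quarter", or "month".
    """
    cols = ", ".join(f"d.{c}" for c in date_columns)
    sql = f"""
        SELECT {cols}, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_key    = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        WHERE p.category = 'Dairy Products'
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

by_quarter = dairy_sales_by("year", "quarter")  # "by quarter"
by_month = dairy_sales_by("year", "month")      # "by month"
print(by_quarter)
print(by_month)
```

Switching from quarterly to monthly only changes which dimension column appears in `GROUP BY`; the fact table and the measure stay the same.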
40. For example, if we have a Product dimension (which is common), it contains fields such as the description, the category name, the subcategory name, etc.
41. These fields do not contain codes that link us to other tables. Because the fields are the full descriptions, the dimension tables are often fat; they contain many large fields.
42. Dimension tables are often short, however. We may have many products, but even so, the dimension table cannot compare in size to a normal fact table.
45. Notice that both Category and Subcategory are stored directly in the Product dimension table rather than linked in through joined tables that store the hierarchy information.
47. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
48. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
49. A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead).
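A minimal sketch of both points, assuming a hypothetical `fact_sales` table whose primary key is the composite of its three foreign-key columns (the dimension tables are omitted for brevity). The composite key rejects a second row for the same (date, product, store) combination, and a summary table is built by aggregating the detail facts to a coarser grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL,   -- foreign key to a date dimension
    product_key  INTEGER NOT NULL,   -- foreign key to a product dimension
    store_key    INTEGER NOT NULL,   -- foreign key to a store dimension
    sales_amount REAL    NOT NULL,   -- the fact (measure)
    PRIMARY KEY (date_key, product_key, store_key)  -- composite of all foreign keys
);
INSERT INTO fact_sales VALUES (20240101, 1, 1, 10.0), (20240101, 1, 2, 4.0),
                              (20240102, 1, 1, 6.0),  (20240102, 2, 1, 3.0);
""")

# The composite key allows only one detail row per (date, product, store).
try:
    conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 1, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A summary table: the same measure pre-aggregated to the (product, store) grain.
conn.execute("""
    CREATE TABLE summary_sales_by_product_store AS
    SELECT product_key, store_key, SUM(sales_amount) AS sales_amount
    FROM fact_sales
    GROUP BY product_key, store_key
""")
summary = conn.execute(
    "SELECT * FROM summary_sales_by_product_store ORDER BY product_key, store_key"
).fetchall()
print(duplicate_rejected, summary)
```

The summary table drops the date key entirely, so it answers product-and-store questions with fewer rows but can no longer answer questions by time.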
55. The measures are numeric and additive across some or all of the dimensions.
56. For example, sales are numeric and we can look at total sales for a product, or category, and we can look at total sales by any time period.
57. While the dimension tables are short and fat, the fact tables are generally long and skinny.
58. They are long because they can potentially hold as many records as the product of the row counts of all the dimension tables.
59. In this schema, we have product, time and store dimensions. If we assume we have ten years of daily data, 200 stores, and we sell 500 products, we have a potential of 365,000,000 records (3650 days * 200 stores * 500 products). As you can see, this makes the fact table long.
60. The fact table is skinny because of the fields it holds. The primary key is made up of foreign keys that have migrated from the dimension tables.
61. These key fields are usually compact numeric values (typically integers), and our measures are numeric as well. Therefore, each fact record is generally much smaller than a record in one of our dimension tables.
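The "long and skinny" arithmetic can be checked in a few lines. This is a rough sketch: the row-count figures come from the example above, but the fact-row layout (three 4-byte integer keys plus two 8-byte float measures) and the sample product row are assumptions made for illustration, and the byte counts ignore any storage-engine overhead and indexes.

```python
import struct

# Figures from the example: ten years of daily data (ignoring leap days),
# 200 stores, and 500 products.
days, stores, products = 10 * 365, 200, 500
potential_rows = days * stores * products
print(f"{potential_rows:,}")  # 365,000,000

# Assumed fact-row layout (illustrative only): three 4-byte integer keys plus
# two 8-byte floating-point measures. '<' selects standard sizes with no padding.
FACT_ROW = struct.Struct("<iiidd")
print(FACT_ROW.size)  # 28
print(f"{potential_rows * FACT_ROW.size:,} bytes of raw fact data")  # 10,220,000,000 bytes of raw fact data

# A hypothetical product-dimension row: a 4-byte key plus full text descriptions.
product_row = ("Organic Whole Milk, 1 Gallon", "Milk", "Dairy Products",
               "Example Farms", "Refrigerated, 4 x 1 gallon per case")
dim_row_bytes = 4 + sum(len(text.encode("utf-8")) for text in product_row)
print(dim_row_bytes, "bytes per dimension row vs", FACT_ROW.size, "per fact row")
```

Each fact row is only a few dozen bytes, yet the table as a whole dwarfs the dimensions because of its row count.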
63. Non-additive - Measures that cannot be added across any of the dimensions.
64. Semi-additive - Measures that can be added across some dimensions but not across others.
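The three kinds of measures can be contrasted in plain Python. This is a sketch with made-up daily rows; the inventory-level (semi-additive) and margin-ratio (non-additive) measures are common textbook illustrations chosen here as assumptions, not examples taken from the original slides.

```python
from collections import defaultdict

# Hypothetical daily rows: (day, store, sales_amount, cost, units_on_hand)
rows = [
    ("2024-01-01", "A", 100.0, 60.0, 40),
    ("2024-01-01", "B",  50.0, 40.0, 10),
    ("2024-01-02", "A",  80.0, 50.0, 30),
    ("2024-01-02", "B",  70.0, 35.0, 25),
]

# Additive: sales can be summed across every dimension (days and stores).
total_sales = sum(r[2] for r in rows)

# Semi-additive: an inventory level can be summed across stores for one day...
on_hand_by_day = defaultdict(int)
for day, _store, _sales, _cost, on_hand in rows:
    on_hand_by_day[day] += on_hand
# ...but summing it across days is meaningless; use the latest snapshot
# (or an average) over time instead. ISO date strings sort chronologically.
latest_day = max(on_hand_by_day)
on_hand_now = on_hand_by_day[latest_day]

# Non-additive: a margin ratio cannot be summed, and a naive average of
# per-row ratios is also wrong; recompute it from its additive components.
total_cost = sum(r[3] for r in rows)
margin = (total_sales - total_cost) / total_sales
naive_avg_margin = sum((r[2] - r[3]) / r[2] for r in rows) / len(rows)

print(total_sales, dict(on_hand_by_day), on_hand_now)
print(margin, naive_avg_margin)
```

A common design choice is therefore to store the additive components (sales and cost) in the fact table and derive ratios at query time.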