Why Be Normal:
Understanding the benefits of a solid data model
Rob Armstrong
Director of Data Warehouse Support, Teradata
What’s new?
Lots of migrations from other platforms
– Forklift old models
– Data mart consolidations
Database versus Company messages
– Database doesn’t care
– Experiences illustrate the business value
Speed for Business Agility and Active Data
Warehousing
Big Points to Keep in Mind
Logical Models are about relationships
– Independent of function
– Independent of technical limits
Physical models are about functions
– Performance
– Data Management
Your Physical model should preserve
relationships while improving function
Logical Modeling
Normalized
– Third Normal Form is sufficient and still practical
– No surrogate or identity keys
– No history or summary tables
– Preserves relationships between entities
Dimensional
– Looks at usage of data
– Embeds “dimensions” into “fact” tables
– Logical model is typically retrofitted from the
physical design
What is the difference?
Relationships are constant
– Who provides what?
– Who pays for what?
– Where is service provided?
– When is transaction effective?
Functions constantly change
– Which customers paid with cash?
– Which customers have not contacted the call
center in the past 12 months?
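A minimal sketch of this distinction, using Python's built-in sqlite3 module. The table and column names (customer, payment, call_center_contact) and the sample rows are assumptions made for illustration; they are not from the presentation. The tables capture only the relationships, and each changing business question becomes a new query against the same structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Relationships: who pays for what, who contacted whom and when.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE payment (
    payment_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    method      TEXT NOT NULL,
    paid_on     TEXT NOT NULL          -- ISO date string
);
CREATE TABLE call_center_contact (
    contact_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    contacted_on TEXT NOT NULL         -- ISO date string
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
INSERT INTO payment VALUES
    (1, 1, 'CASH', '2024-01-05'),
    (2, 2, 'CARD', '2024-02-10'),
    (3, 3, 'CASH', '2024-03-15');
INSERT INTO call_center_contact VALUES
    (1, 1, '2024-05-01'),
    (2, 2, '2022-01-01');
""")

def paid_with_cash():
    """Function 1: which customers paid with cash?"""
    rows = conn.execute("""
        SELECT DISTINCT c.name
        FROM customer c
        JOIN payment p ON p.customer_id = c.customer_id
        WHERE p.method = 'CASH'
        ORDER BY c.name
    """)
    return [r[0] for r in rows]

def no_contact_since(cutoff):
    """Function 2: which customers have no call center contact on or after cutoff?"""
    rows = conn.execute("""
        SELECT c.name
        FROM customer c
        WHERE NOT EXISTS (
            SELECT 1 FROM call_center_contact k
            WHERE k.customer_id = c.customer_id
              AND k.contacted_on >= ?
        )
        ORDER BY c.name
    """, (cutoff,))
    return [r[0] for r in rows]
```

Neither question required a change to the tables; a third question next quarter would be another query, not another model.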
The benefits of Normal Models
Referential Integrity is inherent to the model
and therefore can be instantiated at the core
level
Transactional systems like normalized
models due to less data replication, making
ETL and ELT easier
Costs are lowered by less replication,
minimized data management, and quicker
application development
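As an illustration of referential integrity instantiated at the core level, the sketch below declares the parent/child relationship in the schema and lets the database reject an orphan row. It uses SQLite through Python's sqlite3 module purely as a stand-in; the table names and data are hypothetical, and enforcement options differ on other platforms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'Ada');
""")

def try_insert_order(order_id, customer_id):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_order VALUES (?, ?)",
                (order_id, customer_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert_order(100, 1)    # customer 1 exists
rejected = try_insert_order(101, 99)   # customer 99 does not exist

# Because no orphan can exist, a plain inner join returns every order.
order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM customer_order o
    JOIN customer c ON c.customer_id = o.customer_id
""").fetchone()[0]
```

With the rule held in the model, downstream queries do not need defensive logic for orders whose customer is missing.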
The benefits of Normal Models
Relationships are preserved, therefore new
analytics are readily supported
Supports natural growth as new subject
areas are prioritized for inclusion into the
enterprise model
Normalized models natively support
unbalanced or ragged hierarchies
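One common way to hold a ragged hierarchy in a normalized model is a self-referencing table walked with a recursive query. The sketch below assumes a hypothetical org_unit table and uses SQLite's WITH RECURSIVE through Python's sqlite3 module; the branches deliberately end at different depths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE org_unit (
    unit_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES org_unit(unit_id)  -- NULL marks the root
);
INSERT INTO org_unit VALUES
    (1, 'Company',    NULL),
    (2, 'Sales',      1),
    (3, 'Support',    1),     -- leaf at depth 1
    (4, 'EMEA Sales', 2),
    (5, 'UK Sales',   4);     -- leaf at depth 3
""")

def subtree(root_id):
    """Return (name, depth) for the root and everything beneath it."""
    return conn.execute("""
        WITH RECURSIVE tree(unit_id, name, depth) AS (
            SELECT unit_id, name, 0
            FROM org_unit
            WHERE unit_id = ?
            UNION ALL
            SELECT o.unit_id, o.name, t.depth + 1
            FROM org_unit o
            JOIN tree t ON o.parent_id = t.unit_id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
    """, (root_id,)).fetchall()
```

No fixed number of level columns is needed, so adding a deeper branch is a row insert rather than a table change.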
The benefits of Normal Models
Normalized models enable data mining and
statistical analytics
Supports complex analytics which are
based on relational algebra
Creates environment of “what if” instead of
“how come”
The REAL benefit of a Normal Model
Supports change over time
– Integration of new subject areas
– Effective dating eliminates slowly changing
dimensions
– Provides multiple views of same data with
consistency
– New applications and user communities are
absorbed with little effort
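A minimal sketch of effective dating, assuming a hypothetical customer_address table with a half-open validity interval (effective_from inclusive, effective_to exclusive) and '9999-12-31' as an assumed marker for the current row. Each change is stored as a new row on the base table, and an "as of" query picks the row that was in effect on a given date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_address (
    customer_id    INTEGER NOT NULL,
    city           TEXT NOT NULL,
    effective_from TEXT NOT NULL,   -- inclusive, ISO date string
    effective_to   TEXT NOT NULL,   -- exclusive; '9999-12-31' = still current
    PRIMARY KEY (customer_id, effective_from)
);
INSERT INTO customer_address VALUES
    (1, 'Dayton',    '2020-01-01', '2022-07-01'),
    (1, 'San Diego', '2022-07-01', '9999-12-31');
""")

def city_as_of(customer_id, as_of):
    """Return the city in effect for the customer on the given date, or None."""
    row = conn.execute("""
        SELECT city
        FROM customer_address
        WHERE customer_id = ?
          AND effective_from <= ?
          AND ? < effective_to
    """, (customer_id, as_of, as_of)).fetchone()
    return row[0] if row else None
```

The same table answers both "where is the customer now" and "where was the customer when this transaction happened", which is the kind of question a slowly changing dimension is usually built to handle.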
To be fair, the benefits of Denormalized Models
Tuned for the known access paths to give
higher performance
Model reflects the output to minimize data
manipulation
Easier for users to navigate and understand
Can be built quickly
The optimization escalation
Normalized Model
Views, Indexes, and Priority
Cross-functional denormalization and
aggregation
Specific denormalization and aggregation
Extract, Expand, Examine
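The second step of this escalation can be sketched with a view: users get a flat, report-shaped result while the data stays normalized underneath. The tables, the sales_flat view, and the sample rows below are hypothetical, and SQLite (via Python's sqlite3) stands in for the warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    region      TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    amount      INTEGER NOT NULL
);
-- The view presents a denormalized shape without copying any data.
CREATE VIEW sales_flat AS
    SELECT s.sale_id,
           c.name   AS customer_name,
           c.region AS region,
           p.name   AS product_name,
           s.amount AS amount
    FROM sale s
    JOIN customer c ON c.customer_id = s.customer_id
    JOIN product  p ON p.product_id  = s.product_id;

INSERT INTO customer VALUES (1, 'Ada', 'West'), (2, 'Ben', 'East');
INSERT INTO product  VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO sale     VALUES (1, 1, 10, 20), (2, 1, 11, 5), (3, 2, 10, 20);
""")

def revenue_by_region():
    """Query the flat view the way a reporting user would."""
    return conn.execute("""
        SELECT region, SUM(amount)
        FROM sales_flat
        GROUP BY region
        ORDER BY region
    """).fetchall()

before = revenue_by_region()
# A single-row change in the normalized table is reflected everywhere.
conn.execute("UPDATE customer SET region = 'North' WHERE customer_id = 2")
after = revenue_by_region()
```

Physical denormalization and aggregation remain available as later steps if the view alone does not meet the performance need.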
Recent enhancements to help
Recursive statement in SQL
PPI (and Multi-level PPI)
– Possibly remove cube builds to a great degree
Bulk Merge
– Removes obstacles for advanced indexes and
multi-load
In database OLAP processing
– Advanced AJIs, wizards, SAS procs
TASM
– Workload-based and service-level-goal reporting
Pitfalls to avoid
LDM to PDM
– Over-compromising for known queries
– Addition of indexes and summary tables
– Use of history tables
Primary Index selection
– Model is correct but PI is wrong
– Distribution first, access path second
Data Integrity
– Missing referential integrity leads to outer joins
– Data type inconsistency leads to over-processing
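To make "distribution first, access path second" concrete, the sketch below simulates hashing a candidate Primary Index column across a small number of parallel units and measures the skew. SHA-256 is used only as a stand-in hash, the unit count of 4 is arbitrary, and the order_id and status columns are invented for the example; none of this reproduces the platform's actual hashing.

```python
import hashlib
from collections import Counter

def rows_per_unit(values, units=4):
    """Assign each row to a unit by hashing its index value; return row counts per unit."""
    def unit_for(value):
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % units
    counts = Counter(unit_for(v) for v in values)
    return [counts.get(u, 0) for u in range(units)]

def skew(counts):
    """Busiest unit relative to a perfectly even spread (1.0 is ideal)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean

# 10,000 hypothetical rows: a unique order_id and a two-valued status.
order_ids = list(range(10_000))
statuses = ["OPEN"] * 9_000 + ["CLOSED"] * 1_000

by_order_id = rows_per_unit(order_ids)   # high-cardinality candidate
by_status = rows_per_unit(statuses)      # low-cardinality candidate

order_id_skew = skew(by_order_id)
status_skew = skew(by_status)
```

The status column may be a popular access path, but with only two distinct values at most two units can ever hold rows, so one unit carries at least 9,000 of the 10,000 rows no matter how the hash falls.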
Other modeling points to watch
Surrogate Columns
– Used to “simplify” joins
– Have to be ingrained everywhere
– Rarely known for access purposes
Identity Columns
– Definitions
– Same problems as surrogates
Intelligent Keys
– Embedded information within larger datatypes
– Example: VIN (vehicle identification number)
– Creates maintenance obstacles if parts need to
change
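A small sketch of why intelligent keys create maintenance obstacles, using the three standard sections of a 17-character VIN (manufacturer identifier in positions 1-3, descriptor section in positions 4-9, identifier section in positions 10-17). The sample value is a placeholder rather than a valid VIN, and the rekey helper and service_visits rows are hypothetical.

```python
from typing import NamedTuple

class VinParts(NamedTuple):
    wmi: str  # world manufacturer identifier, positions 1-3
    vds: str  # vehicle descriptor section, positions 4-9
    vis: str  # vehicle identifier section, positions 10-17

def split_vin(vin):
    """Split a 17-character key into its embedded parts."""
    if len(vin) != 17:
        raise ValueError("expected 17 characters")
    return VinParts(vin[:3], vin[3:9], vin[9:])

def rekey(vin, new_wmi):
    """Build the new key that results when one embedded part changes."""
    parts = split_vin(vin)
    return new_wmi + parts.vds + parts.vis

def rows_to_rewrite(rows, old_key):
    """Count the rows that carry the old key and would need updating."""
    return sum(1 for row in rows if row["vin"] == old_key)

SAMPLE_KEY = "AAABBBBBBCCCCCCCC"  # placeholder, not a real VIN

# Hypothetical child rows that reference the key.
service_visits = [
    {"vin": SAMPLE_KEY, "visit": 1},
    {"vin": SAMPLE_KEY, "visit": 2},
]

new_key = rekey(SAMPLE_KEY, "ZZZ")
touched = rows_to_rewrite(service_visits, SAMPLE_KEY)
```

Because the attribute lives inside the key, changing it changes the key itself, and every table that references the old value has to be rewritten along with it.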
Going Forward… Remember
Data Warehousing exists to drive change and
therefore must support constant change
Data relationships and transactions are
constant; it is access and output that
change
For processes to change quickly, the data
manipulation must be removed from the
path
Have the model reflect the atomic data
relationship and historical relevance
Now what?
New migrations
– Get model correct if at all possible
– If consolidating, realize integrating is the next,
and more important, step
– At least get major data elements consistent
Existing systems
– Look at subject areas with high overlap
– Look for the analytics that are proving tricky
– Work to show the value of normalization with
more cross-functional analytics