Ten Physical Data Modeling Blunders
And How to Avoid Them
Karen Lopez
Sr. Project Manager
Abstract
What's going on in your physical data model? How many people
can or will update it to match the reality of what's going on in
your databases? Who decides what goes into the physical model?
In this presentation we discuss 10 physical data modeling
mistakes that cost you dearly. Will your physical design lead to
performance snags, development delays, bugs and weakening of
professional respect?
Data Architects are often tasked with preparing first-cut physical
data models, yet these skills usually overlap those of DBAs and
Developers, and this overlap can lead to contention, confusion,
and complacency. With this presentation, you'll learn about the 10
blunders, how to find them, plus 10 tips on how to avoid them.
Speaker Bio
Karen López is a principal consultant at InfoAdvisors. She
specializes in the practical application of data management
principles. Karen is also the ListMistress and moderator of the
InfoAdvisors Discussion Groups at www.infoadvisors.com and
dm-discuss
About this Presentation
Your opinion counts
Contributions are required
Ask questions at any time
@datachick, with hashtag #10Blunders
Agenda
The Problem
Blunders, FAILs, D’Oh’s, Faults and WTHs?
10 Tips
The Problem & The Solution
Assuming Numbers are Numbers
Confirmation Number
Leading Zeros
Non-Numeric
Special Characters
Sorting Issues
Externally Managed Numbers
D’OH!
Add More Examples.
Numbers that Aren’t Numbers    | Dates/Times that Aren’t Dates/Times | Texts that Aren’t (Really) Text
Social Security Numbers        | Birthdays                           | Format embedded values
Vehicle Identification Numbers | Intervals                           | ???
Telephone Numbers              |                                     |
Account Numbers                |                                     |
DEMO
Task
Some Additional NTANs
Numbers that Aren’t Numbers    | Dates/Times that Aren’t Dates/Times | Texts that Aren’t (Really) Text
Social Security Numbers        | Birthdays                           | Format embedded values
Vehicle Identification Numbers | Intervals                           | ???
Telephone Numbers              | MS SQL Server Timestamps            |
Account Numbers                | Lengths of time                     |
ZIP Codes                      | Days                                |
UPC/GTIN/EAN/Barcodes          | Chris Date                          |
Avoid it
1. Use a numeric-ish datatype when there’s math involved.
2. Do your research.
3. If it’s externally managed data, its format or composition could change at any time. Anticipate that.
4. Anticipate leading zeros.
5. Anticipate significant formatting, characters & other tricks.
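A minimal Python sketch of what goes wrong when a "number that isn't a number" is treated as one. The sample values and the helper name is_safe_as_integer are made up for illustration; they are not taken from the presentation's demo.

```python
# Hypothetical sample values, chosen only to illustrate the tips above.
zip_codes = ["02134", "10001", "00501"]

# Storing a ZIP code in a numeric type silently drops leading zeros.
as_numbers = [int(z) for z in zip_codes]
round_tripped = [str(n) for n in as_numbers]

# Identifiers kept as text sort character by character, not by magnitude.
account_ids = ["9", "10", "100", "25"]
text_sorted = sorted(account_ids)
numeric_sorted = sorted(account_ids, key=int)


def is_safe_as_integer(value: str) -> bool:
    """True only if the value is plain ASCII digits with no leading zero to lose."""
    if not (value.isascii() and value.isdigit()):
        return False
    return value == "0" or not value.startswith("0")


print(round_tripped)   # leading zeros are gone
print(text_sorted)     # text order differs from numeric order
print(numeric_sorted)
```

A check like is_safe_as_integer only describes today's data. If the identifier is externally managed, its format may change later, so a character type is usually the more cautious choice.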
Choosing a Silly Long Primary Key
32 + 8 + 32 + 32 + (0 to 22) = A really big key
that only a Data Architect could love
Avoid it
• Establish Primary Key standards and styles
• Smaller PKs lead to smaller indexes and smaller databases. Take advantage of that
• Choose PKs that do not change when data changes
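The arithmetic behind the "really big key" can be sketched in Python. This assumes each unit in 32 + 8 + 32 + 32 + (0 to 22) is one byte of storage; the 4-byte integer used for comparison and the row count are hypothetical.

```python
# Column widths from the slide: 32 + 8 + 32 + 32 + (0 to 22).
# Assumption: each unit is one byte of storage.
FIXED_PARTS = [32, 8, 32, 32]
VARIABLE_PART_MAX = 22
SMALL_KEY_BYTES = 4  # assumed 4-byte integer key, for comparison only


def key_width(variable_len: int) -> int:
    """Width in bytes of the composite key for a given variable-part length."""
    if not 0 <= variable_len <= VARIABLE_PART_MAX:
        raise ValueError("variable part must be between 0 and 22")
    return sum(FIXED_PARTS) + variable_len


def key_bytes(rows: int, width: int) -> int:
    """Bytes spent on one copy of the key value per row."""
    return rows * width


rows = 10_000_000  # hypothetical table size
print(key_width(0), key_width(VARIABLE_PART_MAX))
print(key_bytes(rows, key_width(VARIABLE_PART_MAX)))
print(key_bytes(rows, SMALL_KEY_BYTES))
```

The figures count a single copy of the key. Every index and every foreign key that carries the key repeats that cost.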
DEMO
Turning off RI for Development
DEMO
Avoid it
Educate team on the risks & nearly universal outcome of turning off RI
Generate RI in every script
Develop test data
Develop scripts for loading data
Ensure team has easy access to Data Models
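A small sketch of the usual outcome of turning RI off, using Python's built-in sqlite3 module and an in-memory database. The customer and customer_order tables are hypothetical, and SQLite stands in for whatever DBMS the team actually targets.

```python
import sqlite3


def load_orphan(enforce_ri: bool) -> str:
    """Try to insert an order for a customer that does not exist."""
    conn = sqlite3.connect(":memory:")
    # In SQLite, foreign key enforcement is switched per connection.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_ri else 'OFF'}")
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE customer_order ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer (customer_id))"
    )
    try:
        # Customer 999 was never created, so this row is an orphan.
        conn.execute("INSERT INTO customer_order VALUES (1, 999)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()


print("RI off:", load_orphan(enforce_ri=False))
print("RI on: ", load_orphan(enforce_ri=True))
```

With RI off the orphan row loads without complaint, which is how bad data accumulates during development and then blocks the constraints from being turned back on.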
Using the Defaults
Defaults <> Best Practices
Product Defaults aren’t YOUR defaults
Defaults might even be…faulty.
Defaults
DEMO
Avoid it
Create a DATest database…and use it.
Test Generation Options to Test DB with Data
Generate Test Scripts, Varying Options
Review options with DBAs and Developers.
Choose Defaults, don’t default them
Save your Default Set
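One way to test generation options against a test database with data, sketched in Python with sqlite3. The product table and the two DDL variants are hypothetical; the point is only that a column with no stated nullability takes whatever the product defaults to.

```python
import sqlite3

# Two generated scripts for the same (hypothetical) table.
# In the first, nullability of product_name is left to the product default.
DEFAULTED_DDL = (
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT)"
)
# In the second, the modeler has made an explicit choice.
CHOSEN_DDL = (
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL)"
)


def accepts_null_name(ddl: str) -> bool:
    """Run the script in a scratch database and try to load a nameless product."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        conn.execute(
            "INSERT INTO product (product_id, product_name) VALUES (1, NULL)"
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print("defaulted:", accepts_null_name(DEFAULTED_DDL))
print("chosen:   ", accepts_null_name(CHOSEN_DDL))
```

Running each generated script against test data like this makes the effect of an option visible before it reaches a shared environment.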
Using GUIDs Where They Don’t Add Value
A globally unique identifier or GUID …is a unique reference
number used as an identifier in computer software. The term
GUID also is used for Microsoft's implementation of the
Universally Unique Identifier (UUID) standard.
The value of a GUID is represented as a 32-character
hexadecimal string, such as {21EC2020-3AEA-1069-A2DD-08002B30309D},
and is usually stored as a 128-bit integer. The
total number of unique keys is 2^128 or 3.4×10^38 — roughly 2
trillion per cubic millimeter of the entire volume of the Earth.
This number is so large that the probability of the same number
being generated twice is extremely small.
Wikipedia contributors, "Globally unique identifier," Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/w/index.php?title=Globally_unique_identifier&oldid=415700895 (accessed March 7, 2011).
GUID
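The sizes in the quoted definition can be checked with Python's standard uuid module. The comparison against a 4-byte integer key is an assumption added here for scale, not a figure from the presentation.

```python
import uuid

# The example value from the quoted definition; braces and hyphens are accepted.
sample = uuid.UUID("{21EC2020-3AEA-1069-A2DD-08002B30309D}")

guid = uuid.uuid4()           # a freshly generated random GUID
text_form = str(guid)         # 32 hex digits plus 4 hyphens
hex_digits = guid.hex         # the 32 hex digits alone
binary_form = guid.bytes      # the 128-bit value as raw bytes

total_keys = 2 ** 128
INT_KEY_BYTES = 4             # assumed size of a plain integer key

print(len(text_form), len(hex_digits), len(binary_form))
print(f"{total_keys:.1e}")
print(len(binary_form) // INT_KEY_BYTES, "times the size of the integer key")
```

Stored as text, the key takes 36 characters; stored as binary, 16 bytes. Either way it is several times wider than an integer key, and that width is repeated in every index and foreign key that carries it.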
Applying Surrogate Keys Incorrectly
Applying a Surrogate Key
New
DEMO
Avoid it
• Establish Primary Key standards and styles
• Smaller PKs lead to smaller indexes and smaller databases. Take advantage of that.
• Don’t forget the Business Key
• Choose PKs that do not change when data changes
• Understand how Identity columns work
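"Don’t forget the Business Key" can be sketched with sqlite3 in Python. The employee table and badge_number column are hypothetical; the surrogate key alone does nothing to stop the same business entity from being loaded twice.

```python
import sqlite3


def rows_for_same_badge(protect_business_key: bool) -> int:
    """Insert the same badge number twice and count how many rows survive."""
    conn = sqlite3.connect(":memory:")
    unique_clause = " UNIQUE" if protect_business_key else ""
    conn.execute(
        "CREATE TABLE employee ("
        " employee_id INTEGER PRIMARY KEY,"              # surrogate key
        f" badge_number TEXT NOT NULL{unique_clause})"   # business key
    )
    for _ in range(2):
        try:
            conn.execute("INSERT INTO employee (badge_number) VALUES ('B-0042')")
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected the duplicate
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM employee WHERE badge_number = 'B-0042'"
    ).fetchone()
    conn.close()
    return count


print("surrogate key only:      ", rows_for_same_badge(False))
print("with business key UNIQUE:", rows_for_same_badge(True))
```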
Avoid It
1. Understand Datatype Sizes
2. Understand Database Internal File Structures
3. Determine Data Volume
4. Determine Data Growth Volumes
5. Balance What-if and What-Will-Be
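A rough capacity estimate combining datatype size, volume and growth, sketched in Python. All figures are hypothetical, and the calculation ignores page overhead, indexes and other internal file structures, which a real estimate would need to include.

```python
def projected_bytes(row_bytes: int, initial_rows: int,
                    yearly_growth_rate: float, years: int) -> int:
    """Raw data size after compound yearly growth in row count."""
    rows = initial_rows * (1 + yearly_growth_rate) ** years
    return round(rows) * row_bytes


# Hypothetical table: 200-byte rows, 1 million rows, growing 50% a year.
today = projected_bytes(200, 1_000_000, 0.5, 0)
in_three_years = projected_bytes(200, 1_000_000, 0.5, 3)

# Same table if a 4-byte key column is swapped for a 16-byte one.
wider_in_three_years = projected_bytes(212, 1_000_000, 0.5, 3)

print(today, in_three_years, wider_in_three_years)
```

Comparing the what-if figure with the most likely figure gives the team a concrete number to weigh, rather than sizing every column for the worst case.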
Using Identity Property or Identifier Incorrectly
DEMO
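The slide's own demo is not included in the transcript. As a stand-in, here is a sketch of one identity behavior worth understanding, using SQLite through Python's sqlite3 module. The ticket table is hypothetical, and other products (SQL Server's IDENTITY property, for example) follow their own rules, so treat this as an illustration of why the behavior must be checked per product.

```python
import sqlite3


def next_id_after_deleting_last(autoincrement: bool) -> int:
    """Insert three rows, delete the newest, insert again, return the new id."""
    conn = sqlite3.connect(":memory:")
    key_def = "INTEGER PRIMARY KEY"
    if autoincrement:
        key_def += " AUTOINCREMENT"
    conn.execute(f"CREATE TABLE ticket (ticket_id {key_def}, note TEXT)")
    for note in ("a", "b", "c"):
        conn.execute("INSERT INTO ticket (note) VALUES (?)", (note,))
    conn.execute("DELETE FROM ticket WHERE ticket_id = 3")
    cursor = conn.execute("INSERT INTO ticket (note) VALUES ('d')")
    new_id = cursor.lastrowid
    conn.close()
    return new_id


# Without AUTOINCREMENT, SQLite hands out max(existing id) + 1,
# so the deleted value 3 is reused for a different ticket.
print(next_id_after_deleting_last(autoincrement=False))
# With AUTOINCREMENT, previously used values are not handed out again.
print(next_id_after_deleting_last(autoincrement=True))
```

A reused identifier is a problem if anything outside the table (logs, exports, other systems) still refers to the old row.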
Failing to Compare
DEMO
Silly Naming Standards
SHOW AND TELL
Skipping Training
DEMO
OTHER BLUNDERS?
Other Blunders
Completely separate logical model
Failing to take into account whole environment.
Not testing your design
Ignoring reference data design
Hand generating scripts
Failing to report issues
Denormalizing too soon
Just-in-case-itis
Making performance more important than integrity
Other Blunders
Using the wrong tool
Too much Subtyping
Too Many Flexible hierarchies
Wrong Datatypes
Duplicate Indexes
Ignoring Everything but Tables and Columns
Using overly generic design patterns
Column order
10 Tips for Avoiding Physical Modeling Blunders
1. Understand cost, benefit and risk
2. Get formal training on target technologies
3. Work near your DBAs and Developers
4. Ask DBAs about trade-offs, not just solutions
5. Build portfolio of performance vs. integrity trade-offs
10 Tips for Avoiding Physical Modeling Blunders
6. Profile Source Data
7. Build Test Databases, with Data
8. Test Scripts
9. Compare, Compare, Compare…then Compare some More.
10. Get More Training.
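Tip 9 can be sketched as a small model-versus-database comparison in Python. Modeling tools provide their own compare features; this stand-in uses sqlite3, and the MODEL_COLUMNS dictionary and customer table are hypothetical.

```python
import sqlite3

# What the physical model says the table should look like (hypothetical).
MODEL_COLUMNS = {"customer_id": "INTEGER", "postal_code": "TEXT", "email": "TEXT"}


def database_columns(conn: sqlite3.Connection, table: str) -> dict:
    """Read column names and declared types from the live database."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in rows}


def compare(model: dict, actual: dict) -> dict:
    """Report columns that differ between the model and the database."""
    shared = set(model) & set(actual)
    return {
        "missing_in_database": sorted(set(model) - set(actual)),
        "missing_in_model": sorted(set(actual) - set(model)),
        "type_mismatch": sorted(c for c in shared if model[c] != actual[c]),
    }


# A database that has drifted away from the model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, postal_code INTEGER, phone TEXT)"
)
drift = compare(MODEL_COLUMNS, database_columns(conn, "customer"))
conn.close()
print(drift)
```

Run regularly, a report like this shows where the model and the databases have stopped matching, which is the question the abstract opens with.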
Summary
1. Don’t assume that numbers are… numbers
2. Choose the right primary key
3. Apply surrogate keys correctly
4. Keep RI turned on
5. Architect, don’t default
6. Keys are blunder-prone
7. Training is required. Don’t pass up any opportunity for training.
Thank You!
Karen@InfoAdvisors.com
Feedback, questions,
comments are always
appreciated.
http://www.speakerrate.com/karenlopez
http://blog.infoadvisors.com

Karen Lopez 10 Physical Data Modeling Blunders

Editor's Notes

  • #5 June 2008
  • #10 1
  • #13 Show ER/studio data model Adv Works Address Bad Address Show Tables Show Capacity Planning Lengths Show DDL Show Management Studio Data
  • #16 2
  • #17 2
  • #20 Enforcing RI: Foreign Keys vs Triggers
      Foreign Keys | Triggers
      Checked before the data modification is made | Executed after the data modification has been made
      Do not extend transactions | Extend the life of a transaction as it executes code to check RI (can be extended even longer if an integrity violation is encountered which requires ROLLBACK)
      Can be added with an ALTER TABLE statement; no special coding required | Coding is required
      Prevent TRUNCATE of a parent table | Do not prevent TRUNCATE of a parent table; DELETE triggers are not fired when a TRUNCATE is issued
      Table relationships can be easily seen in data model diagrams | Table relationships are not readily seen in data model diagrams
  • #21 Ca erwin eMovies Action
  • #23 4
  • #27 5
  • #29 6
  • #34 7
  • #36 8
  • #38 9
  • #40 10