SQL Server guru Ami Levin explains some of the fundamental design principles of relational databases: normalization rules, key selection, and the controversies associated with these issues from a practical perspective.
This presentation covers the benefits and challenges of using different types of keys: natural, surrogate, artificial, and others.
Each key offers benefits from multiple aspects: data consistency, application development, maintenance, portability and performance.
Ami Levin is a Microsoft MVP and a consultant with SolidQ. Last fall he moved to California from Israel, where he led the Israeli SQL Server User Group.
Microsoft SQL Server Relational Databases and Primary Keys
1. Where Are My (Primary) Keys?
Ami Levin
Mentor,
ALevin@SolidQ.com
Think Big. Move Fast.
Presented to the
San Francisco SQL Server User Group
September 2013
2. GEOData Database
The GEOData database that is used for the
demo can be downloaded here:
http://sdrv.ms/Z8ZmNb
Ami Levin – SolidQ2 |
3. Session Goals
Revisit one of the fundamental design
principles of relational databases - key
selection.
Explore the controversies associated with it
from a very practical, hands-on perspective,
with a special emphasis on some surprising
performance issues that may arise from
suboptimal selection of keys...
6. Normalization
“A relation whose domains are all simple can be
represented in storage by a two-dimensional
column-homogeneous array of the kind discussed above.
Some more complicated data structure is necessary
for a relation with one or more non-simple domains.
For this reason (and others to be cited below) the
possibility of eliminating non-simple domains appears
worth investigating. There is, in fact, a very simple
elimination procedure, which we shall call
normalization”
7. 1st Normal Form
There's no top-to-bottom or left-to-right
ordering to the rows and columns
There are no duplicate rows
Every row-column intersection contains
exactly one value
There are no repeating groups
All columns are regular
8. 1st Normal Form
SSN          Name             PhoneNumber
123-45-6789  Muammar Gaddafi  +218-00-9876
987-65-4321  Bashar Assad     +1-202-6543221

SSN          First Name  Last Name  PhoneNumber
123-45-6789  Muammar     Gaddafi    +218-00-9876, +218-00-8765
987-65-4321  Bashar      Assad      +1-202-6543221

SSN          First Name  Last Name  PhoneNumber1    PhoneNumber2
123-45-6789  Muammar     Gaddafi    +218-00-9876    +218-00-8765
987-65-4321  Bashar      Assad      +1-202-6543221
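One common way to bring the PhoneNumber examples above into 1NF is to give phone numbers a table of their own, one row per number. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names Person and PersonPhone, and the use of SSN as the key, are assumptions made for illustration and do not come from the presentation.

```python
import sqlite3

def build_1nf_phone_tables():
    """Create hypothetical Person / PersonPhone tables and load the slide's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Person (
            SSN       TEXT PRIMARY KEY,
            FirstName TEXT NOT NULL,
            LastName  TEXT NOT NULL
        );
        -- One row per phone number: no lists in a column, no PhoneNumber1/2 columns.
        CREATE TABLE PersonPhone (
            SSN         TEXT NOT NULL REFERENCES Person (SSN),
            PhoneNumber TEXT NOT NULL,
            PRIMARY KEY (SSN, PhoneNumber)
        );
    """)
    conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
        ("123-45-6789", "Muammar", "Gaddafi"),
        ("987-65-4321", "Bashar", "Assad"),
    ])
    conn.executemany("INSERT INTO PersonPhone VALUES (?, ?)", [
        ("123-45-6789", "+218-00-9876"),
        ("123-45-6789", "+218-00-8765"),
        ("987-65-4321", "+1-202-6543221"),
    ])
    return conn

def phones_for(conn, ssn):
    """Return all phone numbers stored for one person, sorted as text."""
    rows = conn.execute(
        "SELECT PhoneNumber FROM PersonPhone WHERE SSN = ? ORDER BY PhoneNumber",
        (ssn,),
    )
    return [row[0] for row in rows]
```

With this layout a person can have any number of phone numbers without a schema change, and each row-column intersection holds exactly one value.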
9. 2nd Normal Form
R is in 1NF.
Given any candidate key K and any
attribute A that is not a constituent of a
candidate key, A depends upon the whole
of K rather than just a part of it.
10. 2nd Normal Form
Order ID  Line Number  Customer
0001      1            FISSA
0001      2            FISSA
0002      1            PARIS

OrderDetailID  OrderID  LineNumber  Customer
AC934245FF00B  0001     1           FISSA
8BA50CC2044AF  0001     2           FISSA
F00B344923AB4  0002     1           PARIS
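In both tables above, Customer appears to depend on the order alone, which is only part of the candidate key (OrderID, LineNumber). Adding OrderDetailID does not change that, because (OrderID, LineNumber) is still a candidate key. The following is a minimal sketch of the usual decomposition, again using Python's sqlite3 as a stand-in for SQL Server. The table names OrderLinesFlat, Orders and OrderLines are hypothetical. Note that the data check can only show that the sample rows are consistent with the dependency; whether the dependency really holds is a business rule.

```python
import sqlite3

def load_flat_orders():
    """Load the slide's sample rows into a single, non-2NF table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE OrderLinesFlat (
            OrderID    TEXT    NOT NULL,
            LineNumber INTEGER NOT NULL,
            Customer   TEXT    NOT NULL,
            PRIMARY KEY (OrderID, LineNumber)
        );
        INSERT INTO OrderLinesFlat VALUES
            ('0001', 1, 'FISSA'),
            ('0001', 2, 'FISSA'),
            ('0002', 1, 'PARIS');
    """)
    return conn

def customer_depends_on_order_only(conn):
    """True when no OrderID in the sample data is paired with more than one Customer."""
    violations = conn.execute("""
        SELECT COUNT(*) FROM (
            SELECT OrderID
            FROM OrderLinesFlat
            GROUP BY OrderID
            HAVING COUNT(DISTINCT Customer) > 1
        )
    """).fetchone()[0]
    return violations == 0

def decompose_to_2nf(conn):
    """Move Customer to a table keyed by OrderID alone; keep lines keyed by the full key."""
    conn.executescript("""
        CREATE TABLE Orders (
            OrderID  TEXT PRIMARY KEY,
            Customer TEXT NOT NULL
        );
        CREATE TABLE OrderLines (
            OrderID    TEXT    NOT NULL REFERENCES Orders (OrderID),
            LineNumber INTEGER NOT NULL,
            PRIMARY KEY (OrderID, LineNumber)
        );
        INSERT INTO Orders     SELECT DISTINCT OrderID, Customer FROM OrderLinesFlat;
        INSERT INTO OrderLines SELECT OrderID, LineNumber FROM OrderLinesFlat;
    """)
```

After the decomposition the customer of an order is stored once, so it cannot disagree between two lines of the same order.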
11. 3rd Normal Form
R is in second normal form (2NF)
Every non-prime attribute of R is non-transitively
dependent on every candidate key of R.
A non-prime attribute of R is an attribute that does not
belong to any candidate key of R.
A transitive dependency is a functional dependency in
which X → Z (X determines Z) indirectly, by virtue of X →
Y and Y → Z.
12. 3rd Normal Form
Order ID  Line Number  Product      Manufacturer
0001      1            Chair        IKEA
0001      2            Gum          Mentos
0002      1            Fighter Jet  Boeing
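The table above suggests a transitive dependency: (Order ID, Line Number) determines Product, and Product determines Manufacturer. Assuming that second dependency really holds (it is an assumption read off the sample data, not something the slide states), a minimal sketch of the 3NF decomposition could look like this. It uses Python's sqlite3 as a stand-in for SQL Server, and the table names Products and OrderLines are hypothetical.

```python
import sqlite3

# Sample rows copied from the slide: (OrderID, LineNumber, Product, Manufacturer)
FLAT_ROWS = [
    ("0001", 1, "Chair", "IKEA"),
    ("0001", 2, "Gum", "Mentos"),
    ("0002", 1, "Fighter Jet", "Boeing"),
]

def build_3nf():
    """Split the flat rows so Manufacturer is stored once per Product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Products (
            Product      TEXT PRIMARY KEY,
            Manufacturer TEXT NOT NULL
        );
        CREATE TABLE OrderLines (
            OrderID    TEXT    NOT NULL,
            LineNumber INTEGER NOT NULL,
            Product    TEXT    NOT NULL REFERENCES Products (Product),
            PRIMARY KEY (OrderID, LineNumber)
        );
    """)
    for order_id, line_no, product, manufacturer in FLAT_ROWS:
        # A product seen on several order lines is stored only the first time.
        conn.execute("INSERT OR IGNORE INTO Products VALUES (?, ?)",
                     (product, manufacturer))
        conn.execute("INSERT INTO OrderLines VALUES (?, ?, ?)",
                     (order_id, line_no, product))
    return conn

def rebuild_flat(conn):
    """Join the two tables back together to recover the original flat rows."""
    return conn.execute("""
        SELECT ol.OrderID, ol.LineNumber, ol.Product, p.Manufacturer
        FROM OrderLines AS ol
        JOIN Products   AS p ON p.Product = ol.Product
        ORDER BY ol.OrderID, ol.LineNumber
    """).fetchall()
```

Joining the two tables returns the original rows, so the decomposition loses no information while removing the repeated Manufacturer values.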
14. The Debate
To ID or not to ID?
IDENTITY (1,1) vs. Natural key
15. Pro Artificial (I)
In some cases, no natural key exists
and an artificial key is the only option.
Examples?
16. Pro Artificial (II)
Natural keys can change.
Artificial keys never change.
How Often?
Cascading referential constraints
Artificial keys can change
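The "cascading referential constraints" point above refers to letting the engine propagate a changed natural key to referencing rows. A minimal sketch of the idea follows, using Python's sqlite3 as a stand-in (SQL Server also supports ON UPDATE CASCADE on foreign keys). The Countries and Cities tables and their sample rows are hypothetical.

```python
import sqlite3

def rename_country_code(old_code, new_code):
    """Change a natural primary key value and return the child rows afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Countries (
            CountryCode TEXT PRIMARY KEY,
            CountryName TEXT NOT NULL
        );
        CREATE TABLE Cities (
            CountryCode TEXT NOT NULL
                REFERENCES Countries (CountryCode) ON UPDATE CASCADE,
            CityName    TEXT NOT NULL,
            PRIMARY KEY (CountryCode, CityName)
        );
        INSERT INTO Countries VALUES ('UK', 'United Kingdom');
        INSERT INTO Cities VALUES ('UK', 'London'), ('UK', 'Leeds');
    """)
    # Only the parent row is updated explicitly; the cascade rewrites the children.
    conn.execute(
        "UPDATE Countries SET CountryCode = ? WHERE CountryCode = ?",
        (new_code, old_code),
    )
    return conn.execute(
        "SELECT CountryCode, CityName FROM Cities ORDER BY CityName"
    ).fetchall()
```

Calling `rename_country_code("UK", "GB")` updates one parent row, and both city rows follow. The cost is that every referencing row is rewritten, which is why the slide asks how often natural keys actually change.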
17. Pro Artificial (III)
Natural keys may be long and complex.
Become longer with each level
900-byte limit on index keys in SQL Server
Multi-column joins
18. Pro Artificial (IV)
Artificial keys help improve
performance.
Simpler join predicates
Ever-increasing clustering effect
Short keys = Smaller DB = Faster
19. Pro Artificial (V)
Artificial keys reduce clustered index
fragmentation.
Minimize maintenance downtime
What about deletes?
What about non-clustered indexes?
20. Pro Natural (I)
Natural keys have business meaning.
Artificial keys are never queried for
21. Pro Natural (II)
Queries on tables using natural keys
require fewer joins.
The more familiar and meaningful the key,
the fewer joins are required
“Bypass” joins
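The "fewer joins" argument can be illustrated with two versions of the same small schema. The sketch below is an assumption-laden illustration, not the presentation's demo: it uses Python's sqlite3 as a stand-in for SQL Server, and every table name (the S suffix for the surrogate-key design, N for the natural-key design), column name and sample row is hypothetical.

```python
import sqlite3

def build_both_designs():
    """Create the same site/city/country data under both key designs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Surrogate-key design: children carry opaque IDs.
        CREATE TABLE CountriesS (CountryID INTEGER PRIMARY KEY,
                                 ISOCode   TEXT NOT NULL UNIQUE);
        CREATE TABLE CitiesS (CityID    INTEGER PRIMARY KEY,
                              CityName  TEXT NOT NULL,
                              CountryID INTEGER NOT NULL REFERENCES CountriesS (CountryID));
        CREATE TABLE SitesS (SiteID INTEGER PRIMARY KEY,
                             URL    TEXT NOT NULL UNIQUE,
                             CityID INTEGER NOT NULL REFERENCES CitiesS (CityID));

        -- Natural-key design: children carry the meaningful key values.
        CREATE TABLE CountriesN (ISOCode TEXT PRIMARY KEY);
        CREATE TABLE CitiesN (ISOCode  TEXT NOT NULL REFERENCES CountriesN (ISOCode),
                              CityName TEXT NOT NULL,
                              PRIMARY KEY (ISOCode, CityName));
        CREATE TABLE SitesN (URL      TEXT PRIMARY KEY,
                             ISOCode  TEXT NOT NULL,
                             CityName TEXT NOT NULL,
                             FOREIGN KEY (ISOCode, CityName)
                                 REFERENCES CitiesN (ISOCode, CityName));

        INSERT INTO CountriesS VALUES (1, 'US');
        INSERT INTO CitiesS    VALUES (10, 'San Francisco', 1);
        INSERT INTO SitesS     VALUES (100, 'http://example.com', 10);

        INSERT INTO CountriesN VALUES ('US');
        INSERT INTO CitiesN    VALUES ('US', 'San Francisco');
        INSERT INTO SitesN     VALUES ('http://example.com', 'US', 'San Francisco');
    """)
    return conn

def iso_code_surrogate(conn, url):
    """Two joins are needed to reach the ISO code from a site."""
    row = conn.execute("""
        SELECT co.ISOCode
        FROM SitesS     AS s
        JOIN CitiesS    AS ci ON ci.CityID    = s.CityID
        JOIN CountriesS AS co ON co.CountryID = ci.CountryID
        WHERE s.URL = ?
    """, (url,)).fetchone()
    return row[0] if row else None

def iso_code_natural(conn, url):
    """The ISO code is already part of the site row, so no join is needed."""
    row = conn.execute("SELECT ISOCode FROM SitesN WHERE URL = ?", (url,)).fetchone()
    return row[0] if row else None
```

Both functions return the same answer. The natural-key version gets there from a single table because the key value that travels down the hierarchy is the value the query wants.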
22. Pro Natural (III)
Data consistency is maintained
explicitly when using natural keys.
Artificial keys enable logical duplicates
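The "logical duplicates" risk is easy to reproduce. In the sketch below (Python's sqlite3 as a stand-in for SQL Server, with AUTOINCREMENT playing the role of IDENTITY; the Countries table is hypothetical), a table whose only key is an artificial ID accepts the same country twice, while declaring the natural key as UNIQUE rejects the second row.

```python
import sqlite3

def insert_twice(with_natural_key_constraint):
    """Insert the same country name twice and return how many rows were stored."""
    conn = sqlite3.connect(":memory:")
    unique = "UNIQUE" if with_natural_key_constraint else ""
    conn.execute(f"""
        CREATE TABLE Countries (
            CountryID   INTEGER PRIMARY KEY AUTOINCREMENT,
            CountryName TEXT NOT NULL {unique}
        )
    """)
    conn.execute("INSERT INTO Countries (CountryName) VALUES ('France')")
    try:
        conn.execute("INSERT INTO Countries (CountryName) VALUES ('France')")
    except sqlite3.IntegrityError:
        pass  # the UNIQUE constraint rejected the logical duplicate
    return conn.execute(
        "SELECT COUNT(*) FROM Countries WHERE CountryName = 'France'"
    ).fetchone()[0]
```

Without the constraint the function returns 2, because each row gets a distinct CountryID and the engine sees nothing wrong. With the constraint it returns 1. An artificial key therefore does not remove the need to identify and enforce the natural key.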
23. Pro Natural (IV)
Natural keys eliminate potential
physical clustering performance issues.
Contention for clustered regions
24. Less Mentioned Issues (I)
Artificial keys are the de facto standard.
ORMs generate artificial keys
LINQ doesn’t cache composite key rows
…
25. Less Mentioned Issues (II)
Data statistics and optimizations.
Statistics on artificial keys are useless for
parameter sniffing
Estimations on composite key statistics
are less accurate
26. Less Mentioned Issues (III)
Modularity and portability.
Migration to other platforms
Merging with other databases
28. Demo Spec
A database of web sites.
URL
Country and city of owner
Country ISO code for external app
Data consistency is crucial
30. Ask Yourself
Is there a natural key that I can use as a primary
key?
Are there a few natural candidates?
Which one is the simplest and most familiar?
How stable is it?
How will it be used logically?
What will be the physical access patterns for this
table?
What are the common query types for this table?
31. For More Information
A Relational Model of Data for Large Shared Data Banks (E.F.
Codd)
The Relational Model for Database Management: Version 2 (E.F.
Codd)
An Introduction to Database Systems (C.J. Date)
Database in Depth: Relational Theory for Practitioners (C.J.
Date)
The Database Relational Model: A Retrospective Review and
Analysis (C.J. Date)
Joe Celko's Data and Databases: Concepts in Practice (J. Celko)
Joe Celko's SQL for Smarties, Fourth Edition: Advanced SQL
Programming (J. Celko)
Database Modeling and Design, Fifth Edition: Logical Design
(T.J. Teorey, S.S. Lightstone, T. Nadeau, and H.V. Jagadish)
Pro SQL Server 2008 Relational Database Design and
Implementation (L. Davidson, K. Kline, S. Klein, and K. Windisch)