Where Are My Keys? - SQL Explore: Presentation Transcript
Where Are My Keys? Ami Levin | CTO | DBSophic LTD
Session Goals A revisit of some of the fundamental design principles of relational databases: normalization rules, key selection, and the controversies associated with these issues, from a very practical, hands-on perspective, with special emphasis on some surprising performance issues that may arise from sub-optimal selection of keys...
Normalization “A relation whose domains are all simple can be represented in storage by a two-dimensional column homogeneous array of the kind discussed above. Some more complicated data structure is necessary for a relation with one or more non-simple domains. For this reason (and others to be cited below) the possibility of eliminating non-simple domains appears worth investigating. There is, in fact, a very simple elimination procedure, which we shall call… normalization”
1st Normal Form There's no top-to-bottom or left-to-right ordering to the rows and columns. There are no duplicate rows. Every row-column intersection contains exactly one value. All columns are regular.
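As a minimal sketch of the "exactly one value" and "no duplicate rows" rules, the following uses Python's built-in `sqlite3` module rather than SQL Server. The table names (`owner_phones_raw`, `owner_phones`) and the sample data are hypothetical and are not taken from the session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 1NF: the phones column packs several values into one intersection.
    CREATE TABLE owner_phones_raw (owner TEXT PRIMARY KEY, phones TEXT NOT NULL);
    INSERT INTO owner_phones_raw VALUES
        ('alice', '555-0100,555-0101'),
        ('bob', '555-0199');

    -- 1NF version: one value per intersection; the key forbids duplicate rows.
    CREATE TABLE owner_phones (
        owner TEXT NOT NULL,
        phone TEXT NOT NULL,
        PRIMARY KEY (owner, phone)
    );
""")

# Split each packed value into one row per phone number.
raw_rows = conn.execute("SELECT owner, phones FROM owner_phones_raw").fetchall()
for owner, phones in raw_rows:
    for phone in phones.split(","):
        conn.execute("INSERT INTO owner_phones VALUES (?, ?)", (owner, phone))

rows_1nf = conn.execute(
    "SELECT owner, phone FROM owner_phones ORDER BY owner, phone"
).fetchall()

# A second copy of an existing row is rejected by the primary key.
try:
    conn.execute("INSERT INTO owner_phones VALUES ('bob', '555-0199')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The ordering rule has no code counterpart: the final query needs an explicit `ORDER BY` because the table itself promises no row order.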
2nd Normal Form R is in 1st Normal Form (1NF). Given any candidate key K and any attribute A that is not a constituent of a candidate key, A depends upon the whole of K rather than just a part of it.
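The 2NF definition can be explored on sample data with a small dependency check. This is only a sketch: the helper `fd_holds` and the `order_items` rows are hypothetical, and a dependency that happens to hold in a sample merely suggests a business rule; it does not prove one.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one rhs value in rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[col] for col in lhs)
        dependent = tuple(row[col] for col in rhs)
        # setdefault stores the first dependent value seen for this determinant.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Candidate key K = (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": "A", "product_name": "Anvil", "qty": 2},
    {"order_id": 1, "product_id": "B", "product_name": "Bolt", "qty": 10},
    {"order_id": 2, "product_id": "A", "product_name": "Anvil", "qty": 1},
]

# qty depends on the whole key, which 2NF allows.
qty_on_whole_key = fd_holds(order_items, ["order_id", "product_id"], ["qty"])

# product_name depends on product_id alone: a partial dependency, so not 2NF.
name_on_part_of_key = fd_holds(order_items, ["product_id"], ["product_name"])

# qty is not determined by product_id alone (product A appears with 2 and 1).
qty_on_part_of_key = fd_holds(order_items, ["product_id"], ["qty"])
```

In this sample, moving `product_name` into a separate table keyed by `product_id` would remove the partial dependency.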
3rd Normal Form R is in second normal form (2NF). Every non-prime attribute of R is non-transitively dependent on every candidate key of R. A transitive dependency is a functional dependency in which X -> Z (X determines Z) indirectly, by virtue of X -> Y and Y -> Z.
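A transitive dependency can be removed by decomposition, sketched here with Python's `sqlite3`. The tables and data are hypothetical: `emp -> dept` and `dept -> dept_floor`, so `emp -> dept_floor` holds only indirectly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: dept_floor depends on emp only through dept.
    CREATE TABLE employees_flat (
        emp TEXT PRIMARY KEY,
        dept TEXT NOT NULL,
        dept_floor INTEGER NOT NULL
    );
    INSERT INTO employees_flat VALUES
        ('ann', 'sales', 1), ('bo', 'sales', 1), ('cy', 'it', 2);

    -- 3NF decomposition: each fact about a department is stored once.
    CREATE TABLE departments (
        dept TEXT PRIMARY KEY,
        dept_floor INTEGER NOT NULL
    );
    CREATE TABLE employees (
        emp TEXT PRIMARY KEY,
        dept TEXT NOT NULL REFERENCES departments (dept)
    );
    INSERT INTO departments SELECT DISTINCT dept, dept_floor FROM employees_flat;
    INSERT INTO employees SELECT emp, dept FROM employees_flat;
""")

flat = conn.execute(
    "SELECT emp, dept, dept_floor FROM employees_flat ORDER BY emp"
).fetchall()

# Joining the two 3NF tables reproduces the original rows, so nothing was lost.
rejoined = conn.execute("""
    SELECT e.emp, e.dept, d.dept_floor
    FROM employees AS e
    JOIN departments AS d ON d.dept = e.dept
    ORDER BY e.emp
""").fetchall()

department_count = conn.execute("SELECT COUNT(*) FROM departments").fetchone()[0]
```

After the split, moving a department to another floor is a single-row update instead of one update per employee.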
The Debate To ID or not to ID? IDENTITY (1,1) PK vs. Natural key
Pro Artificial In some cases, no natural key exists and an artificial key is the only option. Examples?
Pro Artificial Natural keys can change. Artificial keys never change. How often? Cascading referential constraints. Artificial keys can change.
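The "cascading referential constraints" point can be sketched as follows. The example uses SQLite through Python's `sqlite3`, not SQL Server, and the `countries` and `cities` tables are hypothetical. It assumes a natural key value needs correcting, here a country code changed from 'UK' to 'GB'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE countries (
        iso_code TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE cities (
        iso_code TEXT NOT NULL
            REFERENCES countries (iso_code) ON UPDATE CASCADE,
        city TEXT NOT NULL,
        PRIMARY KEY (iso_code, city)
    );
    INSERT INTO countries VALUES ('UK', 'United Kingdom');
    INSERT INTO cities VALUES ('UK', 'London'), ('UK', 'Leeds');

    -- Changing the parent key propagates to every referencing row.
    UPDATE countries SET iso_code = 'GB' WHERE iso_code = 'UK';
""")

child_rows = conn.execute(
    "SELECT iso_code, city FROM cities ORDER BY city"
).fetchall()
```

The cascade keeps the data consistent, but the cost of a key change grows with the number of referencing rows, which is why the "How often?" question matters.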
Pro Artificial Natural keys may become very long and complex. They get longer with each level. 900-byte limit in SQL Server. Multi-column joins.
Pro Artificial Artificial keys improve performance. Simpler join predicates. Ever-increasing clustering effect. Short keys = Smaller DB = Faster (?)
Pro Artificial Artificial keys reduce clustered index fragmentation. All types of artificial keys? Minimize maintenance down time. What about deletes? What about non-clustered indexes?
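The fragmentation argument, and the question "All types of artificial keys?", can be illustrated with a toy model. This is a simplified simulation, not SQL Server's storage engine: `PAGE_CAPACITY`, the 50/50 split rule, and the function name are assumptions made for illustration. It compares ever-increasing keys (IDENTITY-like) with keys arriving in random order (as randomly generated identifiers would).

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical number of rows that fit on one leaf page


def count_mid_page_splits(keys):
    """Insert distinct keys into a toy leaf level; return (splits, average fill)."""
    pages = [[]]  # ordered list of pages, each a sorted list of keys
    splits = 0
    for key in keys:
        # Pick the last page whose first key is <= key (page 0 takes the rest).
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Insert past the end of the last page: start a new page, no split.
            pages.append([key])
        else:
            # Full page hit in the middle: split it in half, then insert.
            half = PAGE_CAPACITY // 2
            left, right = page[:half], page[half:]
            bisect.insort(left if key < right[0] else right, key)
            pages[idx:idx + 1] = [left, right]
            splits += 1
    fill = sum(len(page) for page in pages) / (len(pages) * PAGE_CAPACITY)
    return splits, fill


# Ever-increasing keys only ever append to the end.
seq_splits, seq_fill = count_mid_page_splits(range(1000))

# The same keys in random order land in the middle of full pages.
shuffled = list(range(1000))
random.Random(0).shuffle(shuffled)
rnd_splits, rnd_fill = count_mid_page_splits(shuffled)
```

In this model the sequential load produces no splits and completely full pages, while the random load splits pages and leaves them partly empty. The model ignores deletes and non-clustered indexes, which are exactly the open questions the slide raises.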
Pro Natural Natural keys have business meaning. Artificial keys are logically never queried for.
Pro Natural Queries on tables using natural keys require fewer joins. The more familiar and meaningful the key, the fewer joins are required. “Bypass” joins across levels.
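The "bypass" effect can be sketched with two hypothetical three-level schemas, one using natural keys (prefix `n_`) and one using artificial keys (prefix `a_`). The tables, columns, and data are assumptions, and the example runs on SQLite through Python's `sqlite3`, not SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural keys: each level carries its parent's key columns.
    CREATE TABLE n_countries (iso_code TEXT PRIMARY KEY);
    CREATE TABLE n_cities (
        iso_code TEXT NOT NULL REFERENCES n_countries (iso_code),
        city TEXT NOT NULL,
        PRIMARY KEY (iso_code, city)
    );
    CREATE TABLE n_sites (
        url TEXT PRIMARY KEY,
        iso_code TEXT NOT NULL,
        city TEXT NOT NULL,
        FOREIGN KEY (iso_code, city) REFERENCES n_cities (iso_code, city)
    );

    -- Artificial keys: each level references only its parent's id.
    CREATE TABLE a_countries (
        country_id INTEGER PRIMARY KEY,
        iso_code TEXT NOT NULL UNIQUE
    );
    CREATE TABLE a_cities (
        city_id INTEGER PRIMARY KEY,
        country_id INTEGER NOT NULL REFERENCES a_countries (country_id),
        city TEXT NOT NULL,
        UNIQUE (country_id, city)
    );
    CREATE TABLE a_sites (
        site_id INTEGER PRIMARY KEY,
        city_id INTEGER NOT NULL REFERENCES a_cities (city_id),
        url TEXT NOT NULL UNIQUE
    );

    INSERT INTO n_countries VALUES ('IL'), ('US');
    INSERT INTO n_cities VALUES ('IL', 'Tel Aviv'), ('US', 'Boston');
    INSERT INTO n_sites VALUES
        ('example.org', 'IL', 'Tel Aviv'),
        ('example.com', 'US', 'Boston'),
        ('example.net', 'IL', 'Tel Aviv');

    INSERT INTO a_countries VALUES (1, 'IL'), (2, 'US');
    INSERT INTO a_cities VALUES (1, 1, 'Tel Aviv'), (2, 2, 'Boston');
    INSERT INTO a_sites VALUES
        (1, 1, 'example.org'), (2, 2, 'example.com'), (3, 1, 'example.net');
""")

# Natural keys: the country code is already on the site row, so no join.
natural = conn.execute(
    "SELECT url FROM n_sites WHERE iso_code = 'IL' ORDER BY url"
).fetchall()

# Artificial keys: two joins are needed to reach the country code.
artificial = conn.execute("""
    SELECT s.url
    FROM a_sites AS s
    JOIN a_cities AS c ON c.city_id = s.city_id
    JOIN a_countries AS k ON k.country_id = c.country_id
    WHERE k.iso_code = 'IL'
    ORDER BY s.url
""").fetchall()
```

Both queries return the same rows. The price of the shortcut is the wider key in `n_sites`, which is the "keys get longer with each level" argument from the other side of the debate.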
Pro Natural Data consistency is maintained explicitly when using natural keys. Artificial keys enable logical duplicates.
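A logical duplicate can be reproduced in a few lines. In this sketch, SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` stands in for SQL Server's `IDENTITY (1,1)`, and the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Artificial key only: nothing constrains the business value.
    CREATE TABLE countries_id (
        country_id INTEGER PRIMARY KEY AUTOINCREMENT,
        iso_code TEXT NOT NULL
    );
    -- Natural key: the business value itself is the primary key.
    CREATE TABLE countries_nat (iso_code TEXT PRIMARY KEY);
""")

# The same country is accepted twice because each row gets a fresh id.
conn.execute("INSERT INTO countries_id (iso_code) VALUES ('IL')")
conn.execute("INSERT INTO countries_id (iso_code) VALUES ('IL')")
dup_count = conn.execute(
    "SELECT COUNT(*) FROM countries_id WHERE iso_code = 'IL'"
).fetchone()[0]

# With the natural key the second insert fails.
conn.execute("INSERT INTO countries_nat VALUES ('IL')")
try:
    conn.execute("INSERT INTO countries_nat VALUES ('IL')")
    natural_rejected = False
except sqlite3.IntegrityError:
    natural_rejected = True
```

Declaring `iso_code` as `UNIQUE` in the artificial-key table would also block the duplicate, at the cost of maintaining a second index.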
Pro Natural Natural keys eliminate potential physical clustering performance issues. Contention for clustered regions.
Less Mentioned Issues Artificial keys are the de-facto standard. ORMs generate artificial keys. LINQ doesn’t cache composite key rows.
Less Mentioned Issues Data statistics and optimizations. Statistics on artificial keys are useless for parameter sniffing. Selectivity estimations on composite key statistics are less accurate.
Less Mentioned Issues Modularity and portability. Migration to other platforms. Merging with other databases.
Less Mentioned Issues Simplicity and aesthetics.
Demo Design Spec URL. Country and city of owner. Country ISO code needed by application. Data consistency is crucial.
Natural vs. Artificial DEMO
Ask Yourself… Is there a natural key that I can use as PK? Are there a few natural candidates? Which one is the simplest and most familiar? How stable is it? How will it be used logically? What will be the table physical usage patterns? What are the common query types for this table?
For More Information A Relational Model of Data for Large Shared Data Banks by E.F. Codd. The Relational Model for Database Management: Version 2 by E.F. Codd. An Introduction to Database Systems by C.J. Date. Database in Depth: Relational Theory for Practitioners by C.J. Date. The Database Relational Model: A Retrospective Review and Analysis by C.J. Date. Joe Celko's Data and Databases: Concepts in Practice by Joe Celko. Joe Celko's SQL for Smarties, Fourth Edition: Advanced SQL Programming by Joe Celko. Database Modeling and Design, Fifth Edition: Logical Design by Toby J. Teorey, Sam S. Lightstone, Tom Nadeau, and H.V. Jagadish. Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, Kevin Kline, Scott Klein, and Kurt Windisch.