3. Who Am I?
• Terry Bunio
• Database Administrator
– Oracle
– SQL Server 6, 6.5, 7, 2000, 2005, 2008, 2012
– Informix
– ADABAS
• Data Modeler/Architect
– Investors Group, LPL Financial, Manitoba
Blue Cross, Assante Financial, CI Funds,
Mackenzie Financial
– Normalized and Dimensional
• Agilist
– Innovation Gamer, Team Member, SQL
Developer, Test writer, Sticky Sticker, Project
Manager, PMO on SAP Implementation
7. Once upon a time
• Worked on a project for a
client in Luxembourg
• Interesting point
– Luxembourg has three official
languages
• Luxembourgish
• French
• German
8. Once upon a time
• Need to create multi-lingual
descriptions for reference table
• At the time only English and
French were required
• Convinced team that we would
soft model the language
9. Once upon a time
• These tables also had
independent surrogate keys for
all reference table values
11. Once upon a time
• It wasn’t fun
• Queries performed terribly and
were overly complex
• We never used the extra
flexibility, and we eventually
replaced the functionality with
English and French description
fields
13. Once upon a time
• Not my design
• Once saw a database that
actually stored all text fields on
one table
– You joined to the table with the
Primary Key from the description
table
• Some queries joined to the
name table over 10 times.
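The pain of that one-description-table design can be sketched in miniature. This is a hypothetical reconstruction (table and column names are invented, not from the actual system): every text attribute forces another join back to the same table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Anti-pattern sketch: every text value lives in one description table.
cur.executescript("""
CREATE TABLE description (desc_id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                     first_name_id INTEGER, last_name_id INTEGER,
                     city_id INTEGER);
INSERT INTO description VALUES (1, 'Terry'), (2, 'Bunio'), (3, 'Winnipeg');
INSERT INTO person VALUES (1, 1, 2, 3);
""")

# Every text column needs its own alias and join to description --
# three here, ten-plus in the queries described above.
row = cur.execute("""
SELECT f.text, l.text, c.text
FROM person p
JOIN description f ON f.desc_id = p.first_name_id
JOIN description l ON l.desc_id = p.last_name_id
JOIN description c ON c.desc_id = p.city_id
""").fetchone()
print(row)  # ('Terry', 'Bunio', 'Winnipeg')
```

With plain text columns on `person`, the same result is a single-table select with no joins at all.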
15. All Claims
• Anyone work with SAP?
• Their tables are not tables as
much as large flat files
• Record type and other
extremely codified fields
• Really hard to make sense of
16. All Claims
• To make it easier on
developers we created an
All_claims table that would join
all the relevant data together
and also do some filtering
18. All Claims
• This became quite the beast of
an object
• Became a focal point for
performance tuning
• No one could access the data
until it was loaded
19. All Claims
• We eventually had to develop
a net change process as we
couldn’t reload all the records
every day
• Ended up being very
successful
– After a lot of heartache
– And thanks to an extremely
talented developer
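The idea behind a net change load can be sketched in a few lines. This is only an illustration of the general technique, not the project's actual process: compare today's extract to what is already loaded, and apply only the differences instead of truncating and reloading everything.

```python
# Current contents of the target, keyed by claim id (invented data).
loaded = {1: ("open", 100), 2: ("closed", 250)}

# Today's source extract.
incoming = {1: ("open", 100),    # unchanged -> skip
            2: ("closed", 275),  # changed   -> update
            3: ("open", 50)}     # new       -> insert

# Classify each incoming row against the loaded set.
inserts = {k: v for k, v in incoming.items() if k not in loaded}
updates = {k: v for k, v in incoming.items()
           if k in loaded and loaded[k] != v}

# Apply only the net change.
loaded.update(inserts)
loaded.update(updates)
print(sorted(loaded.items()))
```

In the real thing the comparison runs as set-based SQL against staging tables, but the insert/update/skip classification is the same.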
21. Recursion
• Usually used to model multiple
levels of an object
– Office structure
– Organization Hierarchy
– Etc…
22. Recursion
• Looking back…
– Seemed to be an intellectual
exercise
– Can I figure out a way to
dynamically model this?
23. Recursion
• Question is:
– Does the data need a dynamic
model?
– Looking back
• The models were 99% stable
• Dynamic model was being done
for the future
• Definitely over engineering
24. Recursion
• So what?
– Complexity in retrieving data
• Especially for reports
– The data would need to have
multiple levels, and to move
between different numbers of
levels frequently, for me to model
the data recursively like this
again
25. Recursion
• Why not just model the data in
a fixed way and deal with
changes as needed
– Region
– Division
– Department
• Whoops! Just add Sub-
Division when required and
convert
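The trade-off can be sketched with SQLite (the org structure and names here are illustrative): the recursive model needs a recursive query just to walk its own levels, while the fixed model answers the same question with a plain select.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Recursive model: one self-referencing table, any number of levels.
cur.executescript("""
CREATE TABLE org (org_id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
INSERT INTO org VALUES (1, 'West Region', NULL),
                       (2, 'Sales Division', 1),
                       (3, 'Inside Sales Dept', 2);
""")

# Walking from a department to its region requires a recursive CTE.
path = cur.execute("""
WITH RECURSIVE chain(org_id, name, parent_id) AS (
    SELECT org_id, name, parent_id FROM org WHERE org_id = 3
    UNION ALL
    SELECT o.org_id, o.name, o.parent_id
    FROM org o JOIN chain c ON o.org_id = c.parent_id
)
SELECT name FROM chain
""").fetchall()
print([n for (n,) in path])  # department up to region

# Fixed model: Region/Division/Department as columns -- no recursion,
# and adding Sub-Division later is an ALTER TABLE plus a data fix.
cur.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, region TEXT,
                         division TEXT, department TEXT);
INSERT INTO department VALUES
  (1, 'West Region', 'Sales Division', 'Inside Sales Dept');
""")
```

Every report against the recursive model pays the CTE (or repeated self-join) cost; the fixed model reads like the business talks.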
26. Agenda
• Data Modeling Mistakes
– Anthropomorphism
– Over-Engineering
– Keys
• GUIDs
• Surrogate/Real Keys
• Composite Keys
– Deleted Records
– Nulls
– History
– Recursion
27. Definition
• “A database model is a
specification describing
how a database is
structured and used” –
Wikipedia
28. Definition
• “A data model describes how
the data entities are related
to each other in the real
world” – Terry (5 years ago)
• “A data model describes how
the data entities are related
to each other in the
application” – Terry (today)
30. Relational
• Relational Analysis
– Database design is usually in
Third Normal Form
– Database is optimized for
transaction processing (OLTP)
– Normalized tables are optimized
for modification rather than
retrieval
31. Normal forms
• 1st - Under first normal form, all
occurrences of a record type must contain
the same number of fields.
• 2nd - Second normal form is violated
when a non-key field is a fact about a
subset of a key. It is only relevant when
the key is composite
• 3rd - Third normal form is violated when
a non-key field is a fact about another
non-key field
Source: William Kent - 1982
32. Normal Forms for the
Layman
• 1st – Table only represents
one type of data
– No row types
• 2nd – Field does not depend
on only a part of the Primary
Key
• 3rd – Field depends only on
the Primary Key
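A tiny invented order/address example makes the layman's versions concrete (all names here are made up for illustration):

```python
# 2NF violation: key is composite (order_id, product_id), but
# customer_name depends only on order_id -- a *part* of the key.
order_line_bad = {("O1", "P1"): {"qty": 2, "customer_name": "Terry"}}

# 3NF violation: city depends on postal_code, which is a non-key
# field -- a fact about another non-key field, not about person_id.
person_bad = {1: {"postal_code": "R3C", "city": "Winnipeg"}}

# Normalized: each fact moves to the table whose whole key it
# depends on.
orders = {"O1": {"customer_name": "Terry"}}      # fact about the order
order_line = {("O1", "P1"): {"qty": 2}}          # fact about the line
postal = {"R3C": {"city": "Winnipeg"}}           # fact about the code
person = {1: {"postal_code": "R3C"}}             # fact about the person
```

The dictionaries stand in for tables; the keys stand in for primary keys. In each "bad" shape, an update to the misplaced fact has to be repeated on every row that carries it.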
33. Remember
• Remember to ask ourselves
when we are modeling:
• Does either of the options
contradict the normal forms?
• Usually we model past 3rd
normal form based on other
biases
37. Amazon
• Warehouse is organized
totally randomly
• Although humans think the
items should be ordered in
some way, it does not help
storage or retrieval in any way
– In fact it hurts it by creating ‘hot
spots’ for in demand items
38. Data Model
Anthropomorphism
• We sometimes
create objects in
our Data Models
as they exist in the
real world, not as
they exist in the
applications
39. Data Model
Anthropomorphism
• This is usually the case for
physical objects in the real
world
– Companies/Organizations
– People
– Addresses
– Phone Numbers
– Emails
40. Data Model
Anthropomorphism
• Why?
– Do we ever need to consolidate all
people, addresses, or emails?
• Rarely
– We usually report based on other
filter criteria
– So why do we try to place like real
world items on one table when
applications treat them differently?
42. Over Engineering
• Additional flexibility that is
not required does not
simplify the solution, it overly
complicates the solution
43. Over Engineering
• These are usually tables that
have multiple mutually
exclusive foreign keys
– Only one is filled at any one time
• Why not just create separate
join tables?
– Doesn’t violate any normal forms
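The shape being described, and the simpler alternative, can be sketched as DDL (table names are hypothetical):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
-- Over-engineered: one generic link table with three mutually
-- exclusive foreign keys; only one is populated on any given row,
-- so every query must know which column to check.
CREATE TABLE note_link (note_id   INTEGER,
                        person_id INTEGER,   -- NULL unless a person note
                        claim_id  INTEGER,   -- NULL unless a claim note
                        policy_id INTEGER);  -- NULL unless a policy note

-- Simpler: one join table per real relationship -- no NULL juggling,
-- and no normal form is violated.
CREATE TABLE person_note (person_id INTEGER, note_id INTEGER);
CREATE TABLE claim_note  (claim_id  INTEGER, note_id INTEGER);
CREATE TABLE policy_note (policy_id INTEGER, note_id INTEGER);
""")
```

The generic table buys flexibility for relationships that do not exist yet, and pays for it with NULL-laden rows and conditional joins on every query that does exist.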
45. GUIDs
• Oscar winner for worst choice
for a Primary Key ever
• Selected based on over
engineering because they
would never be duplicates
46. GUIDs
• In the meantime they caused
excessive index length, user
frustration, and complex query
execution plans
• Just say no.
47. GUIDs
• Especially don’t use them on
tables with a small number of
records
• Who says all the Primary Keys
in a database need to be of
the same type?
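The index-width point is easy to demonstrate with back-of-envelope arithmetic (key bytes only; random GUID values add further cost through page splits and fragmentation, which this sketch ignores):

```python
import uuid

# A GUID key is 16 bytes; a SQL Server INT key is 4.
guid_bytes = len(uuid.uuid4().bytes)   # 16
int_bytes = 4

# Illustrative: raw key bytes carried by a 10-million-row index,
# before any page or row overhead.
rows = 10_000_000
guid_index = guid_bytes * rows   # 160 million bytes of key data
int_index = int_bytes * rows     #  40 million
print(guid_index // int_index)   # prints 4
```

And that 4x is paid again in every nonclustered index and every foreign key column that references the table.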
48. Surrogate Keys
• Surrogate Keys are a huge
benefit
• Straight Integer keys are
probably the most common
– Users are most used to
integer keys as well
• Same as bank account, credit
cards, other account information
49. Surrogate Keys
• The exception
– Don’t, don’t, don’t use Surrogate
keys for Reference or Support
tables
– Causes needless lookups for
clients, SQL queries, and for
reports
50. Surrogate Keys
• Do we really need to assign a
numeric Primary Key for
Gender and Province codes?
– Especially since these values
very rarely change
– Might make sense for reference
tables that change more
frequently.
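The needless-lookup point can be sketched side by side (hypothetical tables; `'MB'` is the Manitoba province code):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
-- Surrogate-keyed reference table: every read needs a lookup join.
CREATE TABLE province_s (province_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE person_s (person_id INTEGER PRIMARY KEY, province_id INTEGER);
INSERT INTO province_s VALUES (7, 'MB');
INSERT INTO person_s VALUES (1, 7);

-- Natural-keyed: the code itself is the key, readable in place.
CREATE TABLE person_n (person_id INTEGER PRIMARY KEY, province_code TEXT);
INSERT INTO person_n VALUES (1, 'MB');
""")

# Surrogate design: a join just to see 'MB'.
s_row = cur.execute("""
SELECT pr.code FROM person_s pe
JOIN province_s pr USING (province_id)
""").fetchone()

# Natural design: no join at all, and ad hoc queries stay readable.
n_row = cur.execute("SELECT province_code FROM person_n").fetchone()
print(s_row, n_row)  # ('MB',) ('MB',)
```

A reference table with the natural code as its key can still exist to validate values; it just no longer sits in the middle of every query.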
53. Composite Keys
• A violation of 2nd normal form
requires a Composite Key
– Remove Composite Keys and
you remove the possibility of
that violation
• They are also a bad idea because
the key carries inherent meaning,
which means the Primary Key
can change
54. Deleted Records
• Are we soft deleting or hard
deleting records?
• Used to like soft deleting as
you never lost data
• But this can make queries a
nightmare with needing to filter
on deleted records for every
table in a query
55. Deleted Records
• Soft deleted records also
perform quite poorly when
included in an index due to the
indicator only having two values
– Or else you need to add the
deleted indicator to many
indexes
– Both are inefficient
57. Nulls
• Nulls are evil
• Do whatever you can to avoid
nulls
– Column Defaults
– Domain Defaults
– Did I mention defaults?
58. Nulls
• Nulls can complicate queries
just like deleted indicators
• They are probably also the
number one cause of devious,
mind-bending defects
– Think of the time you will save!
59. Nulls
• For this reason, Nulls are the
first thing to go when creating
a Self Service Data
Warehouse
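Both points can be shown in a few lines (hypothetical table): column defaults keep NULLs out without burdening inserts, and NULL's three-valued logic is where the mind-bending defects come from.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
-- Column defaults: callers that omit a column still get a real value.
CREATE TABLE claim (claim_id INTEGER PRIMARY KEY,
                    status TEXT NOT NULL DEFAULT 'Unknown',
                    notes  TEXT NOT NULL DEFAULT '');
INSERT INTO claim (claim_id) VALUES (1);
""")
row = cur.execute("SELECT status, notes FROM claim").fetchone()
print(row)  # ('Unknown', '')

# The defect factory: NULL is neither equal nor unequal to anything,
# so both of these predicates filter the row out.
null_eq = cur.execute("SELECT 1 WHERE NULL = NULL").fetchone()
null_ne = cur.execute("SELECT 1 WHERE NULL <> 'x'").fetchone()
print(null_eq, null_ne)  # None None
```

A `WHERE status <> 'Closed'` that silently drops every NULL-status row is exactly the kind of defect defaults prevent.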
61. History
• Where and how should we
store history?
• Transaction tables are easy
– They usually have always been
historical tables
• But what about tables like
person and address?
62. History
• Few options
– Create history record on same
table
– Create history record on history
table for each table
– Create history record on one
audit table
– Don’t store it and let the Data
Warehouse worry about it
63. History on same table
• Keeps the number of tables in
your database to a minimum
• Keeps queries cleaner
• Complicates queries as you
now need to include/exclude
history rows
– And you will need to add
additional date information
64. History on separate table
• Dirties up the database as you
create a history copy of every
table in the database
• Some Queries are cleaner
• Some Queries now need to join
twice as many tables though!
65. History on Audit table
• Queries are cleaner
• Database is cleaner
• But depending on the solution,
you may end up having one
absolutely huge table to parse
through
66. History in Data
Warehouse
• Perhaps the cleanest option
• Requires a commitment to
infrastructure
• Latency may also become an
issue
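The single-audit-table option can be sketched with a trigger (table and column names are hypothetical; in SQL Server the same idea is a trigger or Change Data Capture rather than this SQLite syntax):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);

-- One audit table for the whole database: table name, key,
-- prior value, and when it changed.
CREATE TABLE audit (table_name TEXT, row_key TEXT,
                    old_value TEXT, changed_at TEXT);

-- On every update the trigger writes the prior values to audit;
-- person stays current-state-only, so its queries stay clean.
CREATE TRIGGER person_audit AFTER UPDATE ON person
BEGIN
  INSERT INTO audit VALUES ('person', OLD.person_id, OLD.name,
                            datetime('now'));
END;

INSERT INTO person VALUES (1, 'Tery');
UPDATE person SET name = 'Terry' WHERE person_id = 1;
""")
row = cur.execute("SELECT table_name, old_value FROM audit").fetchone()
print(row)  # ('person', 'Tery')
```

The catch, as above: every audited table funnels into this one table, so it grows fast and reading one entity's history means parsing a generic, loosely typed log.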