Database design guide


Published in: Technology
When you set up a new database, you usually spend a lot of time at the whiteboard. Here are some basic tips: my dos and don'ts of database design. Most probably they will reduce your effort and help you arrive at a clean database design. I didn't write the book on database design, but I think the experience I earned in many projects could be helpful in some cases. My examples refer to Progress® databases and Progress Software® 4GL, but you'll get the idea even if you use another database system.

Let's start with a few naming conventions. Using dashes, spaces, digits and special characters is a bad idea, although your database and operating system might handle these characters (COBOL semantics like CUST-NAME-1 are ugly and outdated). Ensure that each name (this applies to tables, prefixes, attributes, sequences etc.) stays unique after uppercase or lowercase conversion, within the scope of your enterprise-wide databases. Check your spelling; renaming tables and attributes afterwards is a PITA.

CamelCase all names. Acronyms and abbreviations should not be used in names when they aren't well known by your users. If you can't avoid them, capitalize only the first character, especially in composite names like UpsServiceTypes or Customer.VatId. Well, no rule without exception: ID (unique tuple identifier in a table) as well as OID (enterprise-wide unique object identifier) should always be written in capital letters, as long as the abbreviation is part of the name of a technical attribute (e.g. VatId, the Value Added Tax Identifier, vs. CustID or CustOID, the primary key of Customers).

Avoid language mix-ups, especially if you're not a native speaker and/or your application has no English user interface. Names like Buddies.BudVerjaardag sound plain silly, but Maat.MtVerjaardag is understandable (at least if your understanding of Dutch is flawed). Check your spelling.
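Some of these conventions can be checked mechanically before a definition is saved. The following is a minimal sketch in Python, not part of the original guide: the pattern and the function names check_name and case_collisions are assumptions that paraphrase the rules above (letters only, CamelCase, a trailing ID or OID allowed in capitals, names unique after case conversion).

```python
import re
from collections import defaultdict

# Assumed reading of the conventions above: one or more capitalized words,
# optionally followed by the technical suffix ID or OID in capitals.
NAME_PATTERN = re.compile(r"^(?:[A-Z][a-z]+)+(?:OID|ID)?$")


def check_name(name):
    """Return a list of convention violations for one unqualified name."""
    if re.search(r"[^A-Za-z]", name):
        return ["contains a dash, space, digit or special character"]
    if not NAME_PATTERN.match(name):
        return ["is not CamelCase with single-capital acronyms"]
    return []


def case_collisions(names):
    """Group names that become identical after lowercase conversion."""
    folded = defaultdict(list)
    for name in names:
        folded[name.lower()].append(name)
    return [group for group in folded.values() if len(group) > 1]
```

Under these assumptions check_name("CustOID") returns an empty list, while check_name("CUST-NAME-1") reports the special characters. Names that deliberately contain digits, such as the Cust2CustGrp link table, would need their own rule.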
Once your application is running, it's hard to live with typos.

Table names and labels designate the business object. Use neither technical wording nor geek speak. Persistent instances of customers live in a table named Customers, assigned UPC numbers in AssignedUpcNumbers, UPS shipments in UpsShipments and UPS parcels in UpsParcels. Since you store more than one instance of a business object in each table, use plural only.

Each table can be identified by an (enterprise-wide) unique prefix. Never use a prefix twice. If you have both Invoices and Inventories, assign different prefixes like Inv for Invoices and Ivt for Inventories. The prefix is part of each attribute name and should be used in related sequence and index names as well.

So far, so easy. When it comes to attribute names, naming conventions become more complicated. Let's start with technical attributes, because they leave no room for interpretation.

In order to guarantee uniqueness, each table has a technical primary key (a surrogate primary key populated by the create trigger with a unique sequence value or, preferably, a UUID), which will never get a business meaning. Don't argue: primary keys with business meaning as well as composite keys are a bad idea. There is nothing to be said against additional unique columns with business meaning, but do not merge the underlying technical implementation with your business logic. Name the primary key table prefix + OID (or ID), e.g. CustOID or CustID. If an object has children or is an attribute of other objects, use the unchanged and unextended name of the parent table's primary key as the foreign key in the child table or referencing table, respectively.

Say you have a table Invoices and a table Addresses:

    Addresses.AdrOID [primary key]
    Addresses.AdrOtherAttributes ...
    Invoices.InvOID [primary key]
    Invoices.AdrOID [foreign key]
    Invoices.InvOtherAttributes ...

Index Invoices.AdrOID and you can code

    FOR EACH Addresses OF Invoices:
        /* Do something. */
    END.

or

    FOR EACH Invoices WHERE Invoices.InvNetAmount >= 1000.00,
        EACH Addresses OF Invoices WHERE Addresses.AdrZipCode BEGINS "34":
        /* Do something. */
    END.

instead of

    FOR EACH Addresses WHERE Addresses.AdrOID = Invoices.AdrOID:
        /* Do something. */
    END.

There is one exception to this rule. Sometimes an object is an attribute of another object multiple times, without being a class itself. Different roles are marked by a number sign #.
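Outside Progress, the base rule to which this exception applies (a surrogate key without business meaning, reused under its unchanged name as the foreign key) maps onto a join over the shared column name. The following is a minimal sketch in Python with the standard sqlite3 module, assuming SQLite as a stand-in database. JOIN ... USING plays the role of OF here; the helper names, the index name InvAdrOID and the sample rows are made up for illustration.

```python
import sqlite3
import uuid


def new_oid():
    """Surrogate key without business meaning (a UUID, as preferred above)."""
    return uuid.uuid4().hex


def build_demo():
    """Create Addresses and Invoices in memory and add a few sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE Addresses (
            AdrOID     TEXT PRIMARY KEY,
            AdrZipCode TEXT NOT NULL
        );
        CREATE TABLE Invoices (
            InvOID       TEXT PRIMARY KEY,
            AdrOID       TEXT NOT NULL REFERENCES Addresses (AdrOID),
            InvNetAmount REAL NOT NULL
        );
        -- Index the foreign key, as the text recommends.
        CREATE INDEX InvAdrOID ON Invoices (AdrOID);
    """)
    adr_a, adr_b = new_oid(), new_oid()
    con.executemany("INSERT INTO Addresses VALUES (?, ?)",
                    [(adr_a, "34117"), (adr_b, "10115")])
    con.executemany("INSERT INTO Invoices VALUES (?, ?, ?)",
                    [(new_oid(), adr_a, 1500.0),
                     (new_oid(), adr_a, 200.0),
                     (new_oid(), adr_b, 5000.0)])
    con.commit()
    return con


def invoices_by_zip(con, min_amount, zip_prefix):
    """Rough analogue of: FOR EACH Invoices ..., EACH Addresses OF Invoices ..."""
    return con.execute(
        """
        SELECT InvNetAmount, AdrZipCode
        FROM Invoices JOIN Addresses USING (AdrOID)
        WHERE InvNetAmount >= ? AND AdrZipCode LIKE ? || '%'
        ORDER BY InvNetAmount
        """,
        (min_amount, zip_prefix),
    ).fetchall()
```

Because both tables call the key AdrOID, the join needs no ON clause naming an attribute pair, which is the readability gain the OF syntax offers in 4GL.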
The most important foreign key name is kept as is, other roles are extended by #Role:

    Invoices.InvOID [primary key]
    Invoices.AdrOID [billing address]
    Invoices.AdrOID#Delivery [delivery address]

Actually, this is far from a clean (normalized) database design. Also, most design tools will not handle such non-normalized structures. If possible, you should avoid attribute name extensions and normalize instead. To bring this point home, let's say your customers provide permanent delivery addresses. By the way, delivery addresses tend to have their own attributes and behavior. Most probably a bunch of shipping addresses is an attribute of Customers:

    DeliveryAddresses.DelAdrOID [primary key]
    DeliveryAddresses.CustOID [foreign key]
    DeliveryAddresses.AdrOID [foreign key]
    DeliveryAddresses.DelAdrDispatchType [another attribute, which in real life would be the reference to a carrier]

Invoices normalized:

    Invoices.InvOID [primary key]
    Invoices.AdrOID [billing address]
    Invoices.DelAdrOID [delivery address]

Let's come to attributes with business meaning. Besides technical attributes in different roles, I can think of other cases where it is necessary to extend attribute names, for example default values. As long as there is just one default value, put it in the attribute's definition. Otherwise you have a table storing those values:

    Discounts.DiscOID
    Discounts.DiscAppliesToBusinessType [e.g. wholesale, distributors, retail ...]
    Discounts.DiscPercent

Since discounts given to customers are calculated individually, the percentage can vary from customer to customer, and it makes no sense to reference Discounts in Customers. However, in the interest of a readable model it is good style to mark the source, therefore the attribute discount percent of Customers keeps its source:

    Customers.CustOID
    Customers.DiscPercent#Cust

There are other advantages of consistent naming rules. In commercial applications you're dealing with discount percentages in tons of objects. Imagine you need to analyze your enterprise-wide discount policy. Finding all instances of discount percentages can become a PITA in complex systems.
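With a shared name stem, that hunt reduces to a catalog lookup. The following is a minimal sketch in Python with sqlite3, assuming SQLite as a stand-in for the system tables of your own database. The function name columns_starting_with, the key name InvLineOID and the column types are illustrative assumptions. Note that names containing # have to be quoted in SQLite.

```python
import sqlite3

# Hypothetical demo schema built from the attribute names used in the text.
SCHEMA = """
CREATE TABLE Discounts    (DiscOID TEXT PRIMARY KEY, DiscPercent REAL);
CREATE TABLE Customers    (CustOID TEXT PRIMARY KEY, "DiscPercent#Cust" REAL);
CREATE TABLE Invoices     (InvOID TEXT PRIMARY KEY, "DiscPercent#Inv" REAL);
CREATE TABLE InvoiceLines (InvLineOID TEXT PRIMARY KEY, "DiscPercent#InvLine" REAL);
"""


def columns_starting_with(con, stem):
    """List 'Table.Column' for every column whose name starts with stem."""
    hits = []
    tables = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for column in con.execute(f'PRAGMA table_info("{table}")'):
            if column[1].startswith(stem):
                hits.append(f"{table}.{column[1]}")
    return hits


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
for hit in columns_starting_with(con, "DiscPercent"):
    print(hit)
```

The lookup only works because every discount percentage carries the same stem; a column named CustRebate would slip through unnoticed.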
Consistent naming provided, you can search your system tables for DiscPercent* and you get a complete list:

    Discounts.DiscPercent
    Customers.DiscPercent#Cust
    Invoices.DiscPercent#Inv
    InvoiceLines.DiscPercent#InvLine
    ...

If your application is to be used by a group of (affiliated) companies, where each company represents a separate client in the multi-client capable accounting system, things become difficult. The easiest solution would be to split your ERP database physically. Keep all common objects like countries, currencies, users, clients (= accounting clients) etc. in one database, and all company-related objects in another database. Connect your users to the first ERP database and the accounting database, let them choose a client, then create an alias for the client's ERP database to ensure all client databases can share the same programs. Large operations tend to buy and sell subsidiary companies every once in a while. Using physical client databases makes this kind of move a simple and painless task.

Unfortunately, sometimes a developer's life is not that easy. In a multi-corporate enterprise many subsidiary companies work on the same projects, billing their time and material partly within the group. That means subsidiary companies share access to a lot more business objects than just countries and currencies. Besides a ton of group-wide objects, templates to ensure enterprise-wide identical customer account numbers and such stuff, you need the attribute accounting client in many objects. Do not use the same attribute name in all tables, because database systems and design tools can't handle the primary relations if you do. Name the column client number (or client OID) differently in each table, using the source pointers explained before, e.g.

    Invoices.ClientNumber#Inv
    Customers.ClientNumber#Cust
    Vendors.ClientNumber#Vend
    ...

All of the above leads to the conclusion that consistent naming is a good idea in general. IOW: without a strong naming convention your project will fail. Each and every name must be self-explanatory, and similar meanings must be kept in identical wording. Some examples: Invoices.InvPrinted says whether an invoice has been printed or not, Invoices.InvDatePrinted stores the date of the last printout, Invoices.InvPrintCounter tells us how many times an invoice has been printed so far and can be used to mark copies.
The same goes for order confirmations and other forms: OrderConfirmations.OrdConfPrinted, OrderConfirmations.OrdConfDatePrinted, OrderConfirmations.OrdConfPrintCounter and so on.

Look at the first attribute in my example. In common speech Invoices.InvPrinted can stand for a Boolean value as well as for a date. To avoid any confusion, you can make it even clearer by naming the logical attribute Invoices.InvIsPrinted, which leads to perfectly understandable code like ...

    FOR EACH Invoices
        WHERE NOT Invoices.InvIsPrinted
          AND Invoices.InvDateCreate <= (TODAY - 10)
          AND Invoices.InvIsDispatched,
        EACH Customers OF Invoices,
        EACH Staff OF Customers:
        ASSIGN lOk = sendEmail(Staff.StEmailAddy,
                               "Send out invoice # " + STRING(Invoices.InvNumber)
                               + " $ " + STRING(Invoices.InvGrossAmount),
                               "To" + crlf + getMailAddress(Customers.AdrOID)
                               + crlf + "immediately")
               Staff.StBrowniePoints = Staff.StBrowniePoints - 1.
    END.

... and more examples. All types of amounts are addressed by the same name:

    Invoices.InvNetAmount    Orders.OrdNetAmount    ...
    Invoices.InvTaxAmount    Orders.OrdTaxAmount    ...
    Invoices.InvGrossAmount  Orders.OrdGrossAmount  ...

All numbers are called Number and not No, Num (Num usually means number of) or whatever:

    Invoices.InvNumber
    Customers.CustNumber (if there is a numeric customer number)
    Products.PrdUpcNumber
    Countries.CoIsoNumber (ISO 3166 numeric code)
    ...

Alphanumeric codes are (usually) named Code, like

    Countries.CoIsoCode (ISO 3166 alphanumeric code)
    Products.PrdCode (or Products.PrdSku)
    Customers.CustAccountCode
    Currencies.CurrIsoCode
    ...

Borderline cases are third-party, non-unique technical keys with business meaning like the UPS 1Z Tracking Number, which contains both digits and letters. I'd call it UpsParcels.UpsP1zTrackingNumber, because the term is a matter of common knowledge and, technically speaking, 1Z even indicates an alphanumeric value.

The same goes for all common name components like description, remarks, name, quantity, price and so on; I guess you've got the idea.

If possible, try to express the data type by attribute names, not only in attributes of the type date and logical. Url or Description indicate a single-line character field, LongDescriptions, Remarks or Notes usually get stored in large text fields, Percent, Amount and Price imply decimal values, NumberOf or PageNumber represent integers and so on.

As for the visible parts of your model, there is not much more to say, except: check your spelling before you save definitions, and assign a help text to each attribute.

Besides the above-mentioned object identifiers and one-to-many relationships, you need a policy for many-to-many relationships too. Those are kind of technical classes, making complex relationships persistent. Users will never see their names or attributes, so you may use geek speak. Here is a proven system: name those tables by composing your unique table prefixes, delimited by the digits 2 (to) and 4 (for). If your customers can belong to different groups, the table representing the relationship customers [belonging] to customer groups is named Cust2CustGrp and contains only three keys:

    Cust2CustGrp.Cust2CustGrpOID [primary key]
    Cust2CustGrp.CustOID [foreign key]
    Cust2CustGrp.CustGrpOID [foreign key]

To handle all customers of a group you code

    FOR EACH Cust2CustGrp OF CustomerGroups,
        EACH Customers OF Cust2CustGrp:
        /* Do something. */
    END.

To get a list of all groups a customer belongs to you write:

    FOR EACH Cust2CustGrp OF Customers,
        EACH CustomerGroups OF Cust2CustGrp:
        /* Do something. */
    END.

In some rare cases these predominantly technical classes have other attributes. Pragmatically, here I'd go for a descriptive table label and stick with the geeky table name. Actually, most probably those attributes are simple connections, keeping the table itself invisible to users. E.g. if you have a table storing Xmas present types, you could assign the type (or value) of presents depending on one of the groups assigned to your customers:

    XmasPresentTypes.XptOID
    XmasPresentTypes.XptPostcardOnly
    XmasPresentTypes.XptPrice

    Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
    Cust2CustGrp4Xpt.Cust2CustGrpOID [foreign key]
    Cust2CustGrp4Xpt.XptOID [foreign key]

or

    Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
    Cust2CustGrp4Xpt.CustOID [foreign key]
    Cust2CustGrp4Xpt.CustGrpOID [foreign key]
    Cust2CustGrp4Xpt.XptOID [foreign key]

Pick whatever fits your needs best.

Now let's come to another important rule: Separate all technical stuff from your business logic.
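The link tables above are a first instance of that separation: purely technical, never shown to users. The following is a minimal sketch in Python with sqlite3, assuming SQLite as a stand-in database. The name columns CustName and CustGrpName, the short key values and the helper functions are made up for illustration, and JOIN ... USING stands in for OF.

```python
import sqlite3


def build_demo():
    """Create customers, groups and the Cust2CustGrp link table in memory."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customers      (CustOID TEXT PRIMARY KEY, CustName TEXT);
        CREATE TABLE CustomerGroups (CustGrpOID TEXT PRIMARY KEY, CustGrpName TEXT);
        CREATE TABLE Cust2CustGrp (
            Cust2CustGrpOID TEXT PRIMARY KEY,
            CustOID         TEXT NOT NULL REFERENCES Customers (CustOID),
            CustGrpOID      TEXT NOT NULL REFERENCES CustomerGroups (CustGrpOID)
        );
        -- Short readable keys for the demo only; real keys would be sequences or UUIDs.
        INSERT INTO Customers VALUES ('c1', 'Acme'), ('c2', 'Globex');
        INSERT INTO CustomerGroups VALUES ('g1', 'Wholesale'), ('g2', 'Retail');
        INSERT INTO Cust2CustGrp VALUES
            ('l1', 'c1', 'g1'), ('l2', 'c1', 'g2'), ('l3', 'c2', 'g1');
    """)
    return con


def customers_of_group(con, cust_grp_oid):
    """Names of all customers linked to one group."""
    rows = con.execute(
        """
        SELECT CustName
        FROM Cust2CustGrp JOIN Customers USING (CustOID)
        WHERE CustGrpOID = ?
        ORDER BY CustName
        """,
        (cust_grp_oid,),
    )
    return [name for (name,) in rows]


def groups_of_customer(con, cust_oid):
    """Names of all groups one customer belongs to."""
    rows = con.execute(
        """
        SELECT CustGrpName
        FROM Cust2CustGrp JOIN CustomerGroups USING (CustGrpOID)
        WHERE CustOID = ?
        ORDER BY CustGrpName
        """,
        (cust_oid,),
    )
    return [name for (name,) in rows]
```

Both helpers return business attributes only; the link table and its keys stay inside the data access code.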
You can't avoid technical attributes in tables representing business objects, but you can and you should handle them separately. For example, you can assign values like

    Table.PrefixOID
    Table.PrefixUserLastUpdate (if you don't log user activities, you probably need to store these data on creation too)
    Table.PrefixDateLastUpdate
    Table.PrefixTimeLastUpdate
    Table.PrefixIsActive or Table.PrefixIsDeleted

in database triggers. Be aware that in n-tier architectures database triggers usually do not know the user. If you need to log user activities, you can implement this feature in your key wrapping widgets. Since your technical primary keys can't be used in user interfaces, you create a key wrapping widget for each primary key. This widget knows the invisible primary key and enables the user to choose or enter one or more attributes with business meaning, which can be used to identify an object. Looking at a data viewer, those widgets appear just like fill-in fields with a search button, or like combo boxes. In the background they pass values of technical keys as well as screen values of their visible attributes with business meaning to an application server, or another process handling your persistent objects.

Back to logging. Since every data viewer must contain at least one key wrapping widget (one handling the primary key and probably a few others handling foreign keys), you can determine the current user here. Just pass another hidden value to your persistence handler. Then in the database trigger you compare the old and new buffer, logging changes only. With a Progress® database, you can fully automate user activity logging using generated includes in write triggers, made up by a tool accessing the virtual system tables (VST). By the way, you should assign values to primary keys in create triggers only. At this point, recall another important rule of state-of-the-art software design: Do not put any business logic into the user interface code. Think SOA and encapsulate technical services as well as audit trail requirements.

Another rule of thumb is: Do not delete physically. Admittedly, deletions are technically possible, but they are way too expensive, not really necessary, and furthermore you destroy information which as a rule you will need some day.
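One way to follow that rule is a logical flag such as the Table.PrefixIsActive attribute listed above. The following is a minimal sketch in Python with sqlite3, assuming SQLite as a stand-in database. CustName, InvNumber, the sample rows and the helper names are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create a parent table with an activity flag and one child row."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE Customers (
            CustOID      TEXT PRIMARY KEY,
            CustName     TEXT NOT NULL,
            CustIsActive INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = logically deleted
        );
        CREATE TABLE Invoices (
            InvOID    TEXT PRIMARY KEY,
            CustOID   TEXT NOT NULL REFERENCES Customers (CustOID),
            InvNumber INTEGER NOT NULL
        );
        INSERT INTO Customers (CustOID, CustName) VALUES ('c1', 'Acme'), ('c2', 'Globex');
        INSERT INTO Invoices VALUES ('i1', 'c1', 1001);
    """)
    return con


def set_active(con, cust_oid, active):
    """Logical delete or reactivation: a single-column update, no cascade."""
    con.execute("UPDATE Customers SET CustIsActive = ? WHERE CustOID = ?",
                (1 if active else 0, cust_oid))
    con.commit()


def active_customers(con):
    """Names of all customers that are not logically deleted."""
    rows = con.execute(
        "SELECT CustName FROM Customers WHERE CustIsActive = 1 ORDER BY CustName")
    return [name for (name,) in rows]


def invoices_of(con, cust_oid):
    """Invoice numbers of one customer, active or not."""
    rows = con.execute(
        "SELECT InvNumber FROM Invoices WHERE CustOID = ? ORDER BY InvNumber",
        (cust_oid,))
    return [number for (number,) in rows]
```

After set_active(con, "c1", False) the customer drops out of active_customers, yet its invoice is still reachable through the untouched foreign key, and set_active(con, "c1", True) brings it back.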
Deleting logically, on the other hand, keeps your referential integrity perfectly intact, and it is way faster because your database server updates just one column in a parent table instead of bothering with often almost endless cascading deletes along with RI checks. Adding a WHERE clause [NOT] ParentTable.PrefixIsDeleted or, much better, [NOT] ParentTable.PrefixIsActive is cheap in comparison with all the nasty side effects of physical deletion. Tell your delete button to set a logical attribute isDeleted to true, or even dump the button and use a check box instead, which allows your users to reactivate inactive objects.

Large projects can easily exceed the physical limits set by your database system. If you deal with very large amounts of data in particular entities, ensure that primary keys of (physically sliced) mega entities are never used as foreign keys in other tables. Only the (logical) mega table keeps knowledge about relations to other entities. That should not lead to problems, because these entities are usually children of others (for example sales transactions of sales slips of POS terminals of shops). Implement a smart data access layer handling the requests from higher application levels. Depending on key value ranges and/or date-time attributes, the data access layer can determine in which table a requested tuple is located and in which table a new tuple must be stored, while from the higher levels' perspective this conglomerate of tables appears as one logical table.

The next warning has, like the two rules above, the potential for a bunch of articles: Avoid array fields. Most persistent arrays I saw were the work of lazy code monkeys who weren't capable of looking a step further. Although some database systems like Progress® can handle array fields, most database systems do not (why should they support tables in tables for database designers not able to normalize properly?). Furthermore, lots of front ends and underlying components as well as development tools will not handle extended attributes. When migrating applications, it's hard enough to handle these constructs in settled (legacy) databases, so don't create new trouble. As for Progress® word indexes, which work like a charm with character arrays, there is an alternative compatible with other databases. Just add a word-indexed large text field and populate it with a string of the attributes in question in your write trigger.

Modelers and developers following the relational theory as set in stone will most probably be offended by some of the code examples above. In former paradigms it was, politely expressed, not the best practice to use syntax like ChildTable OF ParentTable, because (using attributes with business meaning as primary and foreign keys) it was not obvious which attribute pair got used to join the objects.
However, we got rid of that incredibly stupid concept in the meantime. OF has evident advantages:

A clean database design provided, those misunderstandings caused by omission cannot occur, because each and every join uses a single pair of indexed technical keys in both tables. The technical implementation of relationships has nothing to do with business logic, thus the consistent usage of OF increases code readability. Actually, technical attributes should not appear in any code handling business logic (exceptions like Table.PrefixIsActive, standing for not logically deleted, and other technical attributes with at least a portion of business meaning admitted).

If OF fails, you have a technical problem like a missing index on a foreign key column, or other (indexed) attribute names being equal in both tables, neither of which must happen. Fortunately the compiler will quit with an error message in this case. That means the consistent usage of OF, followed by a WHERE clause expressing business logic by testing attributes with business meaning, protects you from logical errors as well as errors and omissions in the physical database design.

As I said in the beginning, my intention was not to write a book explaining each and every aspect of database design. Most probably that's impossible, because different business requirements need different solutions. I wrote this article off the top of my head on a rainy Saturday afternoon, so please don't expect completeness. And since I make a living with IT consulting, you'll agree that it would be a bad idea to publish all my business secrets ;)

Author: Sebastian
Published: December 2004
Last update: May 2005