The motivation for this presentation is developers and the attitude many of them have towards databases. No, this is not a venting session. It is about tips and standards that application and front-end developers need to know when working with databases. The scope is limited to relational databases, and to content that can be presented within 30 minutes.

In my career I have seen and worked with databases that were atrociously designed and programmed against. For instance, a table containing a field that by all appearances is supposed to be a primary key: it is even named as such, PK_ prefix and all, yet most of its many records have NULL in that field. I attribute this to developers who treat the database as just a dump for data. In the subsequent slides we will look at issues that are common to databases and see how they can be addressed.
The presentation looks at a few things that developers can get right, and a few areas where best practices can be applied.
Take a look at this picture. It depicts a table with data and its schema definition. Try to identify as many problems as you can.
Here is what I could garner from it. Take a closer look at the table definition and the sample set of data from the same table:

- The lack of constraints
- Improperly implemented uniqueness
- Inconsistent naming conventions
- Non-normalized table design
- Carelessly chosen data types

all show that the table is not well designed.

NORMALIZATION
When you record data in a database you need to split the data up by entity. You don't display eggs, cabbages, batteries and disinfectant cans in the same crate as carrots in the supermarket, do you? The same goes for the different entities: each table should record the details of just one entity (or a single relationship among entities).

KEYS AND RECORD UNIQUENESS
Each table should have a primary key (First Normal Form calls for this). A primary key uniquely identifies a record or object. In each scenario, case study or problem domain that you map to a database system, there will be something that uniquely identifies each instance of each entity. This is called the natural key: a field that identifies an instance of an entity (i.e. a record in a table) naturally, such as the Student Code or Customer Code that the school or business would use to track a student or customer even if there were no RDBMS. In some scenarios it could be the NIC number or Social Security Number. There are rare situations where no natural key exists; in those cases it is alright to have only a surrogate key. In all other cases it is strongly recommended that you include a natural key.

NAMING CONVENTIONS
Naming conventions are very important, regardless of how many developers think otherwise. A consistent, standard naming convention automatically documents your database and its objects to a good level. Some tips:

- Avoid prefixes (and suffixes) on object names. Rather than "tblCustomer", "Customer" makes the ideal table name. When all tables are named consistently like this, users and developers will automatically know that the object is a table.
- Avoid white space and underscores in object and column names. A good choice is to start each word in an object name with a capital letter: "Customer", "GetProduct", "CalculateLateCharges" and so on.

CONSTRAINTS
Constraints are very important too. They restrict the values that can be stored, which keeps invalid and wrong data out of the table. Many developers make the mistake of letting the application alone perform all validation and data restriction. That will not stop bad data from arriving through another pathway, such as an ad-hoc bulk insert from an Excel file.

DATA TYPES
Data type choices matter because they save space. Saving 2 bytes on each of three columns adds up to around 57 MB by the time the table reaches 10,000,000 records; multiply that across 10 such tables and the savings become substantial. Use the optimum data type for each field: a CHAR(10) field for a phone number has less overhead than VARCHAR(10), and a field that stores English-language names saves space as VARCHAR instead of NVARCHAR. Do not forget the 8060-byte row-size rule when creating tables: http://msdn.microsoft.com/en-us/library/ms186981(v=sql.105).aspx
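The points on constraints and data types above can be sketched in a single table definition. This is a minimal, hypothetical example; the table and column names (Customer, PhoneNumber, CreditLimit and so on) are assumptions for illustration, not the schema from the slides.

```sql
-- Illustrative only: constraints enforce validity at the table itself,
-- and data types are chosen deliberately for width and overhead.
CREATE TABLE Customer
(
    CustomerID   INT IDENTITY(1,1) NOT NULL,
    CustomerCode CHAR(8)       NOT NULL,   -- natural key; fixed width suits a fixed-format code
    FirstName    NVARCHAR(50)  NOT NULL,   -- names may need Unicode, so NVARCHAR here
    PhoneNumber  CHAR(10)      NULL,       -- CHAR(10) has less overhead than VARCHAR(10)
    CreditLimit  DECIMAL(10,2) NOT NULL
        CONSTRAINT DF_Customer_CreditLimit DEFAULT (0),

    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID),    -- surrogate key
    CONSTRAINT UQ_Customer_CustomerCode UNIQUE (CustomerCode),    -- natural key kept unique
    CONSTRAINT CK_Customer_CreditLimit CHECK (CreditLimit >= 0),  -- reject invalid data
    CONSTRAINT CK_Customer_PhoneNumber CHECK (PhoneNumber NOT LIKE '%[^0-9]%')
);
```

Even if the application validates its inputs, these constraints still catch bad data arriving through other pathways, such as an ad-hoc bulk insert.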
The tblCustomer table can be re-created like this, normalized and all. Yes, it looks complicated, but read through it for 10 minutes and you will notice that it is granular and easy to understand: each entity lives in its own structure.

Exercise: see if you can come up with an alternative design for the CustomerAddress table, or any improvements to it.
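One possible shape for such a normalized design is sketched below. The table and column names here are assumptions for illustration (the actual schema on the slide may differ); the point is that addresses become their own entity, linked to customers through a junction table.

```sql
-- A hypothetical normalized replacement for a single wide tblCustomer table.
CREATE TABLE Customer
(
    CustomerID   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    CustomerCode CHAR(8)       NOT NULL CONSTRAINT UQ_Customer_Code UNIQUE,  -- natural key
    CustomerName NVARCHAR(100) NOT NULL
);

CREATE TABLE Address
(
    AddressID  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Address PRIMARY KEY,
    Line1      NVARCHAR(100) NOT NULL,
    City       NVARCHAR(50)  NOT NULL,
    PostalCode VARCHAR(10)   NOT NULL
);

-- Junction table: one customer can have many addresses, and an address
-- can serve different roles (billing vs. shipping).
CREATE TABLE CustomerAddress
(
    CustomerID  INT NOT NULL CONSTRAINT FK_CustAddr_Customer REFERENCES Customer (CustomerID),
    AddressID   INT NOT NULL CONSTRAINT FK_CustAddr_Address  REFERENCES Address (AddressID),
    AddressType CHAR(1) NOT NULL
        CONSTRAINT CK_CustAddr_Type CHECK (AddressType IN ('B','S')),  -- B = billing, S = shipping
    CONSTRAINT PK_CustomerAddress PRIMARY KEY (CustomerID, AddressID, AddressType)
);
```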
The code that you write (stored procedures, functions and triggers) can be improved iteratively, such as when you follow an agile approach. A database design, however, is much harder to evolve iteratively, especially if the design needs to be broken and rebuilt. Adding to the schema is a reasonable way forward, but breaking and rebuilding: nah-ah!

Make sure your database design is robust and well thought out up front, whether you design in agile or traditional fashion. You don't want to break the data, do you?

See what Brent Ozar has to say ("code is agile, and databases are brittle") at http://www.brentozar.com/archive/2013/10/what-developers-need-to-know-about-sql-server/
SQL code (or any code, for that matter) can always break. We just assume it won't because it passes a few test cases (and often these are not even proper test cases, just checks we run a few times after writing the code). But when an unanticipated parameter is passed, or the code runs on a server with different settings, that is when it falls apart. Hence, it is important to write code defensively so that these undesirable effects are minimized.

When you write code you make assumptions. It is important that these assumptions are made explicit and understood:

- explicitly list the assumptions that have been made
- ensure that these assumptions always hold
- systematically remove assumptions that are not essential, or are incorrect

Make your code as modular as possible, testable as units, and fully tested. When testing, ensure the code is exercised for every type of use case possible, no matter how improbable. Reuse the modularized code whenever possible, but do not reuse chunks pulled out of your modules: those chunks were never tested as modules and could certainly fail.

For more information on writing resilient T-SQL, check out this e-book: http://www.red-gate.com/community/books/defensive-database-programming
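The "make assumptions explicit" idea can be sketched in a short defensive stored procedure. The procedure, table and parameter names here are assumed for illustration; the point is that the implicit assumption (both dates supplied, and @FromDate precedes @ToDate) is validated up front rather than left to the calling application.

```sql
-- A minimal sketch of defensive T-SQL (all names are hypothetical).
CREATE PROCEDURE dbo.GetOrdersInRange
    @FromDate DATE,
    @ToDate   DATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Assumption 1: both parameters are supplied. Fail loudly if not.
    IF @FromDate IS NULL OR @ToDate IS NULL
    BEGIN
        RAISERROR('Both @FromDate and @ToDate are required.', 16, 1);
        RETURN;
    END;

    -- Assumption 2: the range is well ordered. Enforce it instead of
    -- silently returning an empty (and misleading) result set.
    IF @FromDate > @ToDate
    BEGIN
        RAISERROR('@FromDate must not be later than @ToDate.', 16, 1);
        RETURN;
    END;

    SELECT OrderID, CustomerID, OrderDate
    FROM   dbo.[Order]                               -- assumed table name
    WHERE  OrderDate >= @FromDate
      AND  OrderDate <  DATEADD(DAY, 1, @ToDate);    -- half-open range: safe with time parts
END;
```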
And finally, documenting. Documenting code is very important: not just so that others after you, but so that you yourself five months later, know what the code actually does.

More often than not, SQL code is written without proper formatting of the text, partly because database IDEs, unlike Visual Studio, provide little support for beautifying code. Well-formatted code is in itself a first step towards documentation: a reader can easily follow a multi-level sub-query or a loop when it is laid out cleanly. Commenting what each piece of code does works wonders for the reader, and commenting why the code is written that way drives the idea home much deeper.

Sure, documenting your code takes more time, but just like everything else, it is worth it in the long run.
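A small example of what this looks like in practice. The query, table names and the stated business purpose are all hypothetical; the point is the layout, the header comment recording why the code exists, and the inline comment explaining a non-obvious choice.

```sql
/*
    Purpose : List customers with no orders in the last 12 months,
              so Marketing can run a win-back campaign.  (hypothetical)
    Notes   : NOT EXISTS is used instead of LEFT JOIN ... IS NULL
              because it stops scanning at the first matching order.
*/
SELECT  c.CustomerID,
        c.CustomerName
FROM    dbo.Customer AS c          -- assumed table names
WHERE   NOT EXISTS
        (
            SELECT 1
            FROM   dbo.[Order] AS o
            WHERE  o.CustomerID = c.CustomerID
              AND  o.OrderDate >= DATEADD(MONTH, -12, GETDATE())
        );
```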
The Developer vs. The Relational Database: How to do it right
The Relational Database
Do it right:
What’s wrong with this picture
1. Define & understand assumptions
2. Modularize into fully testable and fully tested code
3. Test as many use cases as possible
4. Reuse code when feasible*
Indexing for performance
A clustered index on a heap (table) is a good idea
Unique, narrow, static field
Leave small tables as is
Non-clustered indexes on fields used in
JOINs and WHERE predicates
Do not index every field
Index based on usage
Consider indexes with included columns
Consider filtered indexes
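The bullets above can be sketched as index definitions. Table, column and index names are assumed for illustration.

```sql
-- Clustered index on a unique, narrow, static column (turns the heap
-- into a clustered table):
CREATE UNIQUE CLUSTERED INDEX CIX_Order_OrderID
    ON dbo.[Order] (OrderID);

-- Non-clustered index on a column used in JOINs and WHERE predicates,
-- with included columns so common queries are covered without lookups:
CREATE NONCLUSTERED INDEX IX_Order_CustomerID
    ON dbo.[Order] (CustomerID)
    INCLUDE (OrderDate, TotalAmount);

-- Filtered index: only the rows the queries actually touch are indexed,
-- keeping the index small:
CREATE NONCLUSTERED INDEX IX_Order_Open
    ON dbo.[Order] (OrderDate)
    WHERE Status = 'Open';
```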
Use utility objects
The Numbers table, for set based:
Extraction of distinct characters
Generate time ranges or time slices
Find IDENTITY gaps
Even something trivial like generating an IP range
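The Numbers-table tricks above can be sketched as follows. All object names are assumed; the population technique (a cross join over a system catalog) is one common approach among several.

```sql
-- A minimal Numbers table, populated set-based:
CREATE TABLE dbo.Numbers (N INT NOT NULL CONSTRAINT PK_Numbers PRIMARY KEY);

INSERT INTO dbo.Numbers (N)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM   sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Time slices: 96 quarter-hour slots for one day, no loop required:
SELECT DATEADD(MINUTE, 15 * (N - 1), CAST('2024-01-01' AS DATETIME)) AS SliceStart
FROM   dbo.Numbers
WHERE  N <= 96;

-- IDENTITY gaps: numbers up to the current maximum with no matching row
-- (dbo.[Order] and OrderID are assumed names):
SELECT n.N AS MissingID
FROM   dbo.Numbers AS n
WHERE  n.N <= (SELECT MAX(OrderID) FROM dbo.[Order])
  AND  NOT EXISTS (SELECT 1 FROM dbo.[Order] AS o WHERE o.OrderID = n.N);
```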
Document your code
Do we have time for a demo?