DB Coding Standards
Version 1.0
12/Nov/2009
Farukh Zahoor
1. Table names should correspond to the entities themselves, e.g. a list of products shall be
saved in the table named 'Product' (singular)
2. Configurable entities should be suffixed with the term 'Config' (for SQL Server) or '_config'
(for MySQL)
4. For SQL Server, each table must have a primary key of type GUID (uniqueidentifier) on SQL
Server 2005 or later; use INTEGER IDENTITY for SQL Server 2000 or earlier. For MySQL, each table
must have a primary key of type INTEGER AUTO_INCREMENT.
5. For MySQL, tables must use the InnoDB storage engine, which supports transactions and foreign key constraints.
6. Each development machine must have its own local database installation to avoid conflict
and data corruption.
7. If a component module in the code has its own tables, e.g. if a 3rd party or external user
management system is being used, the external tables should have a fixed prefix, such as
'UM_' (SQL Server) or 'um_' (MySQL), to separate them from the core database.
9. All types must be stored in the database in their own tables, e.g. ProductType. A unique
identifier for these tables is required, but for .NET it doesn't have to be a GUID.
• For tables use singular keyword = User, Product, Service, Supplier, Customer
• For field name = ID (as uniqueidentifier in SQL Server and auto incremental ID in
MySQL), Name, Class etc.
• Define proper keys on the tables and draw the entity diagrams by relating them
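The naming and key conventions above can be sketched as follows. This is only an illustration, assuming SQL Server 2005 or later; the column names and lengths are hypothetical and not part of the standard.

```sql
-- Hypothetical tables illustrating the conventions (SQL Server 2005 or later).
-- Type table: singular name, simple integer key (a GUID is not required here).
CREATE TABLE ProductType
(
    ID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);

-- Entity table: singular name, GUID primary key, key relating it to its type.
CREATE TABLE Product
(
    ID            UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Name          VARCHAR(100) NOT NULL,
    ProductTypeID INT NOT NULL REFERENCES ProductType (ID)
);
```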
11. Make sure you normalize your data at least up to third normal form. At the same time, do not
compromise on query performance. A little bit of denormalization helps queries perform
faster.
12. Write comments in your stored procedures, triggers and SQL batches generously, whenever
something is not very obvious. This helps other programmers understand your code clearly.
Don't worry about the length of the comments, as it won't impact the performance, unlike
in interpreted environments such as ASP 2.0.
13. Do not use SELECT * in your queries. Always write the required column names after the
SELECT keyword, like SELECT ID, FirstName, City. This technique results in less disk IO
and less network traffic and hence better performance.
14. Try to avoid server side cursors as much as possible. Always stick to a 'set based approach'
instead of a 'procedural approach' for accessing/manipulating data. Cursors can be easily
avoided by SELECT statements in many cases. If row-by-row processing is unavoidable, use a
simple WHILE loop instead of a cursor to loop through the table. I personally tested and
concluded that a WHILE loop is faster than a cursor most of the time. But for a WHILE loop to
replace a cursor you need a column (primary key or unique key) to identify each row uniquely,
and I personally believe every table must have a primary or unique key.
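A minimal sketch of a WHILE loop that walks a table by its unique key instead of using a cursor. The table variable, its columns and the PRINT placeholder are assumptions made for illustration only.

```sql
-- Hypothetical data; ID is the unique key that drives the loop.
DECLARE @Customer TABLE (ID INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL);
INSERT INTO @Customer (ID, Name) VALUES (1, 'Ann');
INSERT INTO @Customer (ID, Name) VALUES (2, 'Bob');

DECLARE @ID INT, @Name VARCHAR(50);

-- Start with the smallest key value.
SELECT @ID = MIN(ID) FROM @Customer;

WHILE @ID IS NOT NULL
BEGIN
    SELECT @Name = Name FROM @Customer WHERE ID = @ID;
    PRINT @Name;  -- placeholder for the real per-row work

    -- Move to the next key; MIN returns NULL when no rows remain, which ends the loop.
    SELECT @ID = MIN(ID) FROM @Customer WHERE ID > @ID;
END;
```

If the per-row work can be expressed as a single UPDATE or INSERT ... SELECT, the set based statement is still the preferred option.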
15. Avoid the creation of temporary tables while processing data, as much as possible, as
creating a temporary table means more disk IO. Keep in mind that, in some cases, using a
temporary table performs better than a highly complicated query.
16. Avoid wildcard characters at the beginning of a search pattern while searching using the LIKE
keyword, as that results in an index scan, which defeats the purpose of having an index. The
first of the following two statements results in an index scan, while the second results in an
index seek:
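The original statements are not present in this copy. The following pair is a sketch of what such statements typically look like, using a hypothetical temporary table and index; the plan the optimizer actually chooses also depends on table size and statistics.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE #Customer
(
    ID       INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LastName VARCHAR(50) NOT NULL
);
CREATE INDEX IX_Customer_LastName ON #Customer (LastName);

-- Leading wildcard: there is no fixed prefix to position on, so the index is scanned.
SELECT ID, LastName FROM #Customer WHERE LastName LIKE '%son';

-- Fixed prefix: the optimizer can seek directly to the range of values starting with 'John'.
SELECT ID, LastName FROM #Customer WHERE LastName LIKE 'John%';
```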
17. Use 'Derived tables' wherever possible, as they perform better. Consider the following query
to find the second highest salary from Employees table:
SELECT MIN(Salary)
FROM Employees
WHERE EmpID IN
(
SELECT TOP 2 EmpID
FROM Employees
ORDER BY Salary Desc
)
The same query can be re-written using a derived table as shown below, and it performs
twice as fast as the above query:
SELECT MIN(Salary)
FROM
(
SELECT TOP 2 Salary
FROM Employees
ORDER BY Salary Desc
) AS A
18. While designing your database, design it keeping 'performance' in mind. You can't really tune
performance later, when your database is in production, as it involves rebuilding
tables/indexes and re-writing queries. Make sure your queries do 'Index seeks' instead of 'Index
scans' or 'Table scans'. A table scan or an index scan on a large table is expensive and should
be avoided where possible (when the table is very small or when the whole table needs to be
processed, the optimizer will legitimately choose a table or index scan).
19. Use the more readable ANSI-Standard Join clauses instead of the old style joins. With ANSI
joins the WHERE clause is used only for filtering data, whereas with older style joins the
WHERE clause handles both the join condition and filtering data. The first of the following two
queries shows an old style join, while the second one shows the new ANSI join syntax:
Be aware that the old style *= and =* left and right outer join syntax may not be supported in
future releases of databases, so you are better off adopting the ANSI standard outer join
syntax.
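The two queries are missing from this copy. A hedged sketch of the contrast, with hypothetical tables and columns, could look like this:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE #Author (ID INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, City VARCHAR(30) NULL);
CREATE TABLE #Book   (ID INT NOT NULL PRIMARY KEY, AuthorID INT NOT NULL, Title VARCHAR(100) NOT NULL);

-- Old style join: the join condition and the filter are mixed in the WHERE clause.
SELECT a.Name, b.Title
FROM #Author a, #Book b
WHERE a.ID = b.AuthorID
  AND a.City = 'Lahore';

-- ANSI join: the join condition lives in ON, the WHERE clause only filters.
SELECT a.Name, b.Title
FROM #Author AS a
INNER JOIN #Book AS b
    ON a.ID = b.AuthorID
WHERE a.City = 'Lahore';
```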
20. Views are generally used to show specific data to specific users based on their interest.
Views are also used to restrict access to the base tables by granting permission on only the
views. Yet another significant use of views is that they simplify your queries. Incorporate your
frequently required complicated joins and calculations into a view, so that you don't have to
repeat those joins/calculations in all your queries; instead, just select from the view.
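As an illustration of wrapping a repeated join and calculation in a view, here is a small sketch. The tables, the view name and the threshold are hypothetical; GO is the batch separator used by SQL Server client tools, needed because CREATE VIEW must be the first statement in its batch.

```sql
-- Hypothetical base tables.
CREATE TABLE Customer (ID INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL);
CREATE TABLE CustomerOrder
(
    ID         INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL REFERENCES Customer (ID),
    Amount     DECIMAL(10,2) NOT NULL
);
GO

-- The join and the aggregation are written once, inside the view.
CREATE VIEW CustomerOrderTotal
AS
SELECT c.ID AS CustomerID, c.Name, SUM(o.Amount) AS TotalAmount
FROM Customer AS c
INNER JOIN CustomerOrder AS o
    ON o.CustomerID = c.ID
GROUP BY c.ID, c.Name;
GO

-- Callers select from the view instead of repeating the join.
SELECT CustomerID, Name, TotalAmount
FROM CustomerOrderTotal
WHERE TotalAmount > 1000;
```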
21. Try not to let your front-end applications query/manipulate the data directly using SELECT or
INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your
applications access these stored procedures. This keeps the data access clean and
consistent across all the modules of your application, at the same time centralizing the
business logic within the database. Adopting this rule is optional, depending upon the
architecture defined.
22. If you have a choice, do not store binary files, image files (Binary large objects or BLOBs) etc.
inside the database. Instead, store the path to the binary/image file in the database and use
that as a pointer to the actual binary file. Retrieving and manipulating these large binary
files is better performed outside the database and, after all, a database is not meant for
storing files.
23. Use the char data type for a column only when the column is non-nullable and has a fixed
length. If a char column is nullable, it is still treated as a fixed length column. So, a
char(100), when NULL, will eat up 100 bytes, resulting in space wastage. So, use varchar(100) in
this situation. Of course, variable length columns do have a small processing overhead over
fixed length columns. Carefully choose between char and varchar depending upon the length
of the data you are going to store.
24. Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than
static SQL. IF and CASE statements come in handy to avoid dynamic SQL. Another major
disadvantage of using dynamic SQL is that it requires the users to have direct access
permissions on all accessed objects like tables and views. Generally, users are given access
to the stored procedures which reference the tables, but not directly to the tables. In this
case, dynamic SQL will not work.
25. Consider the following drawbacks before using Auto Incremental ID for generating primary
keys. Auto Incremental ID is implemented differently by each database, and you will have
problems if you want to support different database backends for your application. Auto
Incremental ID columns have other inherent problems. Auto Incremental ID columns can run out
of numbers one day or the other. Numbers can't be reused automatically after deleting rows.
Replication and Auto Incremental ID columns don't always get along well. So, consider coming up
with an algorithm to generate a primary key, in the front-end or from within the inserting
stored procedure. There could be issues with generating your own primary keys too, like
concurrency while generating the key and running out of values. So, consider both the options
and go with the one that suits you well.
26. Minimize the usage of NULLs, as they often confuse the front-end applications, unless the
applications are coded intelligently to eliminate NULLs or convert the NULLs into some other
form. Most expressions that involve a NULL operand evaluate to NULL. The ISNULL and
COALESCE functions are helpful in dealing with NULL values. Here's an example that
explains the problem:
Consider the following table, Customers, which stores the names of the customers and in which
the middle name can be NULL.
Now insert a customer into the table whose name is Tony Blair, without a middle name:
The following SELECT statements return NULL, instead of the customer name:
SELECT FirstName + ' ' + MiddleName + ' ' + LastName FROM Customers
OR
SELECT Concat(FirstName , ' ' , MiddleName , ' ' , LastName) as FullName FROM
Customers
The ISNULL function avoids the problem by replacing a NULL middle name with an empty string:
SELECT FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName FROM Customers
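The table definition and the INSERT statement are missing from this copy. The sketch below reconstructs the scenario end to end for SQL Server, using a table variable with assumed column lengths, and uses COALESCE as the portable equivalent of ISNULL.

```sql
-- Assumed column definitions; only the nullable middle name matters for the example.
DECLARE @Customers TABLE
(
    FirstName  VARCHAR(20) NOT NULL,
    MiddleName VARCHAR(20) NULL,
    LastName   VARCHAR(20) NOT NULL
);

INSERT INTO @Customers (FirstName, MiddleName, LastName)
VALUES ('Tony', NULL, 'Blair');

-- Returns NULL: the NULL middle name makes the whole concatenation NULL
-- (with the default CONCAT_NULL_YIELDS_NULL setting).
SELECT FirstName + ' ' + MiddleName + ' ' + LastName AS FullName
FROM @Customers;

-- Returns 'Tony Blair': a NULL middle name (plus its trailing space) becomes ''.
SELECT FirstName + ' ' + COALESCE(MiddleName + ' ', '') + LastName AS FullName
FROM @Customers;
```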
27. Use Unicode datatypes like nchar, nvarchar, ntext in SQL Server and the utf8 character set in
MySQL, if your database is going to store not just plain English characters, but a variety of
characters used all over the world. Use these datatypes only when they are absolutely
needed, as in SQL Server they need twice as much space as non-Unicode datatypes.
28. Always use a column list in your INSERT statements. This helps in avoiding problems when
the table structure changes (like adding a column). Here's an example which shows the
problem.
Now run the above INSERT statement. You get the error
This problem can be avoided by writing an INSERT statement with a column list as shown
below:
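The statements for this example are missing from this copy. A sketch of the scenario, with a hypothetical temporary table, could look like this:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Country (CountryID INT NOT NULL PRIMARY KEY, CountryName VARCHAR(25) NOT NULL);

-- Works today, but silently depends on the number and order of the columns.
INSERT INTO #Country VALUES (1, 'Ireland');

-- The table structure changes: a column is added.
ALTER TABLE #Country ADD EuroSupport BIT NULL;

-- The same kind of statement would now fail, because the number of supplied
-- values no longer matches the table definition:
-- INSERT INTO #Country VALUES (2, 'England');

-- With a column list the statement keeps working; EuroSupport is left NULL.
INSERT INTO #Country (CountryID, CountryName) VALUES (2, 'England');
```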
29. Perform all your referential integrity checks and data validations using constraints (foreign
key and check constraints). These constraints are faster than triggers. So, use triggers only for
auditing, custom tasks and validations that cannot be performed using these constraints.
These constraints save you time as well, as you don't have to write code for these validations
and the RDBMS will do all the work for you.
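A short sketch of declaring such checks as constraints rather than trigger code. The tables, columns and constraint names are hypothetical.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE Supplier
(
    ID   INT NOT NULL PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);

CREATE TABLE Service
(
    ID         INT NOT NULL PRIMARY KEY,
    SupplierID INT NOT NULL,
    Price      DECIMAL(10,2) NOT NULL,
    -- Referential integrity: every service must point at an existing supplier.
    CONSTRAINT FK_Service_Supplier FOREIGN KEY (SupplierID) REFERENCES Supplier (ID),
    -- Data validation: negative prices are rejected by the RDBMS itself.
    CONSTRAINT CK_Service_Price CHECK (Price >= 0)
);
```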
30. Always access tables in the same order in all your stored procedures/triggers consistently.
This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are: Keep
your transactions as short as possible. Touch as little data as possible during a transaction.
Never, ever wait for user input or give messages in the middle of a transaction. Do not use
higher level locking hints or restrictive isolation levels unless they are absolutely needed.
Make your front-end applications deadlock-intelligent, that is, these applications should be
able to resubmit the transaction in case the previous transaction fails with error 1205. In your
applications, process all the results returned by the database immediately, so that the locks on
the processed rows are released and no blocking occurs.
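The resubmit-on-1205 idea is described above for front-end code. The same logic can be sketched inside T-SQL, assuming SQL Server 2005 or later (TRY...CATCH); the retry count of 3 and the placeholder transaction body are assumptions made for illustration.

```sql
DECLARE @Retry INT, @Msg NVARCHAR(2048);
SET @Retry = 3;  -- assumed maximum number of attempts

WHILE @Retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... short transactional work, touching tables in a consistent order ...
        COMMIT TRANSACTION;
        SET @Retry = 0;  -- success: leave the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @Retry > 1
            SET @Retry = @Retry - 1;  -- chosen as deadlock victim: try again
        ELSE
        BEGIN
            -- Any other error, or no attempts left: report it and stop.
            SET @Msg = ERROR_MESSAGE();
            RAISERROR('%s', 16, 1, @Msg);
            SET @Retry = 0;
        END;
    END CATCH;
END;
```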
31. Offload tasks like string manipulations, concatenations, row numbering, case conversions,
type conversions etc. to the front-end applications, if these operations are going to consume
more CPU cycles on the database server (It's okay to do simple string manipulations on the
database end though). Also try to do basic validations in the front-end itself during data entry.
This saves unnecessary network roundtrips.
32. If back-end portability is your concern, stay away from bit manipulations with T-SQL, as this is
very much RDBMS specific. Further, using bitmaps to represent different states of a particular
entity conflicts with the normalization rules.
33. To make SQL Statements more readable, start each clause on a new line and indent when
needed. Following is an example:
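The original example is missing from this copy. One possible layout, shown on hypothetical tables, is:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE #Department (ID INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL);
CREATE TABLE #Employee
(
    ID           INT NOT NULL PRIMARY KEY,
    DepartmentID INT NOT NULL,
    LastName     VARCHAR(50) NOT NULL,
    Salary       DECIMAL(10,2) NOT NULL
);

-- Each clause starts on a new line; continuation lines are indented.
SELECT e.LastName,
       d.Name AS DepartmentName,
       e.Salary
FROM #Employee AS e
INNER JOIN #Department AS d
    ON d.ID = e.DepartmentID
WHERE e.Salary > 50000
  AND d.Name = 'Sales'
ORDER BY e.LastName;
```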
34. Though we survived the Y2K, always store 4 digit years in dates (especially, when using char
or int datatype columns), instead of 2 digit years to avoid any confusion and problems. This is
not a problem with datetime columns, as the century is stored even if you specify a 2 digit
year. But it's always a good practice to specify 4 digit years even with datetime datatype
columns.
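35. In your queries and other SQL statements, always represent dates in a year-first format with a
4 digit year. In SQL Server, the unseparated form yyyymmdd is interpreted correctly no matter
what the default date format or language on the database is; the separated form yyyy/mm/dd can
still be misread when the date format is set to dmy. Using the unseparated form also prevents
the following error, while working with dates: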
The conversion of a char data type to a datetime data type resulted in an out-of-range
datetime value.
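A small sketch of how date literals behave in SQL Server when the session date format is not the expected one. SET DATEFORMAT dmy is used here only to simulate such a server; the dates are arbitrary.

```sql
-- Simulate a session whose date order is day-month-year.
SET DATEFORMAT dmy;

-- Unseparated yyyymmdd is read the same way under any DATEFORMAT or language setting.
SELECT CAST('20091125' AS DATETIME) AS SafeDate;      -- 25 November 2009

-- The ISO 8601 form with the 'T' separator is also independent of these settings.
SELECT CAST('2009-11-25T00:00:00' AS DATETIME) AS IsoDate;

-- A separated year-first literal is read as year/day/month under dmy, so the
-- statement below would raise the out-of-range datetime error (there is no month 25):
-- SELECT CAST('2009/11/25' AS DATETIME);
```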
37. Always be consistent with the usage of case in your code. On a case insensitive server, your
code might work fine, but it will fail on a case sensitive database if your code is not consistent
in case. For example, if you create a table in a SQL Server instance or database that has a
case-sensitive or binary sort order, all references to the table must use the same case that was
specified in the CREATE TABLE statement. If you name the table 'MyTable' in the
CREATE TABLE statement and use 'mytable' in the SELECT statement, you get an 'object
not found' or 'invalid object name' error.
38. Do not use column numbers in the ORDER BY clause, as this impairs the readability of the
SQL statement. Further, changing the order of columns in the SELECT list has no impact on
the ORDER BY when the columns are referred to by name instead of by number. Consider the
following example, in which the second query is more readable than the first one:
FROM Orders
ORDER BY OrderDate
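Part of this example is missing from this copy. A complete sketch of the contrast, with a hypothetical Orders table and column list, is:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Orders
(
    OrderID   INT NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL
);

-- Column number: the reader has to count columns, and reordering the
-- SELECT list silently changes the sort.
SELECT OrderID, OrderDate
FROM #Orders
ORDER BY 2;

-- Column name: readable, and unaffected by the order of the SELECT list.
SELECT OrderID, OrderDate
FROM #Orders
ORDER BY OrderDate;
```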
The field lists of view1 and view2 will not necessarily stay consistent. If a new field is added
to either view, a statement that relies on an implicit field list will fail with a column
mismatch error. So always write field names.