
Best Practices: SQL

Best Practices: Structured Query Language (SQL)

Authored by: Anthony B. Smoak


Website: anthonysmoak.com
Twitter: @anthonysmoak

April 26th, 2015






Table of Contents
1. Introduction
2. Best Practices
2.1 Think in Sets Not Rows
2.2 Proper Use of Indexes for Query Performance
3. Assessment
4. Conclusion
5. Glossary
6. Bibliography


1. Introduction

Structured Query Language, better known as SQL, is regarded as the working language of
relational database management systems (RDBMS). As was the case with the relational model
and the concepts of normalization, the language developed as a result of IBM research in the
1970s. Left to their own devices, the early RDBMSs (sic) implemented a number
of languages, including SEQUEL, developed by Donald D. Chamberlin and Raymond F. Boyce in
the early 1970s while working at IBM; and QUEL, the original language of Ingres. Eventually
these efforts converged into a workable SQL, the Structured Query Language (Kriegel, 2011).

For information professionals and database practitioners, SQL is regarded as a
foundational skill that enables raw data to be manipulated within an RDBMS. SQL is a
declarative language: it instructs the database about what you want to do and leaves the
details of implementation (how to do it) to the RDBMS itself (Kriegel, 2011).
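
The declarative idea can be illustrated with a minimal sketch that uses Python's built-in
sqlite3 module as a stand-in RDBMS. The sales table and its columns are hypothetical and do
not come from the cited sources. The query states only which result is wanted (a total per
region); how the rows are read, grouped and summed is left to the database engine.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)

# The statement describes WHAT is wanted: one total per region.
# It says nothing about loops, sort order of the scan, or access paths.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(totals)  # [('East', 150.0), ('West', 250.0)]
```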

Before the advent of commercially accessible databases, data was typically stored in
proprietary file formats. Each vendor had its own specific access mechanisms, which
could not be easily configured and customized for access by alternate applications. As
databases began to adopt the relational model, the arrival and eventual standardization of SQL
by ANSI (American National Standards Institute) and ISO (International Organization for
Standardization) helped foster access, manipulation and retrieval consistency across many
products.

Although there are many best practices that can be implemented with respect to crafting
and utilizing efficient SQL, this paper will focus upon two main practices. The first best practice
is to use SQL as a set-based language and generally avoid row-by-row cursor activity
unless absolutely necessary. The second SQL best practice is to employ an effective indexing
strategy on the underlying RDBMS base tables to improve query performance and
efficiency.

2. Best Practices
2.1 Think in Sets Not Rows

SQL provides users the ability to query and manipulate data within the RDBMS without
having to rely solely on a graphical user interface. There are many powerful extensions in the
many SQL variants (e.g. T-SQL, DB2, PL/SQL) that provide
functionality above and beyond ISO and ANSI standards. While it can be of value to take
advantage of the flexibility and capabilities that these additional features provide,
practitioners should not habitually deviate from the basic principle of SQL.
SQL practitioners must first and foremost remember that SQL is a SET BASED construct.
Practitioners must think of table data as a whole and refrain from routinely manipulating
individual row elements one at a time.

Thinking in sets, or more precisely, in relational terms, is probably the most
important best practice when writing T-SQL code. Many people start coding in
T-SQL after having some background in procedural programming. Often, at
least at the early stages of coding in this new environment, you don't really
think in relational terms, but rather in procedural terms. That's because it's
easier to think of a new language as an extension to what you already know
as opposed to thinking of it as a different thing, which requires adopting the
correct mindset (Ben-Gan, 2012).

Devising set-based solutions should be the default mindset when working with SQL. Working
with a relational language based upon the relational data model demands a set-based mindset.
Iterative cursor-based processing, if used, should be used sparingly. By preferring a
cursor-based (row-at-a-time) result set, or as Jeff Moden has so aptly termed it, Row By
Agonizing Row (RBAR; pronounced ree-bar), instead of a regular set-based SQL query, you
add a large amount of overhead to SQL Server (Fritchey, 2014). Ben-Gan (2012) highlights the
deficiencies of using T-SQL within an iterative construct such as a WHILE loop. He ran a
T-SQL WHILE loop of 1 million iterations on his laptop, which took 100 seconds to complete. He
then ran a similar WHILE loop on the same laptop in C# (an object-oriented, procedural
language), which completed in 10 seconds, an order of magnitude faster. Row-based
processing in SQL carries an immense inherent overhead that does not arise when set-based
processing is applied.
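
The contrast between the two styles can be sketched as follows. This is an illustration only,
using Python's built-in sqlite3 module rather than SQL Server; the product table and the
function names are hypothetical. Both functions apply the same 10% price increase, but the
first issues one UPDATE per row while the second issues a single statement over the whole set.

```python
import sqlite3

def make_db(n_rows=1000):
    """Build an in-memory table of n_rows products, each priced at 10.0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO product (id, price) VALUES (?, ?)",
        [(i, 10.0) for i in range(1, n_rows + 1)],
    )
    return conn

def raise_prices_row_by_row(conn):
    """Procedural style: fetch every key, then run one UPDATE per row."""
    ids = [row[0] for row in conn.execute("SELECT id FROM product")]
    for product_id in ids:
        conn.execute(
            "UPDATE product SET price = price * 1.1 WHERE id = ?", (product_id,)
        )
    return len(ids)  # number of UPDATE statements issued

def raise_prices_set_based(conn):
    """Set-based style: one UPDATE statement covers the whole table."""
    cursor = conn.execute("UPDATE product SET price = price * 1.1")
    return cursor.rowcount  # number of rows changed by the single statement

def total_price(conn):
    return conn.execute("SELECT SUM(price) FROM product").fetchone()[0]

row_db, set_db = make_db(), make_db()
statements_issued = raise_prices_row_by_row(row_db)
rows_changed = raise_prices_set_based(set_db)
same_result = abs(total_price(row_db) - total_price(set_db)) < 1e-6
```

The end state of the data is the same in both cases; the difference is that the row-by-row
version hands the engine a thousand separate statements where the set-based version hands it
one.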

If all other set-based options have been exhausted and a row-by-row cursor must be
employed, then make sure to use an efficient (relatively speaking) cursor type. The fast
forward-only cursor type provides some performance advantages over other cursor
types in a SQL Server environment. Fast forward-only cursors are read-only and only move
forward within a data set (i.e. they do not support multi-direction iteration). Furthermore,
according to Microsoft TechNet (2015), fast forward-only cursors automatically close when they
reach the end of the data. The application driver does not have to send a close request to the
server, which saves a round trip across the network.

2.2 Proper Use of Indexes for Query Performance


Another important best practice for optimal SQL query performance is the proper use of
indexes on underlying base tables. A poor indexing strategy can counteract the gains of the best
hardware and server architectures. Indexing is an implementation detail outside of core
standardized ANSI or ISO SQL, and SQL variants differ in the commands they provide to create
indexes. However, the impact of indexes on query performance cannot be overstated. You can
obtain the greatest improvement in database application performance by looking first at the
area of data access, including logical/physical database design, query design, and index design
(Fritchey, 2014). Fritchey (2014) also asserts that a missing or misplaced index can be
the basis for all performance problems, starting with basic data access, continuing through joins,
and ending in filtering clauses. The proper understanding and usage of indexes should be the
aim of all SQL practitioners. Proper index usage should not be dismissed as an
esoteric dark art best left to DBAs toiling in the basement.

From an indexing best practice standpoint, it is advisable
to create indexes on fields that are frequently referenced in SQL WHERE clauses. Kriegel (2011)
asserts, "Not all indices are created equal ... If the column for which you've created an index is
not part of your search criteria, the index will be useless at best and detrimental at worst."
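
A minimal sketch of this point, using Python's built-in sqlite3 module and SQLite's EXPLAIN
QUERY PLAN statement rather than SQL Server; the orders table, its columns and the index names
are hypothetical. An index on a column that is not in the search criteria leaves the query
as a full scan, while an index on the filtered column turns it into an index search.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan as one string (exact wording varies by version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, amount REAL)"
)
query = "SELECT amount FROM orders WHERE customer_id = 42"

# No usable index yet: the engine has to scan the whole table.
plan_before = plan(conn, query)

# An index on a column that is NOT part of the search criteria does not help.
conn.execute("CREATE INDEX ix_orders_status ON orders (status)")
plan_wrong_index = plan(conn, query)

# An index on the column used in the WHERE clause is picked up by the planner.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_after = plan(conn, query)

print(plan_before)
print(plan_wrong_index)
print(plan_after)
```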

It is important that regularly used, resource-intensive queries be supported by covering
indexes. The aim of a covering index is to "cover" the query by including all of the fields that
are referenced in its WHERE or SELECT clauses. Babbar, Bjeletich, Mackman, Meier and
Vasireddy (2004) state, "The index covers the query, and can completely service the query
without going to the base data. This is in effect a materialized view of the query. The covering
index performs well because the data is in one place and in the required order." The benefit of
a properly constructed covering index is clear: the RDBMS can find all the data columns it needs
in the index without the need to refer back to the base table, which is beneficial for
performance. "Covered indexes are the closest you'll get to having multiple clustered indexes
on the same table" (Henderson, 2000).
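
The effect of covering a query can be sketched with Python's built-in sqlite3 module, which
reports when a covering index is used. SQLite stands in for SQL Server here, and the orders
table and index names are hypothetical. The narrow index finds the matching rows but still
requires a lookup in the base table for status and amount; the wider index contains every
column the query references.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan as one string (exact wording varies by version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, amount REAL)"
)
query = "SELECT status, amount FROM orders WHERE customer_id = 42"

# Narrow index: locates the rows, but status and amount must come from the table.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_narrow = plan(conn, query)

# Covering index: holds every column named in the WHERE and SELECT clauses.
# The narrow index is dropped first so the planner has a single candidate.
conn.execute("DROP INDEX ix_orders_customer")
conn.execute(
    "CREATE INDEX ix_orders_customer_cover ON orders (customer_id, status, amount)"
)
plan_covering = plan(conn, query)

print(plan_narrow)
print(plan_covering)
```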

An additional best practice with regard to SQL performance tuning is to apply non-clustered
indexes on foreign keys within frequently accessed tables. In addition, at a minimum,
all tables should have a clustered index applied so as to avoid expensive table scans by the
query optimizer.

When no clustered index is present to establish a storage order for the data,
the storage engine will simply read through the entire table to find what it
needs. A table without a clustered index is called a heap table. A heap is just
an unordered stack of data with a row identifier as a pointer to the storage
location. This data is not ordered or searchable except by walking through
the data, row by row, in a process called a scan (Fritchey, 2014).

Kriegel (2011) offers the same assessment when he advises creating one clustered index per
table, usually on the PRIMARY KEY column. He also advocates creating indexes on any
foreign key in a table.

Babbar et al. (2004) concur with the aforementioned sentiments. They advise, "Be sure
to create an index on any foreign key. Because foreign keys are used in joins, foreign keys
almost always benefit from having an index." Babbar et al. (2004) also state that every table
should have a clustered index unless there is an apparent performance-related reason not to
include one.
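
The foreign key advice can be sketched in the same way with Python's built-in sqlite3 module.
The customer and orders tables and the index name are hypothetical, and SQLite again stands in
for SQL Server. Declaring the foreign key in SQLite does not create an index on it, so the
index is added explicitly and the join's query plan is compared before and after.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan as one string (exact wording varies by version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer (customer_id),"
    " amount REAL)"
)

# A typical parent-to-child join on the foreign key column.
join_query = (
    "SELECT c.name, o.amount"
    " FROM customer AS c"
    " JOIN orders AS o ON o.customer_id = c.customer_id"
    " WHERE c.customer_id = 7"
)

plan_before = plan(conn, join_query)

# Index the foreign key so matching child rows can be located directly.
conn.execute("CREATE INDEX ix_orders_customer_id ON orders (customer_id)")
plan_after = plan(conn, join_query)

print(plan_before)
print(plan_after)
```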

3. Assessment

As mentioned previously in this paper, one SQL best practice involves employing a
covering index to aid query performance for problem queries. The covering index is a
nonclustered composite index that contains all of the fields referenced in a SQL SELECT or WHERE
clause. Korotkevitch (2014) offers a caveat to this approach and states that, even though such
queries can be optimized with covering nonclustered indexes, it is not always the ideal solution.
In some cases, it requires you to create very wide nonclustered indexes, which will use up a lot
of storage space on disk and in the buffer pool.

This paper also advocates applying one clustered index on all tables. A table without a
clustered index is referred to as a heap table. Heap tables are collections of unordered
rows that are very expensive for the RDBMS to scan for desired data. It should
be noted that the caveat to applying clustered indexes on primary keys within a transaction
table is that the index must be reordered after every INSERT or UPDATE to the key. It is best
practice to apply the clustered index on a primary key within dimension or static tables; these
tables are only used for join purposes and are optimal for this indexing strategy.

Although the indexing strategies presented in this paper are generally regarded as best
practices, it must be remembered that indexing is considered an art and not a science.
Deviations from established principles often occur in practice to meet unique scenarios and
conditions. Diverse real-world scenarios often call for different indexing strategies. In some
instances, indexing a table may not be required. If a table is small (on a per data page basis),
then a full table scan will be more efficient than processing an index and then subsequently
accessing the base table to locate the rest of the row data.

4. Conclusion

In order to maximize the efficiency of SQL, whether it is constructed for ad-hoc usage or
employed in a more formal application capacity, it is advisable to adhere to the best practices
highlighted in this paper for most scenarios. SQL practitioners should always think in terms of
set-based operations first and avoid row-by-row cursor operations until all other set-based
options have been exhausted. If a cursor must be used, construct a fast forward-only cursor.
The fast forward-only cursor is restricted to forward-only movement within a dataset and is by
definition read-only.

One of the biggest detriments to SQL query performance is an insufficient indexing
strategy. On one hand, under-indexing can cause queries to run longer than
necessary due to the costly nature of table scans against unordered heaps. On the other hand,
over-indexing will negatively impact insert and update performance, so the two tendencies
must be balanced. The best practice to aid the performance of problematic queries involves
the use of covering indexes. A covering index approach requires combining all of the fields used
in SQL SELECT and WHERE clauses (of a problematic query) into one non-clustered index. The
covering index aids performance by allowing the RDBMS to find all the data it needs without
having to make an additional read against the base table. Furthermore, non-clustered indexes
should be applied on all foreign keys that are present in frequently accessed tables.

SQL practitioners and DBAs must collaborate to understand query performance needs
as a whole. DBAs left to their own devices may create indexes without any knowledge of the
queries that will utilize those indexes. This "lone wolf" approach has the potential to render
indexes inefficient on arrival. Conversely, it is equally important that SQL practitioners
understand indexes as well. Placing SELECT * in every SQL query will negate the effectiveness
of covering indexes and add processing overhead compared with specifically listing the subset
of fields desired. Code developed without an understanding of the indexes in play often
yields sub-standard performance.
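
The interaction between SELECT * and a covering index can be sketched with Python's built-in
sqlite3 module; SQLite stands in for SQL Server, and the orders table and index name are
hypothetical. The index covers a query that lists only customer_id and status, but SELECT *
also asks for amount and notes, which forces a read of the base table for every matching row.

```python
import sqlite3

def plan(conn, sql):
    """Return SQLite's query plan as one string (exact wording varies by version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " status TEXT, amount REAL, notes TEXT)"
)
conn.execute("CREATE INDEX ix_orders_customer_status ON orders (customer_id, status)")

# Listing only the needed columns lets the index answer the query by itself.
plan_listed = plan(
    conn, "SELECT customer_id, status FROM orders WHERE customer_id = 42"
)

# SELECT * needs columns the index does not hold, so the table is read as well.
plan_star = plan(conn, "SELECT * FROM orders WHERE customer_id = 42")

print(plan_listed)
print(plan_star)
```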



5. Glossary
Base Table: A table in the relational data model containing the inserted raw data (Hoffer,
Ramesh, & Topi, 2013).
C#: C# (pronounced "C-Sharp") is a platform neutral object oriented programming language
developed by Microsoft.
Clustered Index: An index where the data in a table is physically ordered according to the index,
which results in faster performance. Physical data blocks are clustered together just as index
entries pointing to these blocks are. This significantly speeds retrieving the records as there is
no need to spin the disk to get to the needed data (Kriegel, 2011).
Cursor: A mechanism to work with one row at a time out of a multirow result set (Fritchey,
2014).
Foreign Key: An attribute in a relationship that serves as the primary key of another relation in
the same database (Hoffer et al., 2013).
Heap Table: A table without a clustered index is called a heap table. The data rows of a heap
table are not stored in any particular order or linked to the adjacent pages in the table. This
unorganized structure of the heap table usually increases the overhead of accessing a large
heap table when compared to accessing a large nonheap table (a table with a clustered index)
(Fritchey, 2014).
Non Clustered Index: In a nonclustered index, columns are selected and sorted based on their
values. These columns contain a reference to the clustered index or heap location of the data
they are related to. This is nearly identical to how a card catalog works in a library. The order of
the books, or the records in the tables, doesn't change, but a shortcut to the data is created
based on the other search values (Strate & Krueger, 2012).
Procedural Programming Language: Procedural language is a type of computer programming
language that specifies a series of well-structured steps and procedures within its programming
context to compose a program. It contains a systematic order of statements, functions and
commands to complete a computational task or program (Techopedia).
RDBMS: A database management system that manages data as a collection of tables in which
all data relationships are represented by common values in related tables (Hoffer et al., 2013).
SELECT *: A SQL statement that returns all of the columns in a given dataset (e.g. table, view).
WHILE Statement: The while statement is one of the control flow statements in C# that
enables the execution of a sequence of logic multiple times in a loop until a specific condition is
false (Techopedia).



6. Bibliography
Babbar, A., Bjeletich, S., Mackman, A., Meier, J., & Vasireddy, S. (May, 2004). Improving .NET
Application Performance and Scalability. Retrieved from
https://msdn.microsoft.com/en-us/library/ff647793.aspx

Ben-Gan, I. (Apr, 2012). T-SQL Foundations: Thinking in Sets. Why this line of thought is
important when addressing querying tasks. Retrieved from
http://sqlmag.com/t-sql/tsql-foundations-thinking-sets

Fritchey, G. (2014). SQL Server Query Performance Tuning (4th ed.). [Books24x7 version]
Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=72593.

Henderson, K. (2000). The Guru's Guide to Transact-SQL. Addison-Wesley Professional.

Hoffer, J., Ramesh, V., & Topi, H. (2013). Modern Database Management (11th ed.). Pearson.

Korotkevitch, D. (Jun, 2014). Pro SQL Server Internals. Apress.

Kriegel, A. (2011). Discovering SQL: A Hands-On Guide for Beginners. [Books24x7 version]
Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=41611.

Microsoft TechNet. Fast Forward-Only Cursors (ODBC). Retrieved April 23, 2015, from
https://technet.microsoft.com/en-us/library/aa177106(v=sql.80).aspx

Procedural Language. In Techopedia. Retrieved April 23, 2015, from
http://www.techopedia.com/definition/8982/procedural-language

Strate, J., & Krueger, T. (2012). Expert Performance Indexing for SQL Server 2012.
[Books24x7 version] Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=54065.

While Statement. In Techopedia. Retrieved April 23, 2015, from
http://www.techopedia.com/definition/25648/while-statement-c
