Practices: SQL
Table of Contents
1. Introduction
2. Best Practices
2.1 Think in Sets Not Rows
2.2 Proper Use of Indexes for Query Performance
3. Assessment
4. Conclusion
5. Glossary
6. Bibliography
1. Introduction
Structured Query Language, better known as SQL, is regarded as the working language of
relational database management systems (RDBMS). As was the case with the relational model
and the concepts of normalization, the language developed as a result of IBM research in the
1970s. Left to their own devices, the early RDBMSs implemented a number
of languages, including SEQUEL, developed by Donald D. Chamberlin and Raymond F. Boyce in
the early 1970s while working at IBM, and QUEL, the original language of Ingres. Eventually
these efforts converged into a workable SQL, the Structured Query Language (Kriegel, 2011).
SQL is a foundational skill that enables raw data to be manipulated within an RDBMS. It is a
declarative language: it tells the database what you want done and leaves the
details of implementation (how to do it) to the RDBMS itself (Kriegel, 2011).
Before the advent of commercially accessible databases, data was typically stored in
proprietary file formats. Each vendor had its own specific access mechanisms, which
could not be easily configured or customized for access by other applications. As
databases began to adopt the relational model, the arrival and eventual standardization of SQL
by ANSI (American National Standards Institute) and ISO (International Organization for
Standardization) helped foster consistent access, manipulation, and retrieval across many products.
Although many best practices apply to crafting and using efficient SQL, this paper focuses
on two. The first is to treat SQL as a set-based language and to avoid row-by-row cursor
activity unless absolutely necessary. The second is to employ an effective indexing
strategy on the underlying RDBMS base tables to improve query performance and
efficiency.
2. Best Practices
2.1 Think in Sets Not Rows
SQL provides users the ability to query and manipulate data within the RDBMS without
having to solely rely on a graphical user interface. There are many powerful extensions in the
Devising set-based solutions should be the default mindset when working with SQL. Working
with a relational language built upon the relational data model demands it.
Iterative, cursor-based processing, if used, should be used sparingly. By preferring a
cursor-based (row-at-a-time) result set, or as Jeff Moden has so aptly termed it, Row By
Agonizing Row (RBAR; pronounced "ree-bar"), instead of a regular set-based SQL query, you
add a large amount of overhead to SQL Server (Fritchey, 2014). Ben-Gan (2012) highlights the
deficiencies of using T-SQL within an iterative construct such as a WHILE loop. He ran a
T-SQL WHILE loop of 1 million iterations on his laptop, which took 100 seconds to complete. He
then ran a similar WHILE loop on the same laptop in C# (an object-oriented, procedural
language), which completed in 10 seconds, an order of magnitude faster. Row-based
processing in SQL carries an immense inherent overhead that does not arise when set-based
processing is applied.
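The contrast between the two styles can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses SQLite through Python's standard sqlite3 module rather than SQL Server, and the `products` table, its columns, and the function names are hypothetical. It applies the same 10% price increase twice, once with one UPDATE per row and once with a single set-based UPDATE, and counts the statements each approach issues.

```python
import sqlite3


def build_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `products` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, price_cents INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO products (id, price_cents) VALUES (?, ?)",
        [(i, 10000) for i in range(1, n_rows + 1)],
    )
    conn.commit()
    return conn


def raise_prices_row_by_row(conn: sqlite3.Connection) -> int:
    """Row-by-row style: read every row, then issue one UPDATE per row."""
    statements = 0
    rows = conn.execute("SELECT id, price_cents FROM products").fetchall()
    for product_id, price_cents in rows:
        conn.execute(
            "UPDATE products SET price_cents = ? WHERE id = ?",
            (price_cents * 110 // 100, product_id),
        )
        statements += 1
    conn.commit()
    return statements


def raise_prices_set_based(conn: sqlite3.Connection) -> int:
    """Set-based style: one declarative UPDATE covers every row."""
    conn.execute("UPDATE products SET price_cents = price_cents * 110 / 100")
    conn.commit()
    return 1


if __name__ == "__main__":
    print("row-by-row statements:", raise_prices_row_by_row(build_db()))
    print("set-based statements:", raise_prices_set_based(build_db()))
```

Both functions leave the table in the same state; the difference is that the set-based version hands the whole job to the database engine in one statement, which is the mindset this section recommends. Actual timings depend on the engine and hardware, so none are asserted here.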
If all set-based options have been exhausted and a row-by-row cursor must be
employed, then make sure to use a (relatively) efficient cursor type. The fast-forward-only
cursor type provides some performance advantages over other cursor
types in a SQL Server environment. Fast-forward cursors are read-only and move only
forward through the result set.
2.2 Proper Use of Indexes for Query Performance
An additional best practice with regard to SQL performance tuning is to apply nonclustered
indexes on foreign keys within frequently accessed tables. In addition, at a minimum,
all tables should have a clustered index so that queries can avoid expensive table
scans.
When no clustered index is present to establish a storage order for the data,
the storage engine will simply read through the entire table to find what it
needs. A table without a clustered index is called a heap table. A heap is just
an unordered stack of data with a row identifier as a pointer to the storage
location. This data is not ordered or searchable except by walking through
the data, row by row, in a process called a scan (Fritchey, 2014).
Kriegel (2011) offers the same assessment: he advises creating one clustered index per
table, usually on the PRIMARY KEY column. He also advocates creating indexes on any
foreign key in a table.
Babbar et al. (2004) concur with the aforementioned sentiments. They advise, "Be sure
to create an index on any foreign key. Because foreign keys are used in joins, foreign keys
almost always benefit from having an index." Babbar et al. (2004) also state that every table
should have a clustered index unless there is an apparent performance-related reason not to
include one.
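The effect of indexing a foreign key can be observed by asking the database for its query plan. The following is a minimal sketch under stated assumptions: it uses SQLite via Python's sqlite3 module, not SQL Server, so plan wording and storage details (including how clustered indexes work) differ; the `customers` and `orders` tables and the index name `ix_orders_customer_id` are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the plan descriptions reported by SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan text


def build_db() -> sqlite3.Connection:
    """Create two hypothetical tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    return conn


# A lookup that filters on the foreign key column.
LOOKUP = "SELECT id, total_cents FROM orders WHERE customer_id = ?"

conn = build_db()
plan_before = query_plan(conn, LOOKUP, (42,))  # no index on customer_id yet

# Index the foreign key, as the cited authors recommend.
conn.execute("CREATE INDEX ix_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn, LOOKUP, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, SQLite reports a scan of the whole `orders` table; afterwards it reports a search that uses `ix_orders_customer_id`. The exact plan text varies between SQLite versions.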
3. Assessment
As mentioned previously in this paper, one SQL best practice involves employing a
covering index to aid query performance for problem queries. The covering index is a nonclustered composite index that contains all of the fields referenced in a SQL SELECT or WHERE
clause. Korotkevitch (2014) offers a caveat to this approach and states that, even though such
queries can be optimized with covering nonclustered indexes, it is not always the ideal solution.
In some cases, it requires you to create very wide nonclustered indexes, which will use up a lot
of storage space on disk and in the buffer pool.
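A covering index can be illustrated with the same kind of plan inspection. This sketch again assumes SQLite through Python's sqlite3 module rather than SQL Server, and the `orders` table, the query, and the index name `ix_orders_lookup` are hypothetical. It compares a narrow index on the filter column alone with a wider index that also contains every selected column.

```python
import sqlite3

# Hypothetical problem query: filters on customer_id, returns two other columns.
QUERY = "SELECT order_date, total_cents FROM orders WHERE customer_id = ?"


def plan_with_index(index_columns: str) -> str:
    """Build a fresh table, index the given columns, and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL, "
        "order_date TEXT NOT NULL, "
        "total_cents INTEGER NOT NULL)"
    )
    conn.execute(f"CREATE INDEX ix_orders_lookup ON orders ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    conn.close()
    return rows[0][-1]  # the last column holds the plan text


# Narrow index: locates matching rows, but the table must still be read
# to fetch order_date and total_cents.
narrow_plan = plan_with_index("customer_id")

# Covering index: every column the query touches is inside the index.
covering_plan = plan_with_index("customer_id, order_date, total_cents")

print("narrow:  ", narrow_plan)
print("covering:", covering_plan)
```

With the wider index SQLite labels the plan as using a covering index, meaning the base table is not read at all. The trade-off noted above still applies: the wider the index, the more storage and memory it consumes.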
This paper also advocates applying one clustered index to every table. A table without a
clustered index is referred to as a heap table. Heap tables are a collection of unordered
data rows.
4. Conclusion
In order to maximize the efficiency of SQL, whether it is constructed for ad-hoc usage or
employed in a more formal application capacity, it is advisable to adhere to the best practices
highlighted in this paper for most scenarios. SQL practitioners should always think in terms of
set-based operations first and avoid row-by-row cursor operations until all set-based
options have been exhausted. If a cursor must be used, construct a fast-forward-only cursor,
which is restricted to forward-only movement within a dataset and is by
definition read-only.
SQL practitioners should also employ an effective indexing strategy. On one hand,
under-indexing can cause queries to run longer than
necessary due to the costly nature of table scans against unordered heaps. This risk must
be weighed against the tendency to over-index, which negatively impacts insert and
update performance. The best practice to aid the performance of problematic queries involves
the use of covering indexes. A covering index approach requires combining all of the fields used
in the SQL SELECT and WHERE clauses (of a problematic query) into one nonclustered index. The
covering index aids performance by allowing the RDBMS to find all the data it needs without
having to read the base table.
5. Glossary
Base Table: A table in the relational data model containing the inserted raw data. Hoffer, J.,
Ramesh, V., & Topi, H. (2013)
C#: C# (pronounced C-Sharp) is a platform neutral object oriented programming language
developed by Microsoft.
Clustered Index: An index where the data in a table is physically ordered according to the index,
which results in faster performance. Physical data blocks are clustered together just as index
entries pointing to these blocks are. This significantly speeds retrieving the records as there is
no need to spin the disk to get to the needed data. Kriegel, A. (2011)
Cursor: A mechanism to work with one row at a time out of a multirow result set. Fritchey, G.
(2014)
Foreign Key: An attribute in a relationship that serves as the primary key of another relation in
the same database. Hoffer et al. (2013)
Heap Table: A table without a clustered index is called a heap table. The data rows of a heap
table are not stored in any particular order or linked to the adjacent pages in the table. This
unorganized structure of the heap table usually increases the overhead of accessing a large
heap table when compared to accessing a large nonheap table (a table with a clustered index).
Fritchey, G. (2014)
Non Clustered Index: In a nonclustered index, columns are selected and sorted based on their
values. These columns contain a reference to the clustered index or heap location of the data
they are related to. This is nearly identical to how a card catalog works in a library. The order of
the books, or the records in the tables, doesn't change, but a shortcut to the data is created
based on the other search values. Strate, J. & Krueger, T. (2012)
Procedural Programming Language: Procedural language is a type of computer programming
language that specifies a series of well-structured steps and procedures within its programming
6. Bibliography
Babbar, A., Bjeletich, S., Mackman, A., Meier, J., & Vasireddy, S. (May, 2004). Improving .NET
Application Performance and Scalability. Retrieved from
https://msdn.microsoft.com/en-us/library/ff647793.aspx
Ben-Gan, I. (Apr, 2012). T-SQL Foundations: Thinking in Sets. Why this line of thought is
important when addressing querying tasks. Retrieved from http://sqlmag.com/t-sql/tsql-foundations-thinking-sets
Fritchey, G. (2014). SQL Server Query Performance Tuning (4th ed.). [Books24x7 version]
Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=72593.
Henderson, K. (2000). The Guru's Guide to Transact-SQL. Addison-Wesley Professional.
Hoffer, J., Ramesh, V., & Topi, H. (2013). Modern Database Management (11th ed.). Pearson.
Korotkevitch, D. (Jun, 2014). Pro SQL Server Internals. Apress.
Kriegel, A. (2011). Discovering SQL: A Hands-On Guide for Beginners. [Books24x7 version]
Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=41611.
Microsoft TechNet. Fast Forward-Only Cursors (ODBC). Retrieved April 23, 2015, from
https://technet.microsoft.com/en-us/library/aa177106(v=sql.80).aspx
Procedural Language. In Techopedia. Retrieved April 23, 2015, from
http://www.techopedia.com/definition/8982/procedural-language
Strate, J., & Krueger, T. (2012). Expert Performance Indexing for SQL Server 2012.
[Books24x7 version] Available from
http://common.books24x7.com.libezproxy2.syr.edu/toc.aspx?bookid=54065.
While Statement. In Techopedia. Retrieved April 23, 2015, from
http://www.techopedia.com/definition/25648/while-statement-c