
Table of Contents

Synopsis...........................................................................................................................3

Project Development Life Cycle (PDLC)..............................................................5

Objective & Scope.........................................................................................................6


Objective of SQL Tuner.............................................................................................6
Scope of SQL Tuner....................................................................................................6

Theoretical Background..............................................................................................9
What is performance tuning?....................................................................................9
Optimizing Database Performance........................................................................11
Designing Federated Database Servers...............................................................12
Database Design......................................................................................................13
Query Tuning..........................................................................................................15
Advanced Query Tuning Concepts.....................................................................18
Application Design.................................................................................................19
Optimizing Utility and Tool Performance..........................................................22
Optimizing Server Performance...........................................................................22
Indexes.........................................................................................................................23
Purpose and Structure............................................................................................23
Index Types..............................................................................................................24
Index Characteristics..............................................................................................25
SQL Server Performance Killers.............................................................................28
Poor Indexing..........................................................................................................28
Inaccurate Statistics................................................................................................29
Excessive Blocking and Deadlocks.......................................................................29
Poor Query Design.................................................................................................29
Poor Database Design............................................................................................30
Excessive Fragmentation.......................................................................................30
Non-reusable Execution Plans..............................................................................30
Frequent Recompilation of Execution Plans.......................................................31
Improper Use of Cursors.......................................................................................31
Improper Configuration of the Database Log....................................................31
Ineffective Connection Pooling.............................................................................31

Problem Definition....................................................................................................32

System Analysis & Design.......................................................................................33


Query Execution Process..........................................................................................33
Performance Tuning Process...................................................................................35

-1-
SQL tuner
Query Optimizer Architecture................................................................................39
Advanced Types of Optimization........................................................................42
Displaying Execution Plans.....................................................................................44
Execution Plan Basics.............................................................................................44
Graphics-Based Execution Plans..........................................................................44
Query Analyzer Graphical Estimated Execution Plans....................................46
Text-Based Execution Plans...................................................................................48
Estimated rows column in an execution plan......................................................54
Bookmark Lookups...................................................................................................58
SQL Server - Indexes and Performance.................................................................63
What happens over time?......................................................................................63
Defragmenting Indexes..........................................................................................67
How to Select Indexes for Your SQL Server Tables...........................................68
Analyzing a Database Execution Plan...................................................................77

System Planning..........................................................................................................85

Methodology.................................................................................................................86

System Implementation............................................................................................97
Prerequisites for system implementation.............................................................97
.NET Framework 2.0 Installation..........................................................................97
SQL Server 2000 Installation ................................................................................99
SQL Tuner Installation............................................................................104

Technical Specification...........................................................................................106
Hardware Requirements........................................................................................106
Software Requirements..........................................................................................106

User Manual................................................................................................................107

Future Enhancements..............................................................................................113
Optimizing more complex queries.......................................................................113
Optimizing Database Structure............................................................................113
Optimizing Queries Embedded in the Applications........................................114

Bibliography...............................................................................................................115
Websites.....................................................................................................................115
Books..........................................................................................................................115
Components Used....................................................................................................115

SYNOPSIS
Project Name: SQL TUNER.

Project Members:

This project was done by a group of two. The project members are:

1. Joydeep Dass
2. Sapana Rodrigues

Problem Statement:

In today's industry, whenever a programmer or developer writes a new query, it is normally submitted to the company's DBA for tuning. Even then, the DBA may not be able to tune the query fully, and when a query is tuned it consumes a great deal of the DBA's time and resources. During crunch time it is simply not possible to tune each and every query.

Because there are no fixed rules for tuning, DBAs have to rely on experience, and sometimes even an experienced DBA cannot tune a given query.

Why SQL Tuner?

This topic was chosen to reduce the DBA's query-tuning workload. The tool can also be used by developers to tune their own queries instead of taking them to the DBA.

Project Scope:

This project was developed for tuning SQL queries. Tuning is achieved by reducing the total CPU time and the I/O consumed by a query.

Tuning is done in two ways:

 Syntax Tuning :

Checking the logical and physical operators used by the query.

 Index Tuning :

Checking the indexes used in the query (if any) and which indexes can be
applied to the columns used in the query.

Methodology:

The user types queries into the interface provided by the software and can choose either to tune or to execute a query. If tuning is selected, the software gives suggestions for improving the performance of the query; if executing is selected, the software simply executes it. Beyond this, the tool provides almost all the facilities offered by the Query Analyzer of MS SQL Server.

Software Requirements:

 .NET Framework 2.0


 SQL Server 2000

Hardware Requirements:

Processor: Preferably 1.0 GHz or Greater.

RAM: 128 MB or Greater.

Limitations of the Software:

This project was made to understand how SQL Server parses and tunes queries internally, so it is currently able to tune only simple queries.

Future Enhancements:

 To tune more complex and bigger queries.


 To study the database structure and provide the user with suggestions to
improve the database structure for best performance.

Project Development Life Cycle (PDLC)

Objective & Scope
Objective of SQL Tuner

Performance tuning is an important part of today's database applications. Very often, large savings in both time and money can be achieved with proper performance tuning. The beauty of performance tuning is that, in many cases, a small change to an index or a SQL query can result in a far more efficient application.

Query optimization is an important skill for SQL developers and database administrators (DBAs). In order to improve the performance of SQL queries, developers and DBAs need to understand the query optimizer and the techniques it uses to select an access path and prepare a query execution plan. Query tuning involves knowledge of techniques such as cost-based and heuristic-based optimizers, plus the tools an SQL platform provides for explaining a query execution plan.

The main objective of SQL Tuner is to analyze the query provided by the user and suggest ways to optimize it for performance.

SQL Tuner is a tool principally made for DBAs, to minimize their workload; developers can also use it to prepare optimized queries.

Scope of SQL Tuner

The main goal of this project is to tune the SQL queries provided by the user. Tuning means making the queries entered by the user more efficient; this can be done by reducing the total CPU time taken by a query, and by reducing the input/output the computer performs to compute it.

Both Total CPU Time and Input/Output for the query can be reduced by tuning
the user queries in two ways:

 Syntax Tuning:

Syntax tuning can be done by checking the logical operators, arithmetic operators, and relational operators of the queries provided by the user.

Logical Operators:
These are the operators used to combine two or more conditions in the WHERE clause of the user's query. The logical operators are AND, OR, and NOT.

Of these three operators, NOT hampers query performance far more than the other two, so it is preferable to avoid the NOT operator; this can often be done by rewriting the WHERE clause of the query.

Example: Suppose we want to find all the students in a college who are not boys.

Solution: This query can be written in two ways:

SELECT * FROM students WHERE NOT gender = 'BOYS'

OR

SELECT * FROM students WHERE gender = 'GIRLS'

The first solution searches for records which are not boys and the second searches for records which are girls.
To the user both solutions mean one and the same thing, but to the SQL optimizer they behave differently. For the first solution it must first find all the records which are boys, store them in a temporary table, and then return the records which do not match the records in that temporary table.
For the second solution the SQL optimizer only has to search for records which are girls, so it neither creates a temporary table nor compares records against one. This reduces the query execution time drastically, and the input/output is also reduced because the engine does not have to go back again and again to fetch records.
In some cases, however, it is not possible to write a query without a NOT, in which case nothing can be done syntactically to optimize the user query.

Arithmetic Operators:
These are the operators by which user can obtain some calculated
values. The basic arithmetic operators are Addition (+), Subtraction (-),
Division (/), Multiplication (*) and Exponentiation (^).
Care should be taken by the user while writing the query such that none of
these operators are present on the left hand side of the relational operators
present in the where clause.

Example: Find all the salespersons whose sales fall short of their quota by 2000 (assuming the Sales table has sale and quota columns).

Solution: This query can be written in many ways, two of which are as follows:

SELECT salesperson FROM Sales WHERE sale + 2000 = quota

OR

SELECT salesperson FROM Sales WHERE sale = quota - 2000

Here also both queries appear the same to the user, but for the SQL optimizer the second query works faster than the first, because it does not have to compute sale + 2000 for every salesperson before making the comparison.
So for arithmetic operators, avoid placing the operator on the left-hand side of the relational operator, where the bare column should stand.

Relational Operators:
These operators are use to give a relation between the column
name and its expression to be found in the where clause. The operators are
listed in decreasing order of their performance.
Highest performance is given by Equal (=) then comes Greater than (>)
and Less than (<), then comes Greater than or equal to (>=) and Less than
or equal to (<=), then comes Like operator and then the least performance
is given by Not equal to (<>).
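As a sketch of these guidelines (the Orders table, its columns, and the status values are hypothetical, used only for illustration):

```sql
-- Slower: <> is the worst-performing relational operator.
SELECT * FROM Orders WHERE status <> 'OPEN'

-- Faster when the remaining values are known: rewrite with = and OR.
SELECT * FROM Orders WHERE status = 'CLOSED' OR status = 'CANCELLED'

-- Slower: a leading wildcard forces LIKE to examine every value.
SELECT * FROM Orders WHERE customer_name LIKE '%son'

-- Faster: with a trailing wildcard, LIKE can still use an index
-- on customer_name to narrow the search.
SELECT * FROM Orders WHERE customer_name LIKE 'John%'
```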

 Index Tuning:

Indexing is one of the most crucial elements in increasing the performance of SQL Server. A well-written query will not show its effectiveness unless powered by an appropriate index or indexes on the table(s) used in the query, especially if the tables are large.

Indexes exist to make data selection faster, so the focus of this section is on ways to select the best indexes for faster data retrieval. This is done in a two-step process.

 Step One: Gathering Information


 Step Two: Taking Actions on the Information Gathered

Indexing can be quite a challenging task if you are not familiar with your databases, the relationships between tables, and how queries are written in stored procedures and embedded SQL. How many of the companies you have worked for have had a proper ERD diagram of their databases and followed a textbook style of programming? In the real world, time is often limited, resulting in poor SQL Server database performance.

If you have been tasked with optimizing a database's performance (at least to a respectable level), or you want to be proactive with your databases to prevent potential future performance issues, following these steps should help you in tuning tables, just as they have helped me. These steps are applicable at any stage of a project, even if a deadline is just around the corner.
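As a minimal sketch of the two steps (the Orders and Customers tables and their columns are hypothetical), step one gathers a query from the workload and step two creates indexes on the columns it filters and joins on:

```sql
-- Step one: a query gathered from stored procedures or embedded SQL.
--   SELECT o.order_id, o.total
--   FROM Orders o JOIN Customers c ON o.customer_id = c.customer_id
--   WHERE o.order_date >= '2005-01-01'

-- Step two: act on the information by indexing the join column
-- and the filtered column.
CREATE INDEX IX_Orders_CustomerId ON Orders (customer_id)
CREATE INDEX IX_Orders_OrderDate ON Orders (order_date)
```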

Theoretical Background
What is performance tuning?

What is the goal of tuning a SQL Server database? The goal is to improve performance
until acceptable levels are reached. Acceptable levels can be defined in a number of
ways. For a large online transaction processing (OLTP) application the performance goal
might be to provide sub-second response time for critical transactions and to provide a
response time of less than two seconds for 95 percent of the other main transactions. For
some systems, typically batch systems, acceptable performance might be measured in
throughput. For example, a settlement system may define acceptable performance in
terms of the number of trades settled per hour. For an overnight batch suite acceptable
performance might be that it must finish before the business day starts.

Whatever the system, designing for performance should start early in the design process
and continue after the application has gone live. Performance tuning is not a one-off
process but an iterative process during which response time is measured, tuning
performed, and response time measured again.

There is no right way to design a database; there are a number of possible approaches
and all these may be perfectly valid. It is sometimes said that performance tuning is an
art, not a science. This may be true, but it is important to undertake performance tuning
experiments with the same kind of rigorous, controlled conditions under which
scientific experiments are performed. Measurements should be taken before and after
any modification, and these should be made one at a time so it can be established which
modification, if any, resulted in an improvement or degradation.

What areas should the database designer concentrate on? The simple answer to this
question is that the database designer should concentrate on those areas that will return
the most benefit. In my experience, for most database designs I have worked with, large
gains are typically made in the area of query and index design. Inappropriate indexes
and badly written queries, as well as some other contributing factors, can negatively
influence the query optimizer such that it chooses an inefficient strategy.

To give you some idea of the gains to be made in this area I once was asked to look at a
query that joined a number of large tables together. The query was abandoned after it
had not completed within 12 hours. The addition of an index in conjunction with a
modification to the query meant the query now completed in less than eight minutes!
This magnitude of gain cannot be achieved just by purchasing more hardware or by
twiddling with some arcane SQL Server configuration option. A database designer or
administrator's time is always limited, so make the best use of it! The other main area
where gains can be dramatic is lock contention. Removing lock bottlenecks in a system
with a large number of users can have a huge impact on response times.

Now, some words of caution when chasing performance problems. If users phone up to
tell you that they are getting poor response times, do not immediately jump to
conclusions about what is causing the problem. Circle at a high altitude first. Having
made sure that you are about to monitor the correct server, use the System Monitor to
look at the CPU, disk subsystem, and memory use. Are there any obvious bottlenecks? If
there are, then look for the culprit. Everyone blames the database, but it could just as
easily be someone running his or her favorite game! If there are no obvious bottlenecks,
and the CPU, disk, and memory counters in the System Monitor are lower than usual,
then that might tell you something. Perhaps the network is sluggish or there is lock
contention. Also be aware of the fact that some bottlenecks hide others. A memory
bottleneck often manifests itself as a disk bottleneck.

Optimizing Database Performance

The goal of performance tuning is to minimize the response time for each query and to
maximize the throughput of the entire database server by reducing network traffic, disk
I/O, and CPU time. This goal is achieved through understanding application
requirements, the logical and physical structure of the data, and tradeoffs between
conflicting uses of the database, such as online transaction processing (OLTP) versus
decision support.

Performance issues should be considered throughout the development cycle, not at the
end when the system is implemented. Many significant performance
improvements are achieved by careful design from the outset.
optimize the performance of Microsoft® SQL Server™ 2000, you must identify the areas
that will yield the largest performance increases over the widest variety of situations and
focus analysis on those areas.

Although other system-level performance issues, such as memory, hardware, and so on,
are certainly candidates for study, experience shows that the performance gain from these
areas is often incremental. Generally, SQL Server automatically manages available
hardware resources, reducing the need (and thus, the benefit) for extensive system-level
manual tuning.

Topic: Designing Federated Database Servers
Describes how to achieve high levels of performance, such as those required by large Web sites, by balancing the processing load across multiple servers.

Topic: Database Design
Describes how database design is the most effective way to improve overall performance. Database design includes the logical database schema (such as tables and constraints) and the physical attributes such as disk systems, object placement, and indexes.

Topic: Query Tuning
Describes how the correct design of the queries used by an application can significantly improve performance.

Topic: Application Design
Describes how the correct design of the user application can significantly improve performance. Application design includes transaction boundaries, locking, and the use of batches.

Topic: Optimizing Utility and Tool Performance
Describes how some of the options available with the utilities and tools supplied with Microsoft SQL Server 2000 can highlight ways in which the performance of these tools can be improved, as well as the effect of running these tools and your application at the same time.

Topic: Optimizing Server Performance
Describes how settings in the operating system (Microsoft Windows NT®, Microsoft Windows® 95, Microsoft Windows 98, or Microsoft Windows 2000) and SQL Server can be changed to improve overall performance.

Designing Federated Database Servers

To achieve the high levels of performance required by the largest Web sites, a multitier
system typically balances the processing load for each tier across multiple servers.
Microsoft® SQL Server™ 2000 shares the database processing load across a group of
servers by horizontally partitioning the SQL Server data. These servers are managed
independently, but cooperate to process the database requests from the applications;
such a cooperative group of servers is called a federation.

A federated database tier can achieve extremely high levels of performance only if the
application sends each SQL statement to the member server that has most of the data
required by the statement. This is called collocating the SQL statement with the data
required by the statement. Collocating SQL statements with the required data is not a
requirement unique to federated servers. It is also required in clustered systems.

Although a federation of servers presents the same image to the applications as a single
database server, there are internal differences in how the database services tier is
implemented.

Single server tier: There is one instance of SQL Server on the production server.
Federated server tier: There is one instance of SQL Server on each member server.

Single server tier: The production data is stored in one database.
Federated server tier: Each member server has a member database. The data is spread through the member databases.

Single server tier: Each table is typically a single entity.
Federated server tier: The tables from the original database are horizontally partitioned into member tables. There is one member table per member database, and distributed partitioned views are used to make it appear as if there were a full copy of the original table on each member server.

Single server tier: All connections are made to the single server, and all SQL statements are processed by the same instance of SQL Server.
Federated server tier: The application layer must be able to collocate SQL statements on the member server containing most of the data referenced by the statement.

While the goal is to design a federation of database servers to handle a complete workload, you do this by designing a set of distributed partitioned views that spread the data across the different servers.
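A distributed partitioned view can be sketched as follows. The Customers table, the member server names, and the partitioning ranges are all hypothetical; each member server holds one horizontal partition and exposes the same view.

```sql
-- On Server1: the local member table holds one range of the data,
-- declared by a CHECK constraint on the partitioning column.
CREATE TABLE Customers_33
  (customer_id INT PRIMARY KEY
     CHECK (customer_id BETWEEN 1 AND 32999),
   customer_name VARCHAR(50) NOT NULL)

-- On every member server: the same view unions the local partition with
-- the member tables on the other servers, reached via linked servers.
CREATE VIEW Customers AS
  SELECT * FROM Customers_33
  UNION ALL
  SELECT * FROM Server2.CustomerDB.dbo.Customers_66
  UNION ALL
  SELECT * FROM Server3.CustomerDB.dbo.Customers_99
```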

Database Design

There are two components to designing a database: logical and physical. Logical
database design involves modeling your business requirements and data using database
components, such as tables and constraints, without regard for how or where the data
will be physically stored. Physical database design involves mapping the logical design onto physical media, taking advantage of the available hardware and software features, such as disk systems and indexing, so that the data can be physically accessed and maintained as quickly as possible.
It is important to correctly design the database to model your business requirements,
and to take advantage of hardware and software features early in the development cycle
of a database application, because it is difficult to make changes to these components
later.
Logical Database Design
Using Microsoft® SQL Server™ 2000 effectively begins with normalized database
design. Normalization is the process of removing redundancies from the data. For
example, when you convert from an indexed sequential access method (ISAM) style
application, normalization often involves breaking data in a single file into two or more
logical tables in a relational database. Transact-SQL queries then recombine the table
data by using relational join operations. By avoiding the need to update the same data in
multiple places, normalization improves the efficiency of an application and reduces the
opportunities for introducing errors due to inconsistent data.
However, there are tradeoffs to normalization. A database that is used primarily for
decision support (as opposed to update-intensive transaction processing) may not have
redundant updates and may be more understandable and efficient for queries if the
design is not fully normalized. Nevertheless, data that is not normalized is a more
common design problem in database applications than over-normalized data. Starting
with a normalized design, and then selectively denormalizing tables for specific reasons,
is a good strategy.
Whatever the database design, you should take advantage of these features in SQL
Server to automatically maintain the integrity of your data:

 CHECK constraints ensure that column values are valid.

 DEFAULT and NOT NULL constraints avoid the complexities (and


opportunities for hidden application bugs) caused by missing column values.
 PRIMARY KEY and UNIQUE constraints enforce the uniqueness of rows (and
implicitly create an index to do so).
 FOREIGN KEY constraints ensure that rows in dependent tables always have a
matching master record.
 IDENTITY columns efficiently generate unique row identifiers.
 Timestamp columns ensure efficient concurrency checking between multiple-
user updates.
 User-defined data types ensure consistency of column definitions across the
database.
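The features above can be combined in a single table definition; the Customers and Orders tables below are hypothetical, chosen only to show each feature in place:

```sql
CREATE TABLE Customers
  (customer_id   INT IDENTITY(1, 1) PRIMARY KEY,  -- IDENTITY generates unique row identifiers
   customer_name VARCHAR(50) NOT NULL,            -- NOT NULL avoids missing column values
   credit_limit  MONEY NOT NULL DEFAULT 0         -- DEFAULT supplies a value when none is given
     CHECK (credit_limit >= 0),                   -- CHECK ensures column values are valid
   tax_code      CHAR(10) UNIQUE,                 -- UNIQUE enforces uniqueness (and creates an index)
   row_version   TIMESTAMP)                       -- timestamp column for concurrency checking

CREATE TABLE Orders
  (order_id    INT IDENTITY(1, 1) PRIMARY KEY,
   customer_id INT NOT NULL
     REFERENCES Customers (customer_id))          -- FOREIGN KEY: every order needs a matching customer
```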

By taking advantage of these features, you can make the data rules visible to all users of
the database, rather than hiding them in application logic. These server-enforced rules
help avoid errors in the data that can arise from incomplete enforcement of integrity
rules by the application itself. Using these facilities also ensures that data integrity is
enforced as efficiently as possible.
Physical Database Design
The I/O subsystem (storage engine) is a key component of any relational database. A
successful database implementation usually requires careful planning at the early stages
of your project. The storage engine of a relational database requires much of this
planning, which includes determining:

 What type of disk hardware to use, such as RAID (redundant array of independent disks) devices.

 How to place your data onto the disks.

 Which index design to use to improve query performance in accessing data.

 How to set all configuration parameters appropriately for the database to perform well.

Query Tuning

It may be tempting to address a performance problem solely by system-level server performance tuning; for example, memory size, type of file system, number and type of processors, and so forth. Experience has shown that most performance problems cannot be resolved this way. They must be addressed by analyzing the application, the queries and updates that the application is submitting to the database, and how these queries and updates interact with the database schema.
Unexpected long-lasting queries and updates can be caused by:

 Slow network communication.


 Inadequate memory in the server computer or not enough memory available for
Microsoft® SQL Server™ 2000.
 Lack of useful statistics.
 Out-of-date statistics.
 Lack of useful indexes.
 Lack of useful data striping.

When a query or update takes longer than expected, use the following checklist to
improve performance.

1. Is the performance problem related to a component other than queries? For example, is the problem slow network performance? Are there any other components that might be causing or contributing to performance degradation? Windows NT Performance Monitor can be used to monitor the performance of SQL Server and non-SQL Server related components.

2. If the performance issue is related to queries, which query or set of queries is involved? Use SQL Profiler to help identify the slow query or queries.

The performance of a database query can be determined by using the SET statement to enable the SHOWPLAN, STATISTICS IO, STATISTICS TIME, and STATISTICS PROFILE options.

 SHOWPLAN describes the method chosen by the SQL Server query optimizer to retrieve data. For more information, see SET SHOWPLAN_ALL.

 STATISTICS IO reports information about the number of scans, logical reads (pages accessed in cache), and physical reads (number of times the disk was accessed) for each table referenced in the statement. For more information, see SET STATISTICS IO.

 STATISTICS TIME displays the amount of time (in milliseconds) required to parse, compile, and execute a query.

 STATISTICS PROFILE displays a result set after each executed query representing a profile of the execution of the query.
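A typical Query Analyzer session using these options can be sketched as follows (the query itself is a hypothetical stand-in for the slow query identified with SQL Profiler):

```sql
-- Enable timing and I/O reporting for this session.
SET STATISTICS TIME ON
SET STATISTICS IO ON

-- Run the query under investigation. The Messages pane now shows
-- parse/compile and execution times, plus scans, logical reads, and
-- physical reads for each table referenced.
SELECT * FROM Orders WHERE order_date >= '2005-01-01'

-- Turn the options off again when finished.
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
```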

In SQL Query Analyzer, you can also turn on the graphical execution plan
option to view a graphical representation of how SQL Server retrieves data.
The information gathered by these tools allows you to determine how a query is
executed by the SQL Server query optimizer and which indexes are being used.
Using this information, you can determine if performance improvements can be
made by rewriting the query, changing the indexes on the tables, or perhaps
modifying the database design.
3. Was the query optimized with useful statistics?

Statistics on the distribution of values in a column are automatically created on indexed columns by SQL Server. They can also be created on nonindexed columns either manually, using SQL Query Analyzer or the CREATE STATISTICS statement, or automatically, if the auto create statistics database option is set to true. These statistics can be used by the query processor to determine the optimal strategy for evaluating a query. Maintaining additional statistics on nonindexed columns involved in join operations can improve query performance. For more information, see Statistical Information.

Monitor the query using SQL Profiler or the graphical execution plan in SQL Query Analyzer to determine if the query has enough statistics.
4. Are the query statistics up-to-date? Are the statistics automatically updated?

SQL Server automatically creates and updates query statistics on indexed
columns (as long as automatic query statistic updating is not disabled).
Additionally, statistics can be updated on nonindexed columns either manually,
using SQL Query Analyzer or the UPDATE STATISTICS statement, or
automatically, if the auto update statistics database option is set to true. Up-to-
date statistics are not dependent upon date or time data. If no UPDATE
operations have taken place, then the query statistics are still up-to-date.
If statistics are not set to update automatically, then set them to do so.
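Both maintenance tasks can be performed directly in Transact-SQL. This sketch assumes a hypothetical employee01 table with a nonindexed job_lvl column:

```sql
-- Hypothetical example: create statistics on a nonindexed column,
-- then bring all statistics on the table up-to-date.
CREATE STATISTICS st_job_lvl
ON employee01 (job_lvl)

UPDATE STATISTICS employee01
```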

5. Are suitable indexes available? Would adding one or more indexes improve
query performance?

6. Are there any data or index hot spots? Consider using disk striping.
7. Is the query optimizer provided with the best opportunity to optimize a complex
query?

Analyzing a Query
Microsoft SQL Server 2000 offers these ways to present information on how it navigates
tables and uses indexes to access the data for a query:
 Graphically display the execution plan using SQL Query Analyzer

In SQL Query Analyzer, click Query and select Display Execution Plan. After
executing a query, you can select the Execution Plan tab to see a graphical
representation of execution plan output.
 SET SHOWPLAN_TEXT ON

After this statement is executed, SQL Server returns the execution plan
information for each query.
 SET SHOWPLAN_ALL ON

This statement is similar to SET SHOWPLAN_TEXT, except that the output
is more detailed, returning additional columns of plan information.
When you display the execution plan, the statements you submit to the server are not
executed; instead, SQL Server analyzes the query and displays how the statements
would have been executed as a series of operators.
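For example, the following sketch returns the estimated plan for a query without running it (the Customers table here is hypothetical):

```sql
-- The SELECT below is analyzed and its plan returned, but not executed.
SET SHOWPLAN_TEXT ON
GO

SELECT * FROM Customers WHERE customerid = 798
GO

SET SHOWPLAN_TEXT OFF
GO
```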
The best execution plan used by the query engine for individual data manipulation
language (DML) and Transact-SQL statements is displayed, and reveals compile-time
information about stored procedures and triggers invoked by a batch, including stored
procedures and triggers called to an arbitrary number of nesting levels. For example,
executing a SELECT statement can show that SQL Server uses a table scan to obtain the
data. Alternatively, an index scan may have been used instead if the index was
determined to be a faster method of retrieving the data from the table.
The results returned by the SHOWPLAN_TEXT and SHOWPLAN_ALL statements are a
tabular representation (rows and columns) of a tree structure. The execution plan tree
structure uses one row in the result set for each node in the tree, each node representing
a logical or physical operator used to manipulate the data to produce expected results.
SQL Query Analyzer instead graphically displays each logical and physical operator as
an icon.

Advanced Query Tuning Concepts

Microsoft SQL Server 2000 performs sort, intersect, union, and difference operations
using in-memory sorting and hash join technology. Using this type of query plan, SQL
Server supports vertical table partitioning, sometimes called columnar storage.
SQL Server employs three types of join operations:

 Nested loops joins
 Merge joins
 Hash joins
If one join input is quite small (such as fewer than 10 rows) and the other join input is
fairly large and indexed on its join columns, index nested loops are the fastest join
operation because they require the least I/O and the fewest comparisons.
If the two join inputs are not small but are sorted on their join column (for example, if
they were obtained by scanning sorted indexes), merge join is the fastest join operation.
If both join inputs are large and the two inputs are of similar sizes, merge join with prior
sorting and hash join offer similar performance. However, hash join operations are often
much faster if the two input sizes differ significantly from each other.
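When diagnosing such cases, a join strategy can be forced with a join hint so that its cost can be compared with the optimizer's own choice. This is a sketch with hypothetical table and column names; hints override the optimizer and are normally left out of production code:

```sql
-- Hypothetical example: force a hash join between two large inputs.
SELECT o.order_id, c.lastname
FROM Orders AS o
INNER HASH JOIN Customers AS c
    ON o.customerid = c.customerid
```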
Hash joins can process large, unsorted, nonindexed inputs efficiently. They are useful
for intermediate results in complex queries because:

 Intermediate results are not indexed (unless explicitly saved to disk and then
indexed) and often are not produced suitably sorted for the next operation in the
query plan.

 Query optimizers estimate only intermediate result sizes. Because estimates can
be an order of magnitude wrong in complex queries, algorithms to process
intermediate results not only must be efficient but also must degrade gracefully
if an intermediate result turns out to be much larger than anticipated.

The hash join allows reductions in the use of denormalization to occur. Denormalization
is typically used to achieve better performance by reducing join operations, in spite of
the dangers of redundancy, such as inconsistent updates. Hash joins reduce the need to
denormalize. Hash joins allow vertical partitioning (representing groups of columns
from a single table in separate files or indexes) to become a viable option for physical
database design.

Application Design

Application design plays a pivotal role in determining the performance of a system
using Microsoft® SQL Server™ 2000. Consider the client the controlling entity rather
than the database server. The client determines the type of queries, when they are
submitted, and how the results are processed. This in turn has a major effect on the type
and duration of locks, amount of I/O, and processing (CPU) load on the server, and
hence on whether performance is generally good or bad.
For this reason, it is important to make the correct decisions during the application
design phase. However, even if a performance problem occurs using a turn-key
application, where changes to the client application seem impossible, this does not
change the fundamental factors that affect performance: The client plays a dominant role
and many performance problems cannot be resolved without making client changes. A
well-designed application allows SQL Server to support thousands of concurrent users.
Conversely, a poorly designed application prevents even the most powerful server
platform from handling more than a few users.
Guidelines for client-application design include:
 Eliminate excessive network traffic.

Network roundtrips between the client and SQL Server are usually the main
reason for poor performance in a database application, an even greater factor
than the amount of data transferred between server and client. Network
roundtrips describe the conversational traffic sent between the client application
and SQL Server for every batch and result set. By making use of stored
procedures, you can minimize network roundtrips. For example, if your
application takes different actions based on data values received from SQL
Server, make those decisions directly in the stored procedure whenever possible,
thus eliminating network traffic.
If a stored procedure has multiple statements, then by default SQL Server sends
a message to the client application at the completion of each statement and
details the number of rows affected for each statement. Most applications do not
need these messages. If you are confident that your applications do not need
them, you can disable these messages, which can improve performance on a slow
network. Use the SET NOCOUNT session setting to disable these messages for
the application.
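A sketch of the SET NOCOUNT advice — the procedure name, table, and columns are hypothetical:

```sql
-- Hypothetical example: suppress the per-statement "rows affected"
-- messages for everything this procedure does.
CREATE PROCEDURE dbo.CloseOrder
    @orderid int
AS
SET NOCOUNT ON

UPDATE Orders
SET status = 'closed'
WHERE order_id = @orderid
GO
```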
 Use small result sets.

Retrieving needlessly large result sets (for example, thousands of rows) for
browsing on the client adds CPU and network I/O load, makes the application
less capable of remote use, and limits multi-user scalability. It is better to design
the application to prompt the user for sufficient input so queries are submitted
that generate modest result sets.
Application design techniques that facilitate this include exercising control over
wildcards when building queries, mandating certain input fields, not allowing
ad hoc queries, and using the TOP [PERCENT] clause or the SET ROWCOUNT
Transact-SQL statement to limit the number of rows returned by a query.
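For example, a browse query can be bounded with TOP — a sketch against a hypothetical Customers table:

```sql
-- Hypothetical example: return at most 50 rows for browsing.
SELECT TOP 50 customerid, lastname
FROM Customers
WHERE lastname LIKE 'Smi%'
ORDER BY lastname
```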
 Allow cancellation of a query in progress when the user needs to regain control
of the application.

An application should never force the user to restart the client computer to
cancel a query. Ignoring this can lead to irresolvable performance problems.
When a query is canceled by an application, for example, using the open
database connectivity (ODBC) SQLCancel function, proper care should be
exercised regarding the transaction level. Canceling a query, for example, does not
commit or roll back a user-defined transaction. All locks acquired within the
transaction are retained after the query is canceled. Therefore, after canceling a
query, always either commit or roll back the transaction. The same issues apply
to DB-Library and other application programming interfaces (APIs) that can be
used to cancel queries.
 Always implement a query or lock time-out.

Do not allow queries to run indefinitely. Make the appropriate API call to set a
query time-out. For example, use the ODBC SQLSetStmtOption function.
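On the server side, a lock time-out can also be set per connection so that statements fail quickly instead of waiting indefinitely for a blocked resource:

```sql
-- Abort any statement that waits more than five seconds for a lock.
SET LOCK_TIMEOUT 5000
```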
 Do not use application development tools that do not allow explicit control over
the SQL statements sent to SQL Server.

Do not use a tool that transparently generates Transact-SQL statements based on
higher-level objects if it does not provide crucial features such as query
cancellation, query time-out, and complete transactional control. It is often not
possible to maintain good performance or to resolve a performance problem if
the application generates transparent SQL statements, because this does not
allow explicit control over transactional and locking issues, which are critical to
the performance picture.

 Do not intermix decision support and online transaction processing (OLTP)
queries.

 Do not use cursors more than necessary.

Cursors are a useful tool in relational databases; however, it is almost always
more expensive to use a cursor than to use a set-oriented SQL statement to
accomplish a task.
In set-oriented SQL statements, the client application tells the server to update
the set of records that meet specified criteria. The server figures out how to
accomplish the update as a single unit of work. When updating through a cursor,
the client application requires the server to maintain row locks or version
information for every row, just in case the client asks to update the row after it
has been fetched.
Also, using a cursor implies that the server is maintaining client state
information, such as the user's current rowset at the server, usually in temporary
storage. Maintaining this state for a large number of clients is an expensive use of
server resources. A better strategy with a relational database is for the client
application to get in and out quickly, maintaining no client state at the server
between calls. Set-oriented SQL statements support this strategy.
However, if the query uses cursors, determine if the cursor query could be
written more efficiently either by using a more-efficient cursor type, such as fast
forward-only, or a single query.
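As a sketch of the set-based alternative, a single statement can replace an entire fetch-and-update cursor loop (the table and the update rule here are hypothetical):

```sql
-- Hypothetical example: one set-oriented statement instead of a cursor loop.
UPDATE employee01
SET job_lvl = job_lvl + 10
WHERE hire_date < GETDATE() - 365
```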

 Keep transactions as short as possible.

 Use stored procedures.

 Use prepared execution to execute a parameterized SQL statement.

 Always process all results to completion.

Do not design an application or use an application that stops processing result
rows without canceling the query. Doing so will usually lead to blocking and
slow performance.

 Ensure that your application is designed to avoid deadlocks.

 Ensure that all the appropriate options for optimizing the performance of
distributed queries have been set.

Optimizing Utility and Tool Performance

Three operations performed on a production database that can benefit from optimal
performance include:

 Backup and restore operations.
 Bulk copying data into a table.
 Performing database console command (DBCC) operations.

Generally, these operations do not need to be optimized. However, in situations where
performance is critical, techniques can be used to fine-tune performance.

Optimizing Server Performance

Microsoft SQL Server 2000 automatically tunes many of the server configuration
options, therefore requiring little, if any, tuning by a system administrator. Although
these configuration options can be modified by the system administrator, it is generally
recommended that these options be left at their default values, allowing SQL Server to
automatically tune itself based on run-time conditions.
However, if necessary, the following components can be configured to optimize server
performance:

 SQL Server Memory
 I/O subsystem
 Microsoft Windows NT options

Indexes
Indexes are structured to facilitate the rapid return of result sets. The two types of
indexes that SQL Server supports are clustered and non-clustered indexes. Indexes are
applied to one or more columns in tables or views. The characteristics of an index affect
its use of system resources and its lookup performance. The Query Optimizer uses an
index if it will increase query performance.

Purpose and Structure

An index in SQL Server assists the database engine with locating records, just like an
index in a book helps you locate information quickly. Without indexes, a query causes
SQL Server to search all records in a table (table scan) in order to find matches. A
database index contains one or more column values from a table (called the index key)
and pointers to the corresponding table records. When you perform a query using the
index key, the Query Optimizer will likely use an index to locate the records that match
the query.

An index is structured by the SQL Server Index Manager as a balanced tree (or B-tree).

A B-tree is analogous to an upside-down tree with the root of the tree at the top, the leaf
levels at the bottom, and intermediate levels in between. Each object in the tree structure
is a group of sorted index keys called an index page. A B-tree facilitates fast and
consistent query performance by carefully balancing the width and depth of the tree as
the index grows. Sorting the index on the index key also improves query performance.
All search requests begin at the root of a B-tree and then move through the tree to the
appropriate leaf level. The number of table records and the size of the index key affect
the width and depth of the tree. Index key size is called the key width. A table that has
many records and a large index key width creates a deep and wide B-tree. The smaller
the tree, the more quickly a search result is returned.

For optimal query performance, create indexes on columns in a table that are commonly
used in queries. For example, users can query a Customers table based on
last name or customer ID. Therefore, you should create two indexes for the table: a last-
name index and a customer ID index. To efficiently locate records, the Query Optimizer
uses an index that matches the query. The Query Optimizer will likely use the customer
ID index when the following query is executed:

SELECT * FROM Customers WHERE customerid = 798

Do not create indexes for every column in a table, because too many
indexes will negatively impact performance. The majority of databases are dynamic; that
is, records are added, deleted, and changed regularly. When a table containing an index
is modified, the index must be updated to reflect the modification. If index updates do
not occur, the index will quickly become ineffective. Therefore, insert, update, and delete
events trigger the Index Manager to update the table indexes. Like tables, indexes are
data structures that occupy space in the database. The larger the table, the larger the
index that is created to contain the table. Before creating an index, you must be sure that
the increased query performance afforded by the index outweighs the additional
computer resources necessary to maintain the index.
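The two Customers indexes described above could be created as follows; the index names are hypothetical:

```sql
-- Hypothetical index names for the last-name and customer ID indexes.
CREATE INDEX IX_Customers_lastname ON Customers (lastname)
CREATE INDEX IX_Customers_customerid ON Customers (customerid)
```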

Index Types

There are two types of indexes: clustered and nonclustered. Both types of indexes are
structured as B-trees. A clustered index contains table records in the leaf level of the B-
tree. A nonclustered index contains a bookmark to the table records in the leaf level. If a
clustered index exists on a table, a nonclustered index uses it to facilitate data lookup. In
most cases, you will create a clustered index on a table before you create nonclustered
indexes.
Clustered Indexes
There can be only one clustered index on a table or view, because the clustered index
key physically sorts the table or view. This type of index is particularly efficient for
queries, because the data pages containing the table records are stored in the leaf level of
the B-tree. The sort order and storage location of a clustered index is analogous to a
dictionary in that the words in a dictionary are sorted alphabetically and definitions
appear next to the words. When you create a primary key constraint in a table that does
not contain a clustered index, SQL Server will use the primary key column for the
clustered index key. If a clustered index already exists in a table, a nonclustered index is
created on the column defined with a primary key constraint. A column defined as the
PRIMARY key is a useful index because the column values are guaranteed to be unique.
Unique values create smaller B-trees than redundant values and thus make more
efficient lookup structures.

To force the type of index to be created for a column or columns, you can specify the
CLUSTERED or NONCLUSTERED clause in the CREATE TABLE, ALTER TABLE, or
CREATE INDEX statements. Suppose that you create a Persons table containing the
following columns: PersonID, FirstName, LastName, and SocialSecurityNumber. The
PersonID column is defined as a primary key constraint, and the SocialSecurityNumber
column is defined as a unique constraint. To make the SocialSecurityNumber column a
clustered index and the PersonID column a nonclustered index, create the table by
using the following syntax:

CREATE TABLE dbo.Persons
(
personid smallint PRIMARY KEY NONCLUSTERED,
firstname varchar(30),
lastname varchar(40),
socialsecuritynumber char(11) UNIQUE CLUSTERED
)

Indexes are not limited to constraints. You can create indexes on any column or
combination of columns in a table or view. Clustered indexes enforce uniqueness
internally. Therefore, if you create a nonunique, clustered index on a column that
contains redundant values, SQL Server creates a unique value on the redundant
columns to serve as a secondary sort key. To avoid the additional work required to
maintain unique values on redundant rows, favor clustered indexes for columns defined
with primary key constraints.

Nonclustered Indexes

On a table or view, you can create up to 249 nonclustered indexes and one clustered
index. You must first create a unique clustered index on a view before
you can create nonclustered indexes. This restriction does not apply to tables, however.
A nonclustered index is analogous to an index in the back of a book. You can use a
book’s index to locate pages that match an index entry. The database uses a nonclustered
index to locate matching records in the database. If a clustered index does not exist on a
table, the table is unsorted and is called a heap. A nonclustered index created on a heap
contains pointers to table rows. Each entry in an index page contains a row ID (RID).
The RID is a pointer to a table row in a heap, and it consists of a page number, a file
number, and a slot number. If a clustered index exists on a table, the index pages of a
nonclustered index contain clustered index keys rather than RIDs. An index pointer,
whether it is a RID or an index key, is called a bookmark.

Index Characteristics

A number of characteristics (aside from the index type, which is clustered or
nonclustered) can be applied to an index. An index can be defined as follows:
 Unique—duplicate records are not allowed
 A composite of columns—an index key made up of multiple columns
 With a fill factor to allow index pages to grow when necessary
 With a pad index to change the space allocated to intermediate levels of the B-tree
 With a sort order to specify ascending or descending index keys
Unique

When an index is defined as unique, the index keys and the corresponding column
values must be unique. A unique index can be applied to any column if all column
values are unique. A unique index can also be applied to a group of columns (a
composite of columns). The composite column unique index must maintain
distinctiveness. For example, a unique index defined on a lastname column and a social
security number column must not contain NULL values in both columns. Furthermore,
if there are values in both columns, the combination of lastname and social security
number must be unique.

SQL Server automatically creates a unique index for a column or columns defined with a
primary key or unique constraint. Therefore, use constraints to enforce data
distinctiveness, rather than directly applying the unique index characteristic. SQL Server
will not allow you to create an index with the uniqueness property on a column
containing duplicate values.

Composite

A composite index is any index that uses more than one column in a table for its index
key. Composite indexes can improve query performance by reducing input/output
(I/O) operations, because a query on a combination of columns contained in the index
will be located entirely in the index. When the result of a query is obtained from the
index without having to rely on the underlying table, the query is considered covered—
and the index is considered covering. A single column query, such as a query on a
column with a primary key constraint, is covered by the index that is automatically
created on that column. A covered query on multiple columns uses a composite index as
the covering index. Suppose that you run the following query:

SELECT emp_id, lname, job_lvl
FROM employee01
WHERE hire_date < (GETDATE() - 30)
AND job_lvl >= 100
ORDER BY job_lvl

If a clustered index exists on the Emp_ID column and a nonclustered index named INco
exists on the LName, Job_Lvl, and Hire_Date columns, then INco is a covering index.
Remember that the bookmark of a nonclustered index created on a table containing a
clustered index is the clustered index key. Therefore, the INco index contains all
columns specified in the query (the index is covering, and the query is covered). Figure
11.1 shows that the Query Optimizer uses INco in the query execution plan.
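Based on the column list given above, the covering index INco might be defined as follows:

```sql
-- The composite (covering) index described in the text.
CREATE NONCLUSTERED INDEX INco
ON employee01 (lname, job_lvl, hire_date)
```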

Fill Factor and Pad Index

When a row is inserted into a table, SQL Server must locate some space for it. An insert
operation occurs when the INSERT statement is used or when the UPDATE statement is
used to update a clustered index key. If the table doesn’t contain a clustered index, the
record and the index page are placed in any available space within the heap. If the table
contains a clustered index, SQL Server locates the appropriate index page in the B-tree
and then inserts the record in sorted order. If the index page is full, it is split (half of the
rows remain in the original index page, and half of the rows move to the new index
page). If the inserted row is large, additional page splits might be necessary. Page splits
are complex and are resource intensive. The most common page split occurs in the leaf
level index pages. To reduce the occurrence of page splits, specify how full the index
page should be when it is created. This value is called the fill factor. By default, the fill
factor is zero, meaning that the index page is full when it is created on existing data. A
fill factor of zero is synonymous with a fill factor of 100. You can specify a global default
fill factor for the server by using the sp_configure stored procedure or for a specific index
with the FILLFACTOR clause. In high-capacity transaction systems, you might also
want to allocate additional space to the intermediate level index pages. The additional
space assigned to the intermediate levels is called the pad index.
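Both options can be supplied when the index is created. This sketch uses hypothetical index and table names:

```sql
-- Hypothetical example: leave 20 percent of each leaf page free, and
-- apply the same fill factor to the intermediate-level pages.
CREATE CLUSTERED INDEX IX_Orders_orderid
ON Orders (order_id)
WITH PAD_INDEX, FILLFACTOR = 80
```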

Sort Order

When you create an index, it is sorted in ascending order. Both clustered and
nonclustered indexes are sorted; the clustered index represents the sort order of the
table. Consider the following SELECT statement:

SELECT emp_id, lname, job_lvl FROM employee01
WHERE hire_date < (GETDATE() - 30) AND job_lvl >= 100

Notice that there is no sort order specified. The composite index is nonclustered, and the
first column in the index is lname. No sort order was specified when the index was
created; therefore, the result is sorted in ascending order starting with the lname
column. The ORDER BY clause is not specified, thus saving computing resources. But
the result appears sorted first by lname. The sort order is dependent on the index used
to return the result (unless you specify the ORDER BY clause or you tell the SELECT
statement which index to use). If the Query Optimizer uses a clustered index to return a
result, the result appears in the sort order of the clustered index, which is equivalent to
the data pages in the table. The following Transact-SQL statement uses the clustered
index on the Emp_ID column to return a result in ascending order:

SELECT emp_id, lname, fname FROM employee01
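A descending sort order, by contrast, must be requested per key column when the index is created — a sketch with a hypothetical index name:

```sql
-- Hypothetical example: an index sorted in descending order.
CREATE INDEX IX_employee01_hire_date
ON employee01 (hire_date DESC)
```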

SQL Server Performance Killers

Let’s now consider the major problem areas that can degrade SQL Server
performance. By being aware of the main performance killers in SQL Server in advance,
you will be able to focus your tuning efforts on the likely causes.

Once you have optimized the hardware, operating system, and SQL Server settings, the
main performance killers in SQL Server are as follows, in a rough order (with the worst
appearing first):

• Poor indexing

• Inaccurate statistics

• Excessive blocking and deadlocks

• Poor query design

• Poor database design

• Excessive fragmentation

• No reusable execution plans

• Frequent recompilation of execution plans

• Improper use of cursors

• Improper configuration of the database log

• Ineffective connection pooling

Let’s take a quick look at each of these.

Poor Indexing

Poor indexing is usually one of the biggest performance killers in SQL Server. In
the absence of proper indexing for a query, SQL Server has to retrieve and process much
more data while executing the query. This causes high amounts of stress on the disk,
memory, and CPU, increasing the query execution time significantly. Increased query
execution time then leads to excessive blocking and deadlocks in SQL Server.

Generally, indexes are considered to be the responsibility of the database
administrator (DBA). However, the DBA cannot define how to use the indexes, since the
use of indexes is determined by the database queries and stored procedures written by
the developers. Therefore, defining the indexes should be the responsibility of the
developers. Indexes created without the knowledge of the queries serve little purpose.

Inaccurate Statistics

As SQL Server relies heavily on cost-based optimization, accurate data-
distribution statistics are extremely important for the effective use of indexes. Without
accurate statistics, SQL Server’s built-in query optimizer cannot accurately estimate the
number of rows affected by a query. As the amount of data to be retrieved from a table
is highly important in deciding how to optimize the query execution, the query
optimizer is much less effective if the data distribution statistics are not maintained
accurately.

Excessive Blocking and Deadlocks

Because SQL Server is fully Atomicity, Consistency, Isolation, Durability (ACID)
compliant, the database engine ensures that modifications made by concurrent
transactions are properly isolated from one another. By default, a transaction sees the
data either in the state before another concurrent transaction modified the data or after
the other transaction completed—it does not see an intermediate state.

Because of this isolation, when multiple transactions try to access a common resource
concurrently in a noncompatible way, blocking occurs in the database. A deadlock, which
is an outcome of blocking, aborts the victimized database request that faced the
deadlock. This requires that the database request be resubmitted for successful
execution. The execution time of a query is adversely affected by the amount of blocking
and deadlock it faces. For scalable performance of a multi-user database application,
properly controlling the isolation levels and transaction scopes of the queries to
minimize blocking and deadlock is critical; otherwise, the execution time of the queries
will increase significantly, even though the hardware resources may be highly
underutilized.
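One such control is the connection's isolation level. As an illustrative sketch (the reporting query and table are hypothetical), a long-running report that can tolerate dirty reads may run at a lower isolation level so that it does not block concurrent writers:

```sql
-- Hypothetical example: trade consistency for concurrency in a report.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT COUNT(*) FROM Orders

-- Restore the default level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
```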

Poor Query Design

The effectiveness of indexes depends entirely on the way you write SQL queries.
Retrieving excessively large numbers of rows from a table, or specifying a filter criterion
that returns a larger result set from a table than is required, renders the indexes
ineffective. To improve performance, you must ensure that the SQL queries are written
to make the best use of new or existing indexes. Failing to write cost-effective SQL
queries may prevent SQL Server from choosing proper indexes, which increases query
execution time and database blocking.

Query design covers not only single queries, but also sets of queries often used to
implement database functionalities such as a queue management among queue readers
and writers. Even when the performance of individual queries used in the design is fine,
the overall performance of the database can be very poor. Resolving this kind of
bottleneck requires a broad understanding of different characteristics of SQL Server,
which can affect the performance of database functionalities.

Poor Database Design

A database should be adequately normalized to increase the performance of data
retrieval and reduce blocking. For example, if you have an under-normalized database
with customer and order information in the same table, then the customer information
will be repeated in all the order rows of the customer. This repetition of information in
every row will increase the I/Os required to fetch all the orders placed by a customer. At
the same time, a data writer working on a customer’s order will reserve all the rows that
include the customer information and thus block all other data writers/data readers
trying to access the customer profile.

Over-normalization of a database is as bad as under-normalization. Over-
normalization increases the number and complexity of joins required to retrieve data.
An over normalized database contains a large number of tables with a very small
number of columns. As a very general rule of thumb, you may continue the
normalization process unless it causes lots of queries to have four-way or greater joins.
Having too many joins in a query may also be due to the fact that database entities have
not been partitioned very distinctly or the query is serving a very complex set of
requirements that could perhaps be better served by creating a new view or stored
procedure.

Excessive Fragmentation

While analyzing data retrieval operations, you can usually assume that the data
is organized in an orderly way, as indicated by the index used by the data retrieval
operation. However, if the pages containing the data are fragmented in a non-orderly
fashion, or if they contain a small amount of data due to frequent page splits, then the
number of read operations required by the data retrieval operation will be much higher
than might otherwise be required. The increase in the number of read operations caused
by fragmentation hurts query performance.

Non-reusable Execution Plans

To execute a query in an efficient way, SQL Server’s query optimizer spends a fair amount of CPU cycles creating a cost-effective execution plan. The good news is
that the plan is cached in memory, so you can reuse it once created. However, if the plan
is designed so that you cannot plug variable values into it, SQL Server creates a new
execution plan every time the same query is resubmitted with different variable values.
So, for better performance, it is extremely important to submit SQL queries in forms that
help SQL Server cache and reuse the execution plans.
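The difference between the two forms can be sketched as follows. SQLite stands in for SQL Server here and the table is hypothetical, but the principle is the same: one parameterized query text versus a different literal-laden text (and hence a different cached plan) per value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# Non-reusable form: embedding the literal produces a different SQL text
# (and therefore a different plan to compile and cache) for every value.
literal_texts = {"SELECT name FROM emp WHERE id = %d" % i for i in (1, 2, 3)}

# Reusable form: one SQL text; the values travel as bound parameters,
# so a single cached plan serves every execution.
param_text = "SELECT name FROM emp WHERE id = ?"
rows = [conn.execute(param_text, (i,)).fetchone()[0] for i in (1, 2, 3)]
```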

Frequent Recompilation of Execution Plans

One of the standard ways of ensuring a reusable execution plan, independent of variable values used in a query, is to use a stored procedure. Using a stored procedure
to execute a set of SQL queries allows SQL Server to create a parameterized execution plan.

A parameterized execution plan is independent of the parameter values supplied during the execution of the stored procedure, and it is consequently highly reusable. However,
the execution plan of the stored procedure can be reused only if SQL Server does not
have to recompile the execution plan every time the stored procedure is run. Frequent
recompilation of a stored procedure increases pressure on the CPU and the query
execution time.

Improper Use of Cursors

By preferring a cursor-based (row-at-a-time) result set instead of a regular set-based SQL query, you add a fair amount of overhead on SQL Server. Use set-based
queries whenever possible, but if you are forced to use cursors, be sure to use efficient
cursor types such as fast forward–only. Excessive use of inefficient cursors increases
stress on SQL Server resources, slowing down system performance.
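A rough sketch of the two styles, using SQLite from Python as a stand-in for SQL Server: the row-at-a-time loop issues one statement per row, while the set-based form does the same work in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# Cursor style: fetch every row, then issue one UPDATE per row,
# paying per-statement overhead 100 times.
for rid, n in conn.execute("SELECT rowid, n FROM t").fetchall():
    conn.execute("UPDATE t SET n = ? WHERE rowid = ?", (n + 1, rid))

# Set-based style: one statement does the same work in a single pass.
conn.execute("UPDATE t SET n = n + 1")

total = conn.execute("SELECT SUM(n) FROM t").fetchone()[0]  # every row incremented twice
```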

Improper Configuration of the Database Log

By failing to follow the general recommendations in configuring a database log, you can adversely affect the performance of an Online Transaction Processing (OLTP)–
based SQL Server database. For optimal performance, SQL Server heavily relies on
accessing the database logs effectively.

Ineffective Connection Pooling

If you don’t use connection pooling, or if you don’t have enough connections in your
pool, then each database connection request goes across the network to establish a
database connection. This network latency increases the query execution time. Poor use
of the connection pool will also increase the amount of memory used in SQL Server,
since a large number of connections require a large amount of memory on the SQL
Server.
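A connection pool can be sketched in a few lines. This is a minimal, assumption-laden illustration (a thread-safe queue of pre-opened connections), not the pooling a real driver stack such as ODBC or ADO.NET provides.

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pool: connections are opened once up front and reused,
    so later requests skip the connection-establishment cost."""

    def __init__(self, size, factory):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()      # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)         # return the connection for reuse

pool = ConnectionPool(2, lambda: sqlite3.connect(":memory:", check_same_thread=False))
conn = pool.acquire()
one = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

Sizing the pool matters both ways: too few connections serializes requests, while too many consumes server memory, as noted above.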

Problem Definition
The typical scenario today is that the user or programmer who runs queries against the database often faces slow query execution. One reason for this may be that many users are trying to access SQL Server at the same time, but the user can still reduce the time SQL Server takes to execute a query by optimizing that query.

Now, the problem with optimizing is that the user or programmer is not well versed in the SQL Server optimizer and the way it executes queries. So what the user or programmer does is submit the query to the DBA of the company or organization. The DBA has to go through the query and change the query, or the tables involved, in such a way that the query performs in an optimum way. Even DBAs are not always able to tune a user's query efficiently, and even when they can, the tuning takes a great deal of the DBA's time and resources. This is not feasible when the query is urgently needed by the user or programmer.

So one solution was a tool with which the query could be optimized, in less time and without the help of the DBA. A further problem the DBA faced was having to rely on personal skill and experience to optimize the query, as there are no set rules for optimization. This also means there is no reference against which the DBA can be assured that the query has been optimized properly.

System Analysis & Design
Query Execution Process
The path that a query traverses through a DBMS until its answer is generated is shown
in Figure 1. The system modules through which it moves have the following
functionality:
 The Query Parser checks the validity of the query and then translates it into an
internal form, usually a relational calculus expression or something equivalent.
 The Query Optimizer examines all algebraic expressions that are equivalent to
the given query and chooses the one that is estimated to be the cheapest.
 The Code Generator or the Interpreter transforms the access plan generated by
the optimizer into calls to the query processor.
 The Query Processor actually executes the query.
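The four stages above can be sketched as a toy pipeline. Every name, plan, and cost below is invented purely for illustration; real systems are vastly more elaborate.

```python
# In-memory "table" standing in for a database relation.
table = [{"id": 1, "job": "Sr. Programmer"}, {"id": 2, "job": "DBA"}]

def parse(query):
    # Query Parser: check validity, translate to an internal form
    # (here, a single equality predicate).
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("invalid query")
    return ("job", "Sr. Programmer")

def optimize(predicate):
    # Query Optimizer: among equivalent plans, choose the one with
    # the lowest estimated cost.
    plans = [("full_scan", predicate, len(table)), ("index_seek", predicate, 1)]
    return min(plans, key=lambda p: p[2])

def generate(plan):
    # Code Generator: turn the chosen access plan into an executable call.
    _, (col, val), _ = plan
    return lambda: [row for row in table if row[col] == val]

def execute(code):
    # Query Processor: actually run the query.
    return code()

rows = execute(generate(optimize(parse("SELECT * FROM emp WHERE job = 'Sr. Programmer'"))))
```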

Queries are posed to a DBMS by interactive users or by programs written in general-purpose programming languages (e.g., C/C++, Fortran, PL/I) that have queries
embedded in them. An interactive (ad hoc) query goes through the entire path shown in
Figure 1. On the other hand, an embedded query goes through the first three steps only
once, when the program in which it is embedded is compiled (compile time).

The code produced by the Code Generator is stored in the database and is simply
invoked and executed by the Query Processor whenever control reaches that query
during the program execution (run time). Thus, independent of the number of times an
embedded query needs to be executed, optimization is not repeated until database
updates make the access plan invalid (e.g., index deletion) or highly suboptimal (e.g.,
extensive changes in database contents). There is no real difference between optimizing
interactive or embedded queries.

The area of query optimization is very large within the database field. The purpose here is primarily to discuss the core problems in query optimization and their solutions, and only to touch upon the wealth of results that exist beyond that. More specifically, we concentrate on optimizing a single flat SQL query with ‘and’ as the only boolean connective in its qualification (also known as a conjunctive query, select-project-join query, or non-recursive Horn clause) in a centralized relational DBMS, assuming that full knowledge of the run-time environment exists at compile time.

Performance Tuning Process

The performance tuning process consists of identifying performance bottlenecks, troubleshooting their cause, applying different resolutions, and then quantifying
performance improvements. It is necessary to be a little creative, since most of the time
there is no one silver bullet to improve performance. The challenge is to narrow down
the list of possible reasons and evaluate the effects of different resolutions. You may
even undo modifications as you iterate through the tuning process. During the tuning
process, you must examine various hardware and software factors that can affect the
performance of a SQL Server–based application. A few of the general questions you
should be asking yourself during the performance analysis are as follows:

 Is any other resource-intensive application running on the same server?
 Is the hardware subsystem capable of withstanding the maximum workload?
 Is SQL Server configured properly?
 Is the database connection between SQL Server and the database application
efficient?
 Does the database design support the fastest data retrieval (and modification for
an updateable database)?
 Is the user workload, consisting of SQL queries, optimized to reduce the load on
SQL Server?
 Does the workload support the maximum concurrency?

If any of these factors is not configured properly, then the overall system performance
may suffer. Performance tuning is an iterative process, where you identify major
bottlenecks, attempt to resolve them, measure the impact of your changes, and return to
the first step until performance is acceptable. While applying your solutions, you should
follow the golden rule of making only one change at a time. Any change usually affects
other parts of the system, so you must re-evaluate the effect of each change on the
performance of the overall system. As an example, the addition of an index may fix the
performance of a specific query, but it could cause other queries to run more slowly. In
such a case, evaluating one change at a time also helps in prioritizing the
implementation order of the changes on the production server, based on their relative
contributions. You can keep on chipping away at performance bottlenecks and
improving the system performance gradually. Initially, you will be able to resolve big
performance bottlenecks and achieve significant performance improvements, but as you
proceed through the iterations, your returns will gradually diminish. Therefore, to use
your time efficiently, it is worthwhile to quantify the performance objectives first (for
example, an 80% reduction in the time taken for a certain query, with no adverse effect
anywhere else on the server), and then work toward them.
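The measure–change–re-measure loop described above can be sketched as follows. The candidate changes and their cost effects are purely hypothetical numbers; the point is the golden rule of applying one change at a time and undoing anything that does not help.

```python
# Hypothetical tuning state: a single "cost" number stands in for the
# measured response time; effects of each candidate change are invented.
state = {"cost": 100}
effects = {"add_index": -40, "bad_hint": +10, "rewrite_query": -20}

def measure():
    return state["cost"]

def apply_change(change):
    state["cost"] += effects[change]

def undo_change(change):
    state["cost"] -= effects[change]

def tune(candidates, target):
    kept = []
    for change in candidates:
        before = measure()
        if before <= target:
            break                     # objective met: stop iterating
        apply_change(change)          # golden rule: one change at a time
        if measure() < before:
            kept.append(change)       # measurable improvement: keep it
        else:
            undo_change(change)       # regression or no effect: undo it
    return kept

kept = tune(["add_index", "bad_hint", "rewrite_query"], target=50)
```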

The performance of a SQL Server application is highly dependent on the amount and
distribution of user activity (or workload) and data. Both the amount and distribution
of workload and data change over time, and differing data can cause SQL Server to
execute SQL queries differently. Therefore, to ensure an optimum system performance
on a continuing basis, you will need to analyze performance at regular intervals.
Performance tuning is a never-ending process, as shown in Figure 1-1.

You can see that the steps to optimize the costliest query make for a complex process,
which also requires multiple iterations to troubleshoot the performance issues within the
query and apply one change at a time. The steps involved in the optimization of the
costliest query are shown in Figure 1-2.

- 37 -
SQL tuner
As you can see from this process, there is quite a lot to do to ensure that you correctly
tune the performance of a given query. It is important to use a solid process like this in
performance tuning, to focus on the main identified issues.

Having said this, it also helps to try and keep a broader perspective about the problem
as a whole, since sometimes you may believe that you are trying to solve the correct
performance bottleneck, when in reality something else is causing the problem.

Query Optimizer Architecture
An abstraction of the query optimization process in a DBMS is presented here. Given a database and a query on it, several execution plans exist that can be employed to answer the query. In
principle, all the alternatives need to be considered so that the one with the best
estimated performance is chosen. An abstraction of the process of generating and testing
these alternatives is shown in Figure 2, which is essentially a modular architecture of a
query optimizer. Although one could build an optimizer based on this architecture, in
real systems, the modules shown do not always have such clear-cut boundaries as in
Figure 2. Based on Figure 2, the entire query optimization process can be seen as having
two stages: rewriting and planning. There is only one module in the first stage, the
Rewriter, whereas all other modules are in the second stage. The functionality of each of
the modules in Figure 2 is analyzed below:

Rewriter
This module applies transformations to a given query and produces equivalent queries
that are hopefully more efficient, e.g., replacement of views with their definition,
flattening out of nested queries, etc. The transformations performed by the Rewriter
depend only on the declarative, i.e., static, characteristics of queries and do not take into
account the actual query costs for the specific DBMS and database concerned. If the
rewriting is known or assumed to always be beneficial, the original query is discarded;
otherwise, it is sent to the next stage as well. By the nature of the rewriting
transformations, this stage operates at the declarative level.

Planner
This is the main module of the planning stage. It examines all possible execution plans
for each query produced in the previous stage and selects the overall cheapest one to be
used to generate the answer of the original query. It employs a search strategy, which
examines the space of execution plans in a particular fashion. This space is determined
by two other modules of the optimizer, the Algebraic Space and the Method-Structure
Space. For the most part, these two modules and the search strategy determine the cost,
i.e., running time, of the optimizer itself, which should be as low as possible. The
execution plans examined by the Planner are compared based on estimates of their cost
so that the cheapest may be chosen. These costs are derived by the last two modules of
the optimizer, the Cost Model and the Size-Distribution Estimator.

Algebraic Space
This module determines the action execution orders that are to be considered by the
Planner for each query sent to it. All such series of actions produce the same query
answer, but usually differ in performance. They are usually represented in relational
algebra as formulas or in tree form. Because of the algorithmic nature of the objects
generated by this module and sent to the Planner, the overall planning stage is
characterized as operating at the procedural level.

Method-Structure Space
This module determines the implementation choices that exist for the execution of each
ordered series of actions specified by the Algebraic Space. This choice is related to the
available join methods for each join (e.g., nested loops, merge scan, and hash join),
whether supporting data structures are built on the fly, if/when duplicates are eliminated,
other implementation characteristics of this sort, which are predetermined by the DBMS
implementation. This choice is also related to the available indices for accessing each
relation, which is determined by the physical schema of each database stored in its
catalogs. Given an algebraic formula or tree from the Algebraic Space, this module
produces all corresponding complete execution plans, which specify the
implementation of each algebraic operator and the use of any indices.

Cost Model
This module specifies the arithmetic formulas that are used to estimate the cost of
execution plans. For every different join method, for every different index type access,
and in general for every distinct kind of step that can be found in an execution plan,
there is a formula that gives its cost. Given the complexity of many of these steps, most
of these formulas are simple approximations of what the system actually does and are
based on certain assumptions regarding issues like buffer management, disk-CPU
overlap, sequential vs. random I/O, etc. The most important input parameters to a
formula are the size of the buffer pool used by the corresponding step, the sizes of
relations or indices accessed, and possibly various distributions of values in these
relations. While the first one is determined by the DBMS for each query, the other two
are estimated by the Size-Distribution Estimator.
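As a sketch of what such cost formulas look like, here are two textbook-style approximations (costs in page I/Os). They are illustrative simplifications of the kind of arithmetic a Cost Model module contains, not any real system's actual formulas.

```python
import math

def cost_nested_loops(pages_outer, pages_inner, buffer_pages):
    # Read the outer relation once; rescan the inner relation once per
    # buffer-sized chunk of the outer relation (block nested loops).
    chunks = math.ceil(pages_outer / max(buffer_pages - 1, 1))
    return pages_outer + chunks * pages_inner

def cost_merge_join(pages_outer, pages_inner):
    # Inputs assumed already sorted on the join column: one pass over each.
    return pages_outer + pages_inner

nl = cost_nested_loops(1000, 500, buffer_pages=101)  # 1000 + 10 * 500
mj = cost_merge_join(1000, 500)                      # 1000 + 500
```

Note how the buffer pool size and the relation sizes are exactly the inputs the text identifies: the first comes from the DBMS, the other two from the Size-Distribution Estimator.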

Size-Distribution Estimator

This module specifies how the sizes (and possibly frequency distributions of attribute
values) of database relations and indices as well as (sub) query results are
estimated. As mentioned above, these estimates are needed by the Cost Model. The
specific estimation approach adopted in this module also determines the form of
statistics that need to be maintained in the catalogs of each database, if any.

Advanced Types of Optimization

Semantic Query Optimization


Semantic query optimization is a form of optimization mostly related to the Rewriter
module. The basic idea lies in using integrity constraints defined in the database to
rewrite a given query into semantically equivalent ones. These can then be optimized by
the Planner as regular queries and the most efficient plan among all can be used to
answer the original query. As a simple example, using a hypothetical SQL-like syntax,
consider the following integrity constraint :

assert sal-constraint on emp:
sal>100K where job = “Sr. Programmer”.

Also consider the following query:

select name, floor from emp, dept
where emp.dno = dept.dno and job = ‘Sr. Programmer’

Using the above integrity constraint, the query can be rewritten into a semantically
equivalent one to include a selection on sal:

select name, floor from emp, dept
where emp.dno = dept.dno and job = ‘Sr. Programmer’ and sal>100K

Having the extra selection could help tremendously in finding a fast plan to answer the
query if the only index in the database is a B+-tree on emp.sal. On the other hand, it
would certainly be a waste if no such index exists. For such reasons, all proposals for
semantic query optimization present various heuristics or rules about which rewritings
have the potential of being beneficial and should be applied, and which do not.

Global Query Optimization
So far, we have focused our attention to optimizing individual queries. Quite often,
however, multiple queries become available for optimization at the same time, e.g.,
queries with unions, queries from multiple concurrent users, queries embedded in a
single program, or queries in a deductive system. Instead of optimizing each query
separately, one may be able to obtain a global plan that, although possibly suboptimal
for each individual query, is optimal for the execution of all of them as a group. Several
techniques have been proposed for global query optimization. As a simple example of
the problem of global optimization consider the following two queries:

select name, floor from emp, dept
where emp.dno = dept.dno and job = ‘Sr. Programmer’

select name from emp, dept
where emp.dno = dept.dno and budget > 1M.

Depending on the sizes of the emp and dept relations and the selectivities of the
selections, it may well be that computing the entire join once and then applying
separately the two selections to obtain the results of the two queries is more efficient
than doing the join twice, each time taking into account the corresponding selection.
Developing Planner modules that would examine all the available global plans and
identify the optimal one is the goal of global/multiple query optimizers.

Parametric/Dynamic Query Optimization
As mentioned earlier, embedded queries are typically optimized once at compile time
and are executed multiple times at run time. Because of this temporal separation
between optimization and execution, the values of various parameters that are used
during optimization may be very different during execution. This may make the chosen
plan invalid (e.g., if indices used in the plan are no longer available) or simply not
optimal (e.g., if the number of available buffer pages or operator selectivities have
changed, or if new indices have become available). To address this issue,
several techniques have been proposed that use various search strategies (e.g.,
randomized algorithms or the strategy of Volcano) to optimize queries as much as
possible at compile time taking into account all possible values that interesting
parameters may have at run time. These techniques use the actual parameter values at
run time, and simply pick the plan that was found optimal for them with little or no
overhead. Of a drastically different flavor is the technique of Rdb/VMS, where, by
dynamically monitoring how the probability distribution of plan costs changes, plan
switching may actually occur during query execution.
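A minimal sketch of the parametric idea: plans are precomputed per parameter range at compile time, and the run-time value merely selects one with negligible overhead. The plan names and thresholds below are invented for illustration.

```python
# Plans precomputed at "compile time", keyed by an upper bound on a
# run-time parameter (here, an estimated predicate selectivity).
compiled_plans = [
    (0.01, "index_seek"),   # very selective predicate: seek the index
    (0.30, "index_scan"),   # moderately selective: scan the index
    (1.00, "table_scan"),   # unselective: scan the whole table
]

def pick_plan(selectivity):
    # "Run time": the actual parameter value just selects a ready-made
    # plan; no re-optimization is performed.
    for upper_bound, plan in compiled_plans:
        if selectivity <= upper_bound:
            return plan
    return compiled_plans[-1][1]

plan_low = pick_plan(0.005)
plan_high = pick_plan(0.9)
```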

Displaying Execution Plans

Execution Plan Basics

I have always considered one of the easiest ways to tune a stored procedure to be simply studying its execution plan. An execution plan is basically a road map that shows, graphically or textually, the data retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad-hoc query. It is a very useful tool for a developer to understand the performance characteristics of a query or stored procedure, since the plan is what SQL Server places in its cache and uses to execute the stored procedure or query. Most developers will grow to the point where it is a simple matter for them to look at an execution plan and decide which step of a stored procedure is causing performance issues.

Execution plans can be viewed in either a graphical or textual format, depending on the method used to obtain the execution plan. Query Analyzer and a small group of third-party tools (I personally use mssqlXpress, available at www.xpressapps.com) have the ability to turn the text-based plan into an easily viewed set of icons. From there it is a simple matter of understanding the different icons and knowing how to drill down into an icon to retrieve detailed data.
If you do not use Query Analyzer or have a third-party tool available, you can use Transact-SQL to display a text-based execution plan. Transact-SQL provides several commands to display execution plans: SET STATISTICS PROFILE, SET STATISTICS IO, SET STATISTICS TIME, SET SHOWPLAN_ALL and SET SHOWPLAN_TEXT. You can use one or all of these commands to display a text-based execution plan with various degrees of detailed information associated with that plan.
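Other engines expose text plans in a similar spirit. SQLite's EXPLAIN QUERY PLAN, for instance, returns the plan as rows of text without executing the query, much like SET SHOWPLAN_TEXT; the schema below is hypothetical and shown from Python only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id INTEGER PRIMARY KEY, au_lname TEXT)")
conn.execute("CREATE INDEX ix_lname ON authors(au_lname)")

# Like SET SHOWPLAN_TEXT, this returns the plan as rows of text
# without executing the underlying query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT au_id FROM authors WHERE au_lname = 'Green'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan_rows)  # names the chosen index
```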

Graphics-Based Execution Plans

Most developers prefer the graphic-based execution plans displayed by Query Analyzer
or a third-party tool as they allow a quick glance to determine any major performance
problems with a query. While the methods to retrieve the graphical execution plans vary
by application, most of the icons used are very similar in functionality and appearance.
The next few examples will show you how to return graphical execution plans with
Query Analyzer. If you use a third party tool in your development, please see the help
section for that tool on execution plans to see how to display a graphical execution plan.

Query Analyzer Graphical Execution Plans


Once you have loaded your query or created a call to a stored procedure in the editor pane, click Query on the toolbar and then select Show Execution Plan. Execute the query, and after it has finished, select the Execution Plan tab to see the graphical execution plan output.

Example 1: Type in the following query, enable execution plans and then execute the
query.

--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

Execution Plan Output

Query Analyzer Graphical Estimated Execution Plans

As you can see from the above example, Show Execution Plan will actually execute the
query or stored procedure and output the execution plan used by the optimizer to a
separate window. What if you did not want the query to actually execute but wanted to
get a sense of what the optimizer is going to do? Query Analyzer allows you to simply
show the estimated execution plan without actually running the query or stored
procedure by using the Display Estimated Execution Plan tool.

A point to remember is that you cannot generate an estimated plan if your query
contains temporary objects or references to objects the query builds, unless the objects
already exist. You will have to build the temporary object or permanent object first and
then obtain the estimated plan.
Once you have loaded your query or created a call to a stored procedure in the editor pane, click Query on the toolbar and then select Display Estimated Execution Plan. The estimated execution plan is displayed immediately, without the query actually being executed.

Example 2: Type in the following query and display its estimated execution plan.

USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

As you will notice from the outputs shown above, the graphics-based plans do not at first seem to give you the detailed information you may need to determine a problem. To obtain detailed information from each icon, you just have to place the cursor over an icon to have the information displayed.

Text-Based Execution Plans

If you do not have the ability to obtain a graphical execution plan, you can still use a
series of Transact-SQL commands to retrieve an execution plan. While not as flashy for
us visually orientated people, these Transact-SQL commands still provide a developer
with a wealth of information to be used to find performance issues within a stored
procedure or query.

SET SHOWPLAN_ALL - SET SHOWPLAN_ALL instructs SQL Server not to execute Transact-SQL statements but instead to return detailed information about how the statements would be executed, along with estimates of the resource requirements for the statements.
Syntax: SET SHOWPLAN_ALL {ON | OFF}

Example 3 - Type and execute the following query.

--Enable SET SHOWPLAN_ALL
SET SHOWPLAN_ALL ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO
SET SHOWPLAN_ALL Output

SET SHOWPLAN_TEXT - instructs SQL Server not to execute Transact-SQL statements but instead to return information about how the statements would be executed.
Syntax: SET SHOWPLAN_TEXT {ON | OFF}

Example 4 - Type and execute the following query.


--Enable SET SHOWPLAN_TEXT
SET SHOWPLAN_TEXT ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO
SET SHOWPLAN_TEXT Output

SET STATISTICS PROFILE instructs SQL Server to display the profile information for a
statement after executing the statement.

Syntax: SET STATISTICS PROFILE {ON | OFF}

Example 5: Type and execute the following query.

--Enable SET STATISTICS PROFILE
SET STATISTICS PROFILE ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET STATISTICS PROFILE Output

SET STATISTICS IO - instructs SQL Server to display information regarding the amount
of disk activity generated by Transact-SQL statements after executing the statement.

Syntax: SET STATISTICS IO {ON | OFF}

Example 6 - Type and execute the following query.


--Enable SET STATISTICS IO
SET STATISTICS IO ON
GO
--Change to pubs database
USE pubs
GO
--Select information from authors table
SELECT * FROM pubs.dbo.authors
GO

SET STATISTICS IO Output

Estimated rows column in an execution plan

The number of rows estimated by the optimizer shown in the execution plan can be a
major factor in how the optimizer creates the execution plan. Understanding the number
of estimated rows can help a developer in understanding the options used by the
optimizer to create an execution plan. A large number of estimated rows can tell the
developer why a merge join is more appropriate than a nested loop or why an index
scan is favored over an index seek. Developers should investigate situations when
small numbers of estimated rows with large estimated costs are seen in execution plans.

The estimated rows shown in the execution plan and the actual number of rows
returned can serve as a major warning for a developer. The query optimizer uses
column statistics to determine the estimated row count for an execution plan and if those
column statistics are out of date the optimizer can make very bad choices in the options
it will use for a query. A developer should review the estimated rows in the execution
plan and the actual rows returned by the query to help determine if column statistics
need to be updated which can dramatically change the execution plan of a query and
increase the performance of that query.

To understand the problems caused by outdated statistics, which make the estimated row count wrong, let’s look at the following example.

The first thing we need to do is build a database with two tables, add a small number of rows to the tables, build some indexes, and then look at the execution plan for a simple query.

SET NOCOUNT ON
USE master
GO

--Create new database
IF EXISTS(SELECT name FROM master.dbo.sysdatabases WHERE name = 'test_est_rows')
DROP DATABASE test_est_rows
GO
CREATE DATABASE test_est_rows
GO
USE test_est_rows
GO

--Create tables for query
IF OBJECT_ID('test_est_rows') IS NOT NULL
DROP TABLE test_est_rows
GO
IF OBJECT_ID('test_est_rows1') IS NOT NULL
DROP TABLE test_est_rows1
GO

CREATE TABLE test_est_rows
(
intCol1 INTEGER
,intCol2 INTEGER
)

CREATE TABLE test_est_rows1
(
intCol1 INTEGER
,intCol2 INTEGER
)
GO

--Insert 10 rows into each table
DECLARE @intLoop INTEGER
SET @intLoop = 10

WHILE @intLoop > 0
BEGIN
INSERT INTO test_est_rows VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END

SET @intLoop = 10

WHILE @intLoop > 0
BEGIN
INSERT INTO test_est_rows1 VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END
GO

--Build nonclustered indexes
CREATE NONCLUSTERED INDEX ncl_test_est_rows ON test_est_rows(intCol1)
GO

--Build nonclustered indexes
CREATE NONCLUSTERED INDEX ncl_test_est_rows1 ON test_est_rows1(intCol1)
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol1, t2.intCol1 FROM dbo.test_est_rows t1 INNER JOIN
dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1 WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO
Execution Plan for the above query

Next we will turn off SQL Server’s ability to update the statistics for the two indexes created above and add 50,000 new rows to the tables. These added rows should produce a situation in which the statistics are dramatically outdated, as they were built on 10 rows of data and not 50,010 rows of data.

--Turn off auto update stats
ALTER DATABASE test_est_rows SET AUTO_UPDATE_STATISTICS OFF
GO

--Modify number of rows in each table
DECLARE @intLoop INTEGER
SET @intLoop = 50000

WHILE @intLoop > 10
BEGIN
INSERT INTO test_est_rows VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END

SET @intLoop = 50000

WHILE @intLoop > 10
BEGIN
INSERT INTO test_est_rows1 VALUES (@intLoop,@intLoop)
SET @intLoop = @intLoop - 1
END
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol2, t2.intCol2 FROM dbo.test_est_rows t1 INNER JOIN
dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1 WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO
Execution Plan for the above query

You can see from the execution plan that the route taken by the optimizer is dramatically different from the one for our 10-row table. In my case, the optimizer thought there may have been 2,500,000 rows in the table instead of 50,010. The optimizer developed a plan based on the estimated rows and used parallelism, a table spool, and table scans to create the best route for the query.
Now, let’s take a look at what the optimizer does when it has correct statistics to determine the estimated rows.
--Update statistics
exec sp_updatestats
GO

--Obtain execution plan
SET STATISTICS PROFILE ON
GO

--Return data
SELECT t1.intCol2, t2.intCol2 FROM dbo.test_est_rows t1 INNER JOIN
dbo.test_est_rows1 t2 ON t1.intCol1 = t2.intCol1 WHERE t1.intCol1 = 5
GO

--Obtain execution plan
SET STATISTICS PROFILE OFF
GO
Execution Plan for the above query

Look, the index seeks are back. The estimated rows matched the actual rows. The
parallelism is gone. The table spool has been removed. A much better execution plan for
the query has been produced once the optimizer has the correct statistics to obtain
estimated rows from.
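If you want to confirm how stale the statistics are before updating them, SQL Server 2000 exposes the last update time through the STATS_DATE() function. A minimal sketch against the sysindexes system table (the test_est_rows table is the one from the example above):

```sql
-- Check when the statistics on test_est_rows were last updated.
-- STATS_DATE(table_id, index_id) returns the last statistics update time.
SELECT name AS index_or_stats_name,
       STATS_DATE(id, indid) AS stats_last_updated
FROM sysindexes
WHERE id = OBJECT_ID('dbo.test_est_rows')
  AND indid > 0
GO
```

If the date returned predates the bulk insert, the optimizer is still estimating row counts from the old 10-row statistics.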

Bookmark Lookups
One of the major overheads associated with the use of non-clustered indexes is the cost
of bookmark lookups. Bookmark lookups are a mechanism to navigate from a non-
clustered index row to the actual data row in the base table (clustered index) and can be
very expensive when dealing with large numbers of rows.
When a small number of rows are requested by a query, the SQL Server optimizer will
try to use a non-clustered index on the column or columns contained in the WHERE
clause to retrieve the data requested by the query. If the query requests data from
columns not contained in the non-clustered index, SQL Server must go back to the data
pages to obtain the data in those columns. Whether or not the table has a clustered
index, the query must still return to the table or clustered index to retrieve the
data.
Bookmark lookups require data page access in addition to the index page access needed
to filter the table data rows. Because this requires the query to access two sets of pages
instead of only one, the number of logical READS performed by the query will increase.
If the data pages are not in the buffer cache, additional I/O operations will be required.
And in the case of most large tables, the index page and the corresponding data pages
are not usually located close to each other on the disk.
These additional requirements for logical READS and physical I/O can cause bookmark
lookups to become quite costly. While this cost may be acceptable in the case of small
result sets, this cost becomes increasingly prohibitive in the case of larger and larger
result sets. In fact, as the result sets become larger and larger, the optimizer may
consider the costs of the bookmark lookups to be too much and discard the non-
clustered index and simply perform a table scan instead.
Example of a Bookmark Lookup
SET STATISTICS PROFILE ON
GO

USE pubs
GO

--Find phone number for White, Johnson


SELECT phone FROM dbo.authors WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO

Execution Plan (abridged)


Rows        Executes    StmtText
----------- ----------- ----------------------------------------------------------------------------------
1           1           SELECT [phone]=[phone] FROM [dbo].[authors]
1           1             |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[authors]))
1           1               |--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]), ...)
Because both au_lname and au_fname are contained in a non-clustered index, the
optimizer can use that index to filter the rows in the table and return only the phone
numbers requested. However, because the phone column in the authors table is not
contained in that index or any other non-clustered index, the optimizer must return to
the authors table to fetch the matching phone number, creating a bookmark lookup.
Finding the offending column(s)
In order to resolve a bookmark lookup, you must find the column or columns that
cause it. Examine the index usage in the execution plan to see which index the
optimizer chose for the query.
Execution Plan (abridged)
StmtText
----------- ----------- ----------------------------------------------------------------------------------
|--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]), SEEK:([authors].
[au_lname]='White' AND [authors].[au_fname]='Johnson') ORDERED FORWARD)
In this case we see that the authors.aunmind index is being used by the optimizer for the
query. A quick check of the columns included in the index using sp_helpindex on the
authors table will show that the index consists of the au_lname and au_fname columns.
index_name   index_description                   index_keys
aunmind      nonclustered located on PRIMARY     au_lname, au_fname
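That check can be run directly; sp_helpindex lists every index on a table together with its key columns:

```sql
-- List the indexes (and their key columns) on the authors table
USE pubs
GO
EXEC sp_helpindex 'authors'
GO
```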
A review of the execution plan OutputList column reveals that the phone column is the
only remaining column being requested by the query.
Execution Plan (abridged)
OutputList
----------------------------------------------------------------------------------
[authors].[phone]
Since the phone column is not in the index, you can deduce that the phone column is
the offending column in this case.
Resolving bookmark lookups
Once you discover the columns responsible for a bookmark lookup, you will need to
consider one of three methods available to resolve it.
1. Create a covering index
2. Remove the offending column
3. Convert a non-clustered index into a clustered index

Create a covering index

Given the example listed earlier in this section, if the following covering index had been
created, the result would be the removal of the bookmark lookup from the execution
plan.

CREATE NONCLUSTERED INDEX ncl_authors_phone ON authors(au_lname, au_fname, phone)
GO
Execution Plan
SELECT [phone]=[phone] FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
  |--Index Seek(OBJECT:([pubs].[dbo].[authors].[ncl_authors_phone]),
     SEEK:([authors].[au_lname]=[@1] AND [authors].[au_fname]=[@2]) ORDERED FORWARD)

Remove the offending column
In the simple query below, the developer returned all the columns from the authors
table when all the query asked for was the ID of the author.
SET STATISTICS PROFILE ON
GO

USE pubs
GO

--Find ID number for White, Johnson


SELECT * FROM dbo.authors WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO
Execution Plan
StmtText
----------------------------------------------------------------------------------
SELECT * FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
  |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[authors]))
    |--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]),
       SEEK:([authors].[au_lname]='White' AND [authors].[au_fname]='Johnson') ORDERED FORWARD)
Removing the additional, unneeded columns and returning only the au_id column
removes the bookmark lookup in this case.

SET STATISTICS PROFILE ON


GO

USE pubs
GO

--Find ID number for White, Johnson


SELECT au_id FROM dbo.authors WHERE au_lname = 'White'
AND au_fname = 'Johnson'
GO

Execution Plan
StmtText
----------------------------------------------------------------------------------
SELECT [au_id]=[au_id] FROM [dbo].[authors] WHERE [au_lname]=@1 AND [au_fname]=@2
  |--Index Seek(OBJECT:([pubs].[dbo].[authors].[aunmind]),
     SEEK:([authors].[au_lname]=[@1] AND [authors].[au_fname]=[@2]) ORDERED FORWARD)
Bookmark lookups are often caused by additional columns being returned in the data
set “just in case” they are needed at a later date. Developers should strive to only
include columns in their result sets which are needed for the defined query
requirements. Additional columns can always be added at a later date.
Convert a non-clustered index into a clustered index
When developers are faced with bookmark lookups that cannot be removed with the
other choices described above, an alternative choice would be to convert an existing
index being used by the query into a clustered index. Converting an existing index into a
clustered index will place all the columns of the table in the index and prevent the need
for a bookmark lookup.

SET STATISTICS PROFILE ON
GO
USE pubs
GO

--Find information for employee PMA42628M
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee WHERE emp_id = 'PMA42628M'
GO
Execution Plan
StmtText
----------- ----------- ----------------------------------------------------------------------------------
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee
WHERE emp_id = 'PMA42628M'
|--Compute Scalar(DEFINE:([Expr1002]=[employee].[fname]+' '+[employee].[lname]+'
Hire Date:
'+Convert([employee].[hire_date])))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([pubs].[dbo].[employee]))
|--Index Seek(OBJECT:([pubs].[dbo].[employee].[PK_emp_id]),
SEEK:([employee].[emp_id]='PMA42628M') ORDERED FORWARD)
To resolve the bookmark lookup, the developer can change the existing clustered index
on the lname, fname, and minit columns into a non-clustered index.

--Change original clustered index into a non-clustered index
DROP INDEX employee.employee_ind
GO
CREATE INDEX employee_ind ON employee(lname, fname, minit)
GO
Once the clustered index has been changed into a non-clustered index, a new clustered
index can be built on the emp_id column to resolve the bookmark lookup. In this
particular case the emp_id is the PRIMARY KEY of the table, so instead of an index, the
developer needs to recreate a clustered PRIMARY KEY.

--Create new clustered index

--Drop CONSTRAINT
ALTER TABLE employee DROP CONSTRAINT PK_emp_id
GO

--Recreate CONSTRAINT
ALTER TABLE employee ADD CONSTRAINT PK_emp_id PRIMARY KEY CLUSTERED (emp_id)
GO

--Test removal of bookmark lookup

--Find information for employee PMA42628M
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee WHERE emp_id = 'PMA42628M'
GO

Execution Plan

StmtText
----------- ----------- ---------------------------------------------------------------------
SELECT fname + ' ' + lname + ' Hire Date: ' + CAST(hire_date AS VARCHAR(12))
FROM dbo.employee WHERE emp_id = 'PMA42628M'
|--Compute Scalar(DEFINE:([Expr1002]=[employee].[fname]+' '+[employee].[lname]+'
Hire Date:
'+Convert([employee].[hire_date])))
|--Clustered Index Seek(OBJECT:([pubs].[dbo].[employee].[PK_emp_id]),

SEEK:([employee].[emp_id]='PMA42628M') ORDERED
FORWARD)

While converting a non-clustered index into a clustered index is a possible solution to
bookmark lookups, applications often depend on the current clustered index, and this
solution can be almost impossible to implement in the real world.

SQL Server - Indexes and Performance

One of the keys to SQL Server performance is ensuring that you have the proper indexes
on a table so that any queries written against it can run efficiently. There are many
articles about designing indexes, choosing columns, etc. for optimizing performance,
so I will refrain from repeating most of what is written elsewhere. I have included a
few resources at the end of this article for this topic.
However once we have built the indexes, there is still work to be done. As your data sets
grow over time, SQL Server will continue to rebuild indexes and move data around as
efficiently as possible. This happens in a number of ways, but the result is that you may
need to perform maintenance on your indexes over time despite all of the automatic
tools built into SQL Server. This article will discuss some of the issues with data growth
over time as well as a technique to find tables in need of maintenance and how to
perform this maintenance.

What happens over time?

If SQL Server includes automatic statistics updating, a query optimizer that can learn
to be more efficient with your queries, and so on, why do we need to perform
maintenance? Well, let's examine what happens over time.

When you build an index on a table (let's assume a clustered index), SQL Server parcels
the data across pages and extents. With v7.x and above, extents can be shared between
objects (with v6.5, extents contain a single object). As a result, let's assume you create a
table with rows that are a quarter of a page in size. If you have 20 rows, then you have 5
pages' worth of data. Is your data stored on 5 pages? Only if your FILLFACTOR is 100%.
The fill factor determines how full, percentage-wise, your pages are. Let's assume a
FILLFACTOR of 50%; then you would have 10 pages of data allocated to this table. This
is getting complicated quickly, but let's examine it a bit more.
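As a sketch of how a fill factor is applied when building an index (the table and index names here are illustrative, not from the example):

```sql
-- Build a clustered index leaving half of each leaf page empty.
-- With FILLFACTOR = 50, five pages' worth of rows occupy ten pages,
-- but inserts are less likely to cause page splits.
CREATE CLUSTERED INDEX cl_mytable ON mytable(id)
WITH FILLFACTOR = 50
GO
```

Note that the fill factor is applied only when the index is built or rebuilt; it is not maintained as rows are inserted afterwards.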

If we expand this example over time, we may grow to 100 pages of data, which requires
(at a minimum) 13 extents if this object does not share any extents. Each page within
the extents links to another page with a pointer. The next page in the chain, however,
may not be in the same extent, so as we read the pages, we may need to "switch" to
another extent.

The simplest example is to take 3 consecutive pages of data laid out as follows:

Extent 1: Page n, Page n + 2
Extent 2: Page n + 1

These are any three pages where page n links to page n+1, then to page n+2, and so on.
To read these three pages, we read extent 1, then switch to extent 2, then switch back
to extent 1. These "switches" do not necessarily entail physical I/O, but they add up.
They may not be a big deal on your local server or even a lightly loaded server, but a
web application that has hundreds or thousands of users could see a large performance
impact from repeated scans of this table. Why does the table end up
looking like this? This is how the table is designed to function over time. SQL Server will
allocate space for each row based on the space available at that time. As a result, while a
clustered index stores the data in physical order on a page, the pages may not be in
physical order. Instead each page has a linkage to the next page in the sequence. Just as
your hard disk can become fragmented over time as you delete and insert files, the
allocations of pages for a table can be fragmented over time across extents as the data
changes.

So why doesn't SQL Server just rebuild the indexes? I am not sure if I would even want
it to do so. I would hate for this to occur right after a large web marketing campaign!
Instead the engineers in Redmond have left it up to the DBA to track this fragmentation
and repair it as necessary.

Running DBCC SHOWCONTIG


Prior to SQL Server 2000, you had to first get the object ID using the following command:

SELECT OBJECT_ID('<object name>')

For the user table:

SELECT OBJECT_ID('user')

This returned some long number (from sysobjects) that means nothing to me, but the
SQL team in Redmond must use it often and did not feel like including the join in
their code. I guess someone complained long and loud enough, because in SQL 2000 you
can use the name of the object in DBCC SHOWCONTIG like this:

DBCC SHOWCONTIG ('user')

This produces the following statistics on your indexes:

DBCC SHOWCONTIG scanning 'User' table...
Table: 'User' (962102468); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned................................: 899
- Extents Scanned..............................: 121
- Extent Switches..............................: 897
- Avg. Pages per Extent........................: 7.4
- Scan Density [Best Count:Actual Count].......: 12.58% [113:898]
- Logical Scan Fragmentation ..................: 99.89%
- Extent Scan Fragmentation ...................: 80.99%
- Avg. Bytes Free per Page.....................: 2606.5
- Avg. Page Density (full).....................: 67.80%

Each line of the output above is explained in detail below:

Pages Scanned - Gives the number of physical pages scanned in this index. Not really
relevant by itself, but it gives you the total size occupied by the index (each page is 8 KB).

Extents Scanned - An extent is 8 pages, so this should be pretty close to Pages Scanned /
8. In this example we have 121 extents, which is 968 pages. Since the index is only 899
pages, we have a number of shared extents. Not necessarily a bad thing, but it gives
you an idea that you are slightly fragmented. Of course, you do not know how much of
this is physical fragmentation, which can contribute to longer query times. The minimum
number of extents for the 899 pages above would be 113 (899/8, rounded up).

Extent Switches - The number of times the scan forced a switch from one extent to
another. If you see a number close to the number of pages, you have pretty high
fragmentation and may want to rebuild the index.

Average Pages/Extent - Gives the math of Pages Scanned / Extents Scanned. Not of any
great value other than you don't have to run Calculator to get the number. Fully
populated extents would give a value of 8 here.

Scan Density [Best Count:Actual Count].......: 12.58% [113:898]

This is the tough one. It shows a percentage and two numbers separated by a colon. I'll
explain it carefully, as I missed it the first two times around. The percentage is the
result of dividing number 1 (113) by number 2 (898). So what are the two numbers?

The first number is the ideal number of extent changes if everything were linked in a
contiguous chain. The second number is the number of extents moved through, which by
definition is 1 more than the number of extent switches. This is really another view of
fragmentation: 100% would be minimal (I hate to say zero) fragmentation. As you can
see, this table is fairly fragmented. The scan is constantly switching back and forth
from one extent to another instead of following a link from one page to another within
an extent.

Logical Scan Fragmentation ..................: 99.89%

I am still not sure what this means. I have not gotten a good explanation of it
anywhere, so here is my best interpretation. This shows how many pages (as a
percentage) in the index have a pointer to the next page that is different from the
pointer to the next page stored in the leaf (data) page. This is only relevant for
clustered indexes, as the data (leaf pages) should be physically in the order of the
clustered index.

So how do you use this? If you figure it out, let me know. Since this number is high for
me and other items lead me to think this index is fragmented, I think this is bad. So try
for a low number in OLAP systems and a medium number in OLTP systems.

Extent Scan Fragmentation ...................: 80.99%

Again, here is the official BOL explanation (v7.x and 2000 Beta 2).

Percentage of out-of-order extents in scanning the leaf pages of an index. This number is
not relevant to heaps. An out-of-order extent is one for which the extent containing the
current page for an index is not physically the next extent after the extent containing the
previous page for an index.

This shows the percentage of pages where the next page in the index is not physically
located next to the current page. It tells me the I/O system must move fairly often
(80% of the time) when scanning the index to find the next page.

Avg. Bytes Free per Page.....................: 2606.5

This tells you, on average, how many bytes are free per page. Since a page holds 8096
bytes of data, it appears that I have, on average, filled about 68% of each page. This can
be good or bad. If this is an OLTP system with frequent inserts to this table, then with
more free space per page there is less chance of a page split when an insert occurs. You
want to monitor this on tables with heavy activity and periodically rebuild the index to
spread out the data and create free space on pages. Of course, you do this during
periods of low activity (read: 3 a.m.) so that there is free space and page splits are
minimal during periods of high activity (when everyone can yell at you for a slow
database). Since this is an OLTP system, I am in pretty good shape.

If this were an OLAP system, then I would rather have this be closer to zero since most
of the activity would be read based and I would want the reads to grab as much data as
possible from each page (to reduce the time it takes to read the index). As your OLAP
table grows, this becomes more critical and can impact (substantially) the query time for
a query to complete.


Avg. Page Density (full).....................: 67.80%

This gives the percentage based on the previous number; I calculated it as
1 - (2606.5 / 8096), which works out to about 67.8%.

This all means that we need to defragment this table. There are a large number of extent
switches that occur, each of which could potentially cause a large I/O cost to queries
using this table and index.

Defragmenting Indexes

Rebuilding the clustered index causes the server to read the existing clustered index
and begin moving the data to new extents and pages, which puts everything back in
physical order and reduces fragmentation. There is another way:

In SQL 2000, the SQL developers added another DBCC option: INDEXDEFRAG. This can
defragment both clustered and non-clustered indexes, which (according to BOL) should
improve performance, as the physical order will match the logical order and
(theoretically) reduce the I/O required by the server to scan the index.

A couple of caveats about this: if your index spans files, then it defragments each file
separately and does NOT move pages between files. That is not a good thing if you have
added a new filegroup and allowed objects to grow across files.

A good thing that is way, way, way, extremely, absolutely, without-a-doubt long
overdue is the reporting of progress by DBCC INDEXDEFRAG as it works. Every 5
minutes this will report the estimated progress back to the user. Of course many of us
who have installed software with a feedback progress bar often wonder why the bar
moves quickly to 99% and remains there for 80% of the total install time. So time will tell
whether this is of any use, but I think some feedback is better than none.

Another addition that is way, way, way, (you get the idea) overdue is the ability to stop
the DBCC. I cannot tell you how many late nights I wished I could do this in v6.5. In fact
I often held off on running DBCC until the latest possible time since I could not stop it
once it started. (well, there was that O-N-O-F-F switch.)

Still one further addition, which ranks above the other two, is that this is an online
operation. Let me repeat that: it is an ONLINE operation. It does not hold locks on the
table, since it operates as a series of short transactions to move pages. It also operates
more quickly than building a new index, and the time required is related to the amount
of fragmentation of the object. Of course, this means that you must have extensive log
space if this is a large index. Keep that in mind, and watch the log growth when you
run this to see how much space it eats up.
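The two maintenance options discussed above can be sketched as follows, using the 'User' table from the SHOWCONTIG example (the database name is illustrative):

```sql
-- Option 1: rebuild all indexes on the table (offline; takes locks).
-- The final argument is the fill factor to apply during the rebuild.
DBCC DBREINDEX ('User', '', 90)
GO

-- Option 2 (SQL 2000): defragment one index online.
-- Arguments: database name, table name, index name or ID.
DBCC INDEXDEFRAG (mydb, 'User', 1)
GO
```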

How to Select Indexes for Your SQL Server Tables

Indexing is one of the most crucial elements in increasing the performance of SQL
Server. A well-written query will not show its effectiveness unless powered by an
appropriate index or indexes on the table(s) used in a query, especially if the tables are
large.

Indexes exist to make data selection faster, so the focus of this article is on ways you can
select the best indexes for faster data retrieval. This is done in a two-step process.

- Step One: Gathering Information
- Step Two: Taking Actions on the Information Gathered

Indexing can be quite a challenging task if you are not familiar with your databases, the
relationships between tables, and how queries are written in stored procedures and
embedded SQL. How many companies have you worked for that have a proper ERD
diagram of their databases and have followed textbook-style programming methods?
In the real world, time is often limited, which results in poor SQL Server database
performance.

If you have been tasked with optimizing a database's performance (at least to a
respectable level), or you want to be proactive with your databases to prevent potential
future performance issues, following these steps should help you in tuning tables, just as
they have helped me. These steps are applicable at any stage of a project, even if a
deadline is just around the corner.

Step One (Gathering Information)

Interact with the people who know about the database and its table structures. If you
already know them, that's great; this is very important and makes your life easier.

1) Identify key tables, based on:

- Static tables (often called master tables).
- Highly transactional tables.
- Tables used within a lot of stored procedures or embedded SQL.
- Tables with an index size greater than their data size. You can use sp_spaceused
with the table name to find table space usage.
- The top 10 or 15 largest tables. See a prior year's database if available or applicable.
The idea is to identify the largest tables in the database after it is in production.

2) Identify the most frequently called stored procedures/queries and list all of the tables
used by them.

3) Get a SQL Profiler trace of:

- The production site (if available/applicable). Running a trace on the production box
during typical activity is worth the effort and will pay off in later analysis.
- The testing site (if one is available/applicable).
- Otherwise, your development server.

It is advisable to write down information you collect in a document for later retrieval.
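The index-versus-data size check from step 1 can be gathered with sp_spaceused; a minimal sketch:

```sql
-- Compare data size with index size for a single table
EXEC sp_spaceused 'authors'
GO

-- Recompute the usage figures first if you suspect they are stale
EXEC sp_spaceused 'authors', @updateusage = 'TRUE'
GO
```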

4) Before we dive into analyzing the information gathered, here are a few things to keep
in mind while tuning your tables:

- To see the query/execution plans of queries, highlight them in SQL Query Analyzer
(isqlw.exe) and select Display Estimated Execution Plan (Ctrl+L) from the Query menu.
If you want to see the query plan of a stored procedure, select Show Execution Plan
(Ctrl+K) and execute the stored procedure. Also, turn on the SET STATISTICS IO ON
command. Examining query/execution plans can be a bit time consuming, but you will
find it easier if you really understand the database and its tables before you begin.
- You need a good foundation in how clustered and non-clustered indexes work.
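For example, logical I/O for a query can be captured like this while examining its plan:

```sql
-- Report logical and physical reads for each statement that follows
SET STATISTICS IO ON
GO

SELECT * FROM authors WHERE au_lname = 'White'
GO
-- The Messages tab then reports scan count, logical reads,
-- physical reads, and read-ahead reads for the authors table.

SET STATISTICS IO OFF
GO
```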

Preferred SQL Server Index Types

When you use Query Analyzer to produce a graphical execution plan, you will notice
that there are several different ways SQL Server uses indexes.

Clustered Index Seek

A Clustered Index Seek uses the seeking ability of indexes to retrieve rows directly from
a clustered index. In most cases, they provide the best performance on SELECT
statements.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM authors WHERE au_id LIKE '2%'

Highlight the query and press Ctrl+L (or execute it with Show Execution Plan turned
on). You will see the following in the "Estimated Execution Plan" tab.

Take a close look at the Argument section of the above illustration. Notice that the
“UPKCL_auidind” clustered index is used to retrieve the data.

Index Seek

An Index Seek uses a non-clustered index to retrieve data, and in some ways, acts like a
clustered index. This is because all of the data retrieved is fetched from the leaf layer of
the non-clustered index, not from any of the data pages. You often see this behavior in a
covering index.

In Query Analyzer, go to the pubs database and type the following query:

SELECT title_id, title FROM titles WHERE title LIKE 't%'

Highlight the query and press Ctrl+L (or execute it with Show Execution Plan turned
on). You will see the following in the "Estimated Execution Plan" tab:

In the Argument section in the above illustration, note that the “titleind” non-clustered
index is used to retrieve the data.

Bookmark Lookup

A Bookmark Lookup uses a non-clustered index to select the data. It starts with an index
seek in the leaf nodes of the non-clustered index to identify the location of the data on
the data pages, then retrieves the necessary data directly from those pages. Leaf
nodes of non-clustered indexes contain row locators that point to the actual data on
the data pages.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM titles WHERE title LIKE 't%'

Highlight the query and press Ctrl+L (or execute it with Show Execution Plan turned
on). You will see the following in the "Estimated Execution Plan" tab.

In the Argument section of the Index Seek, notice that the "titleind" non-clustered index
is used, but once the data pages are identified by looking them up in the leaf pages of
the non-clustered index, a Bookmark Lookup must be performed. Again, a Bookmark
Lookup is when the Query Optimizer has to look up the data from the data pages in
order to retrieve it. In the Argument section of the Bookmark Lookup, note that a
bookmark called "Bmk1000" is used. This name is assigned automatically by the Query
Optimizer.

Scans

Scans (Table scans, Index scan, and Clustered Index scans) are usually bad unless the
table has very few rows and the Query Optimizer determines that a table scan will
outperform the use of an available index. Watch out for scans in your execution plans.

In Query Analyzer, go to the pubs database and type the following query:

SELECT * FROM employee WHERE hire_date > '1992-08-01'

Highlight the query and press Ctrl+L (or execute it with Show Execution Plan turned
on). You will see the following in the "Estimated Execution Plan" tab:

Notice that in this case, a Clustered Index Scan was performed, which means that every
row in the clustered index had to be examined to fulfill the requirements of the query.

Now that we understand some of the basics of how to read Query Execution Plans, let’s
take a look at some additional information that you will find useful when analyzing
queries for proper index use:

- If you create multiple query or stored procedure execution plans at the same time in
Query Analyzer, you can compare the cost of each query or stored procedure to see
which is more efficient. This is useful for comparing different versions of the same
query or stored procedure.
- PRIMARY KEY constraints create clustered indexes automatically if no clustered index
already exists on the table and a non-clustered index is not specified when you create
the constraint.
- Non-clustered indexes store clustered index keys as their row locators. This overhead
can be used as a benefit by creating a covering index (discussed earlier). Using covering
indexes requires caution.
- A table's size comprises both the table's data and the size of any indexes on that table.
- Adding too many indexes on a table increases the total index size of a table and can
often degrade performance.
- Always add a clustered index to every table, unless there is a valid reason not to, such
as the table having very few rows.
- Seeks shown in query/execution plans for SELECT statements are good for
performance, while scans should be avoided.
- A table's size (number of rows) is also a major factor used by the Query Optimizer
when determining the best query plan.
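Because a PRIMARY KEY claims the clustered index by default, declaring the key NONCLUSTERED reserves the clustered index for a more useful column. A sketch (the table and names here are illustrative):

```sql
-- Keep the primary key non-clustered so the clustered index can go
-- on a column better suited to range scans.
CREATE TABLE orders (
    order_id   INT NOT NULL CONSTRAINT PK_orders PRIMARY KEY NONCLUSTERED,
    order_date DATETIME NOT NULL
)
GO
CREATE CLUSTERED INDEX cl_orders_date ON orders(order_date)
GO
```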

- Index order plays an important role in many query plans. For example, in the authors
table of the pubs database, a non-clustered index is defined on au_lname first, then
au_fname.

Fine Query A:

SELECT * FROM authors WHERE au_lname LIKE 'r%'

This uses a Bookmark Lookup and an Index Seek.

Fine Query B:

SELECT * FROM authors WHERE au_lname LIKE 'r%' AND au_fname LIKE 'a'

This uses a Bookmark Lookup and an Index Seek.

Not-so-fine Query C:

SELECT * FROM authors WHERE au_fname LIKE 'a'

This uses a Clustered Index Scan, because au_fname is not the leading column of the
index.

- SQL Server 2000 (not earlier versions) allows both ascending and descending sort
orders to be specified in an index. This can be useful for queries that use the
ORDER BY ... DESC clause.
- To find a particular word (e.g., a table name) used in all stored procedure code, you
can use the following query. For example, you can use it to find the list of stored
procedures using a given table:

SELECT DISTINCT a.name AS SPName
FROM syscomments b, sysobjects a
WHERE b.text LIKE '%authors%' AND a.id = b.id AND a.type = 'P'

This query returns all stored procedures having the text "authors" in their code. Note
that it might return extra stored procedures, for example, if a stored procedure only
uses the text in a comment.
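A sketch of the descending sort order mentioned above (the index name is illustrative):

```sql
-- Store pubdate values in descending order so that
-- ORDER BY pubdate DESC can read the index without a sort step.
CREATE NONCLUSTERED INDEX ncl_titles_pubdate ON titles(pubdate DESC)
GO
```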

Step Two: What to Do Once You Have Gathered the Necessary Information

Actions for Key Tables

For static tables (tables that rarely, if ever, change), you can be liberal with the number
of indexes you add. As mentioned earlier, too many indexes can degrade the
performance of highly transactional tables, but this does not apply to tables whose data
will not change. The only real consideration is disk space. Set all index fill factors on
static tables to 100 in order to minimize disk I/O for even better performance.

For highly transactional tables, try to limit the number of indexes. Always keep in mind
that every non-clustered index contains the clustered index key. Because of this, limit the
number of columns in your clustered index in order to keep its size small. Any index on a
busy transactional table has to be highly justifiable. Choose the fill factor with caution
(usually 80 to 90%) for these indexes in order to avoid potential page splitting.

Tables used heavily in stored procedures/embedded SQL play an important role over
the application's lifetime, as they are accessed most often, so they require special
attention. What matters is to look at how these tables are accessed in queries, in order
to eliminate scans and convert them into seeks. Watch the logical I/O reported by SET
STATISTICS IO ON to determine which queries access the most data; less logical I/O is
better than more. Choose the clustered index with caution. Depending on how
transactional the table is, choose a higher fill factor.

An index size greater than the data size implies a lot of indexes, so for such tables review
the indexes and make sure their existence is useful and justified.

For the 10 or 15 largest tables, keep their size in mind when creating indexes, as the
indexes themselves will also be large. Also check whether these tables are static or
non-static, which is helpful information when deciding which columns need to be
indexed.

For the most frequently called stored procedures/embedded SQL, examine the query
plans and the logical I/O (page reads) they use.

SQL Profiler is a very good tool. It tracks the calls executing in SQL Server at any given
point in time: their execution time, I/O reads, user logins, the executing SQL statement,
and so on. It can also be used as a debugging tool. Analyzing a Profiler trace is
important for identifying slow-running queries. You can set the duration filter to > 100 ms
to see queries that take more than 100 milliseconds to execute.

Using a Covering Index: a Non-clustered Index Uses the Clustered Index as a Row Locator
One can leverage the fact that non-clustered indexes store clustered index keys as their
row locators: a non-clustered index can behave like a clustered index if it contains all of
the columns referenced in the SELECT list, WHERE clause, and JOIN conditions of a
query.

Example 1

In the Orders table of the Northwind database, there is currently a non-clustered index on
the ShippedDate column.

Try running the following:

SELECT ShippedDate, ShipCity FROM Orders WHERE ShippedDate > '8/6/1996'

The query plan for this statement shows a Clustered Index Scan.

Now add the ShipCity column to the non-clustered index on ShippedDate:

CREATE INDEX [ShippedDate] ON [dbo].[Orders] ([ShippedDate], [ShipCity])
WITH DROP_EXISTING

Now run the query again. This time, the query plan shows an Index
Seek.

This works because all of the columns (ShippedDate and ShipCity) in the SELECT and
the WHERE clauses are part of the index.

Example 2

In the Titles table of the Pubs database, check out the following execution plan for this
query:

SELECT title_id, title FROM titles WHERE title LIKE 't%'

Notice that the execution plan shows an Index Seek, not a Bookmark Lookup (which is
what you usually find with a non-clustered index). This is because the non-clustered
index on the Title column contains the clustered index key Title_Id, and this query
references only Title_Id and Title in its SELECT and WHERE clauses.

Analyzing a Database Execution Plan

My everyday job is to develop back-office applications for a mobile telecom operator.
When a customer orders a service through the web or voice front-end, our applications
have to provide very quick feedback. Although we are required to answer in less than
one second, we have to perform complex SQL statements on databases that are dozens
of GBs in size.

In this environment, a single inefficient query can have disastrous effects. A bad
statement may overload all database processors, so that they are no longer available to
serve other customers' orders. Of course, such problems typically occur shortly after the
launch of new offers... that is, precisely under heavy marketing fire. Could you imagine
the mood of our senior management if such a disaster happens?

Unfortunately, suboptimal statements are difficult to avoid. Applications are generally
tested against a much smaller amount of data than in the production environment, so
performance problems are not likely to be detected empirically.

That's why every database developer (and every application developer coping with
databases) should understand the basic concepts of database performance tuning. The
objective of this article is to give a theoretical introduction to the problem. At the end of
this article, you should be able to answer the question: is this execution plan reasonable
given the concrete amount of data I have?

I have to warn you: this is about theory. I know everyone dislikes it, but there is no
serious way around it. So, expect to find here a lot of logarithms and probabilities...
Not afraid? Then let's continue.

Scenario

I need a sample database for the examples of this article. Let's set up the scene.

The CUSTOMERS table contains general information about all customers. Say the
company has about a million customers. This table has a primary key
CUSTOMER_ID, which is indexed by PK_CUSTOMERS. The LAST_NAME column is
indexed by IX_CUSTOMERS_LAST_NAME. There are 100,000 unique last names.
Records in this table average 100 bytes.

The REGION_ID column of the CUSTOMERS table references the REGIONS table,
which contains all the geographical regions of the country. There are approximately 50
regions. This table has a primary key REGION_ID indexed by PK_REGIONS.

I will use the notations RECORDS(CUSTOMERS) and PAGES(CUSTOMERS) to denote
respectively the number of records and pages in the CUSTOMERS table, and similarly
for other tables and even for indexes. Prob[CUSTOMERS.LAST_NAME = @LastName]
will denote the probability that a customer is named @LastName when we have no
other information about him.

What is an execution plan

An SQL statement expresses what you want but does not tell the server how to do it.
Using an SQL statement, you may for instance ask the server to retrieve all customers
living in the region of Prague. When the server receives the statement, the first thing it
does is parse it. If the statement does not contain any syntax errors, the server goes on
to decide the best way to compute the results. The server chooses whether it is better to
read the customers table completely, or whether using an index would be faster. It
compares the cost of all possible approaches. The way a statement can be physically
executed is called an execution plan or a query plan.

An execution plan is composed of primitive operations. Examples of primitive
operations are: reading a table completely, using an index, performing a nested loop or a
hash join, and so on. We will detail them in this series of articles. All primitive operations
have an output: their result set. Some, like the nested loop, have one input. Others, like
the hash join, have two inputs. Each input must be connected to the output of another
primitive operation. That's why an execution plan can be sketched as a tree: information
flows from the leaves to the root. There are plenty of examples below in this article.

The component of the database server that is responsible for computing the optimal
execution plan is called the optimizer. The optimizer bases its decision on its knowledge
of the database content.

How to inspect an execution plan

If you are using Microsoft SQL Server 2000, you can use the Query Analyzer to see
which execution plan is chosen by the optimizer. Simply type an SQL statement in the
Query window and press Ctrl+L. The plan is displayed graphically:

As an alternative, you can get a text representation. This is especially useful if you have
to print the execution plan. From a Command Prompt, open the isql program (type isql
-? to display the possible command-line parameters), then follow these steps:

1. Type set showplan_text on, and press Enter.
2. Type go, and press Enter.
3. Paste your SQL statement at the command prompt, and press Enter.
4. Type go, and press Enter.

The top operation of this execution plan is a hash join, whose inputs are an index scan of
UNC_Dep_DepartmentName and a clustered index scan of PK_USERS. The objective of
this series of articles is to learn how to understand such execution plans.

What are we optimizing?

Application developers usually have to minimize processor use and sometimes memory
use. However, when developing database applications, the bottleneck is elsewhere. The
main concern is to minimize disk access.

The main disk allocation unit of database engines is called a page. The size of a page is
typically some kilobytes. A page usually contains between dozens and hundreds of
records. This is important to remember: sometimes you may think a query is optimal
from the point of view of the record accesses, while it is not if you look at page accesses.

Looking for records in tables

 Full table scan

Say we are looking for a few records in a single table -- for instance we are
looking for the customers whose last name is @LastName.

sql1 ::= SELECT * FROM CUSTOMERS WHERE LAST_NAME = @LastName

The first strategy is to read the records from the customers table and select the
ones fulfilling the condition LAST_NAME = @LastName. Since the records are not
sorted, we have to read absolutely all of them, from the beginning to the end
of the table. This operation is called a full table scan. It has linear complexity,
which means that the execution time is a multiple of the number of rows in the
table. If it takes 500 ms to look for a record in a table of 1,000 records, it may take
8 minutes in a table of one million records and almost 6 days in a table of one billion
records...

To compute the cost of sql1, we set up a table of primitive operations. For each
operation, we specify the cost of one occurrence and the number of occurrences.
The total cost of the query is the sum, over all operations, of the unit cost times the
number of repetitions.

Operation                     Unit Cost          Number
Full table scan of CUSTOMERS  PAGES(CUSTOMERS)   1

Let's take a metaphor: a full table scan is like finding all occurrences of a word in
a novel.
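The linear scaling claimed above is easy to check with a quick back-of-the-envelope calculation. A minimal sketch in Python (the 500 ms baseline is the figure from the text; the function name is mine):

```python
def full_scan_time_ms(rows, baseline_rows=1000, baseline_ms=500):
    """Extrapolate full-table-scan time linearly from a measured baseline."""
    return baseline_ms * rows / baseline_rows

# 1,000 rows        -> 500 ms (the measured baseline)
# 1,000,000 rows    -> 500,000 ms, i.e. roughly 8 minutes
# 1,000,000,000 rows-> 500,000,000 ms, i.e. roughly 5.8 days
minutes = full_scan_time_ms(10**6) / 1000 / 60
days = full_scan_time_ms(10**9) / 1000 / 3600 / 24
```

This is exactly what "linear complexity" means in practice: a thousand-fold increase in table size gives a thousand-fold increase in scan time.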

 Index seek and index range scan

Now what if the book is not a novel but a technical manual with an exhaustive
index at the end? For sure, the search would be much faster. But what precisely
is an index?

o An index is a collection of pairs of key and location. The key is the word
we are looking for. In the case of a book, the location is the page
number; in the case of a database, it is the physical row identifier.
Looking up a record in a table by physical row identifier has constant
complexity, that is, it does not depend on the number of rows in the
table.
o Keys are sorted, so we don't have to read all keys to find the right one.
Indeed, searching an index has logarithmic complexity. If looking for a
record in an index of 1,000 records takes 100 ms, it may take 200 ms in an
index of a million rows and 300 ms in an index of a billion rows. (Here
I'm talking about B-Tree indexes. There are other types of indexes, but
they are less relevant for application development.)
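The logarithmic behaviour of a key lookup can be illustrated with a binary search over a sorted key list, a much simplified stand-in for a B-Tree (the key format and sizes below are made up for the illustration):

```python
import bisect
import math

def index_seek(sorted_keys, key):
    """Binary search: find the first position of key in O(log n) comparisons."""
    pos = bisect.bisect_left(sorted_keys, key)
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return pos
    return None  # key not present

# 100,000 zero-padded keys so lexicographic order matches numeric order.
keys = sorted(f"name{i:07d}" for i in range(100_000))
pos = index_seek(keys, "name0042000")

# A binary search over n keys needs at most ceil(log2(n)) comparisons,
# so multiplying the table size by 1000 adds only ~10 comparisons.
max_comparisons = math.ceil(math.log2(len(keys)))
```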

If we are looking for customers by name, we can perform the following physical
operations:

o Seek the first entry in IX_CUSTOMERS_LAST_NAME where
LAST_NAME = @LastName. This operation is called an index seek.
o Read the index from this entry up to the last one where
LAST_NAME = @LastName still holds. This costs
PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName]
page reads from disk. This operation (always coupled with an index seek) is
called an index range scan.
o Each index entry found by the previous steps gives us the physical
location of a record in the CUSTOMERS table. We still have to fetch
each record from the table. This implies
RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] page fetches.
This operation is called a table seek.

The detailed cost analysis of sql1 using an index range scan is the following.

Operation                                   Unit Cost                                                    Number
Index Seek of IX_CUSTOMERS_LAST_NAME        Log(PAGES(IX_CUSTOMERS_LAST_NAME))                           1
Index Range Scan of IX_CUSTOMERS_LAST_NAME  PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName]  1
Table Seek of CUSTOMERS                     1                                                            RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName]

The bad news is that the query complexity is still linear, so the query time is still a
multiple of the table size. The good news is that we cannot really do better: the
complexity of a query cannot be smaller than the size of its result set.

In the next section of this article, we will accept a simplification: we will assume
that an index lookup has unit cost. This estimate is not so rough, because a
logarithmic cost can always be neglected when it is added to a linear cost. The
simplification is not valid when it is multiplied by another cost.

 Index selectivity

Comparing the cost of the full table scan approach and the index range scan
approach introduces a crucial concept in database tuning. The conclusion of
the previous section is that the index range scan approach is faster if, in
terms of order of magnitude, the following condition is true:

[1] RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] < PAGES(CUSTOMERS)

The probability that a customer has a given name is simply the number of
customers having this name divided by the total number of customers. Let
KEYS(IX_CUSTOMERS_LAST_NAME) denote the number of unique keys in the
index IX_CUSTOMERS_LAST_NAME. The number of customers named
@LastName is statistically
RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME).

So the probability can be written:

[2] Prob[LAST_NAME = @LastName]
  = (RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME)) / RECORDS(CUSTOMERS)
  = 1 / KEYS(IX_CUSTOMERS_LAST_NAME)

Injecting [2] into [1], we have:

[3] RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME) < PAGES(CUSTOMERS)

That is, an index is adequate if the number of records per unique key is
smaller than the number of pages of the table.

The inverse of the left-hand side of the previous expression is called the selectivity
of an index:

SELECTIVITY(IX_CUSTOMERS_LAST_NAME) =
KEYS(IX_CUSTOMERS_LAST_NAME) / RECORDS(CUSTOMERS)

The selectivity of a unique index is always 1. The more selective an index (the
larger its selectivity coefficient), the more efficient it is. Corollary: indexes
with poor selectivity can be counter-productive.
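The rule of thumb in [3] and the selectivity formula are easy to automate. A sketch with illustrative numbers from the scenario (one million customers, 100,000 distinct last names; the page count is an assumption of roughly 80 records of 100 bytes per 8 KB page):

```python
def selectivity(keys, records):
    """SELECTIVITY(index) = KEYS(index) / RECORDS(table)."""
    return keys / records

def index_wins(records, keys, pages):
    """Condition [3]: the index is adequate if records per unique key < pages."""
    return records / keys < pages

RECORDS_CUSTOMERS = 1_000_000
KEYS_LAST_NAME = 100_000
PAGES_CUSTOMERS = 12_500   # assumption: ~80 rows of 100 bytes per 8 KB page

# 10 customers per last name versus 12,500 pages: the index clearly wins.
use_index = index_wins(RECORDS_CUSTOMERS, KEYS_LAST_NAME, PAGES_CUSTOMERS)
sel = selectivity(KEYS_LAST_NAME, RECORDS_CUSTOMERS)
```

The same check shows why an index on REGION_ID (only 50 distinct keys, so 20,000 rows per key) would be counter-productive for this table.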

Joining tables with nested loops

Things become much more difficult when you need to retrieve information from more
than one table.

Suppose we want to display the name of the region beside the name of the customer:

SELECT d.NAME, e.FIRST_NAME, e.LAST_NAME
FROM CUSTOMERS e, REGIONS d
WHERE e.REGION_ID = d.REGION_ID

Among the possible strategies, I will present in this article the most natural: choose a
table, read it from beginning to end and, for each record, search for the corresponding
record in the second table. The first table is called the outer table or leading table, and
the second one the inner table. The dilemma, of course, is deciding which table should
lead.

So let's first try starting with the table of regions. As we saw above, an index on
CUSTOMERS.REGION_ID would have too low a selectivity to be efficient, so our first
candidate execution plan is to read the table of regions and, for each region, perform a
full table scan of CUSTOMERS.

Operation                     Unit Cost          Number
Full table scan of REGIONS    PAGES(REGIONS)     1
Full table scan of CUSTOMERS  PAGES(CUSTOMERS)   RECORDS(REGIONS)

The leading cost term is clearly PAGES(CUSTOMERS) * RECORDS(REGIONS). With our
numeric values, this is approximately 50 * PAGES(CUSTOMERS).

Now what if we did the opposite? Since the table of regions is so small that it fits in a
single page, it is useless to use an index on it, so we again choose two nested full table
scans.

Operation                     Unit Cost          Number
Full table scan of CUSTOMERS  PAGES(CUSTOMERS)   1
Full table scan of REGIONS    PAGES(REGIONS)     RECORDS(CUSTOMERS)

At first sight, the leading cost term is PAGES(REGIONS) * RECORDS(CUSTOMERS). Since
the table of regions is so small that it fits in one page, and since we have approximately 80
customer records per page (pages being, say, 8 KB), we can write that the leading cost is
80 * PAGES(CUSTOMERS), which seems a little worse than the first approach. However,
this second join order is in fact much faster than the first one. To see this, we have to take
into account a factor that we have ignored up to now: the memory cache.

Since we are interested only in minimizing disk access, we can consider that the cost of
reading a page from memory is zero.

The REGIONS table and its primary key can both be stored in cache memory. It follows
that the cost matrix can be rewritten as follows:

Operation                           Unit Cost          Number
Full table scan of CUSTOMERS        PAGES(CUSTOMERS)   1
First full table scan of REGIONS    PAGES(REGIONS)     1
Next full table scans of REGIONS    0                  RECORDS(CUSTOMERS)

So, finally, the leading cost term is PAGES(CUSTOMERS), which is around 50 times
better than the first join order (whose cost was 50 * PAGES(CUSTOMERS)).
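The three cost matrices above can be condensed into a few lines of arithmetic. A sketch (page and record counts follow the scenario; cached reads are costed at zero):

```python
PAGES_CUSTOMERS = 12_500     # assumption: 1,000,000 rows at ~80 rows per 8 KB page
RECORDS_CUSTOMERS = 1_000_000
PAGES_REGIONS = 1            # the whole REGIONS table fits in one page
RECORDS_REGIONS = 50

# Join order 1: REGIONS leading, full scan of CUSTOMERS once per region.
cost_regions_leading = PAGES_REGIONS * 1 + PAGES_CUSTOMERS * RECORDS_REGIONS

# Join order 2, ignoring the cache: CUSTOMERS leading, full scan of REGIONS per row.
cost_customers_leading = PAGES_CUSTOMERS * 1 + PAGES_REGIONS * RECORDS_CUSTOMERS

# Join order 2 with REGIONS cached after the first read: repeat scans cost zero.
cost_customers_leading_cached = (PAGES_CUSTOMERS * 1
                                 + PAGES_REGIONS * 1
                                 + 0 * RECORDS_CUSTOMERS)

ratio = cost_regions_leading / cost_customers_leading_cached
```

With these numbers, `ratio` comes out to about RECORDS(REGIONS), i.e. around 50, which is why the small table should be the inner one once caching is taken into account.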

System Planning
Based on the analysis above, we came up with the following plan.

[Flowchart: query tuning decision flow]

 Parse the query and get the execution plan used by the Query Optimizer.
 If no indexes are used and the plan shows a table scan, suggest adding a clustered
index on the column with the most distinct values.
 If an index seek is used, the query is already optimized.
 If a clustered or non-clustered index scan is used, give suggestions to optimize the
query.

 Methodology
Connect Form

This is the second form of the project. When the form loads, the server combo box is
filled with all the server names, depending on the data source used in the connection
string. The user then selects the server name. If Windows Authentication is selected, the
server is connected using the Windows username and password; if SQL Authentication
is selected, the user enters a username and password. When OK is clicked with SQL
Authentication selected, a new connection string is created using the entered server
name, username and password. If the connection succeeds, the user is logged into the
corresponding SQL Server and proceeds to the next form, i.e. the Main Form.

[Flowchart: frmConnect]

 Show the frmConnect form and accept the server name, username and password.
 If OK is clicked, check the username and password; on success, show the frmMain
form.
 If Cancel is clicked, do nothing and stop.

Main Form

[Flowchart: frmMain menu dispatch. The File, Edit, Query, Window and Utilities menus
each branch to their own flow (F, E, Q, W and U below).]

This is the MDI form of the project; by default it contains the Analyzer form, which is a
child of the MDI form. This form contains the main functionality of the project.
It contains five main menus and a toolbar. The main function is to parse the query.
Parsing here means optimizing the query given by the user.

The parsing function works as follows: the query given by the user is executed with
SET SHOWPLAN_ALL ON to obtain the showplan. The showplan is then checked to see
whether an index scan or a table scan is used. If indexes are used, it is checked whether
an index seek or an index scan is used, and output is given accordingly. If a table scan is
used, suggestions are given to the user for applying indexes, by calculating the number
of distinct values for each column of the tables.

The query itself is executed by passing the SQL query to the SQL engine for the
output.
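The plan-inspection step described above can be sketched as a simple classifier over the showplan text. This is a hypothetical helper, not the actual SQL Tuner code; in practice the plan text would first be retrieved by running the query under SET SHOWPLAN_ALL ON:

```python
def classify_plan(showplan_text):
    """Return a tuning suggestion based on the operators found in a showplan.

    The matching is a plain substring search over the plan text, checked in
    the order of severity used by the flow described above.
    """
    plan = showplan_text.lower()
    if "table scan" in plan:
        return "Table scan: suggest adding an index on the filtered columns"
    if "clustered index scan" in plan or "index scan" in plan:
        return "Index scan: consider a more selective or covering index"
    if "index seek" in plan:
        return "Index seek: query is already well optimized"
    return "No index information found in the plan"

suggestion = classify_plan(
    "|--Clustered Index Scan(OBJECT:([pubs].[dbo].[authors].[UPKCL_auidind]))"
)
```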

File Menu
[Flowchart F: File menu]

 Connect: go to the Connect flow (C).
 Disconnect: disconnect the connection with the SQL Server.
 New: open a new blank document.
 Open: open an existing SQL document.
 Save: save the document.
 Exit: close the current SQL document.

Edit Menu

[Flowchart E: Edit menu]

 Undo: undoes the last action taken.
 Cut: cuts the selected text.
 Copy: copies the selected text.
 Paste: pastes the text from the Clipboard.
 Select All: selects all the text present in the SQL document.

Query Menu

[Flowchart Q: Query menu]

 Change Database: go to the Change Database flow (D).
 Parse: gives suggestions about optimizing the query entered by the user.
 Execute: runs the query entered by the user.
 Cancel Execution: stops the execution of the query.

Window Menu

[Flowchart W: Window menu]

 Switch Pane: changes the focus to the next pane.
 Hide Result Pane: hides/shows the result pane.

Utility Menu

[Flowchart U: Utilities menu]

 Insert/Update Template: go to the Insert/Update Template flow (I).

Change Database Form

[Flowchart D: Change Database form. If OK is clicked, change the current database to the
database selected from the list; if Cancel is clicked, exit without changes.]

This form is used to change the database the user wants to query. When the form loads,
it displays all the database names for the particular SQL Server it is connected to, in a
data grid. To change the database, the user can click the OK button or double-click the
desired database. On OK or double-click, a new connection string is created with the
selected database name. The Cancel button exits the form without changing the
database selection.

Insert\Update Template Form

[Flowchart I: Insert/Update Template form. Choose Insert or Update; select a table from
the list for the current database; select columns manually or via All Columns; Generate
creates the script; Cancel exits.]

This form is used to create templates for Insert or Update queries, depending on the
user's selection. It is also a child form of the MDI form mentioned above. The database is
selected from the combo box on the Main form, and a connection string is created for
that database.

A combo box is filled with all the tables present in that database. A list box is filled with
the columns of the table selected from the combo box, using the stored procedure
sp_columns, which returns all the columns of a particular table. The user now has two
options: select the columns manually, or select all the columns. All columns can be
selected by clicking a check box, which checks all the columns in the list box. The user
then clicks Generate to create the template. On Generate, all details are checked; if any
are missing, a message box is shown, and if all details are filled in, it is checked whether
Insert or Update has been selected. Templates are created according to these choices.
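The template generation described here amounts to simple string assembly. A minimal sketch (the function names are hypothetical; the real form obtains tables and columns via sp_columns):

```python
def insert_template(table, columns):
    """Build an INSERT template with @-variables as value placeholders."""
    cols = ", ".join(columns)
    vals = ", ".join(f"@{c}" for c in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals})"

def update_template(table, columns, key_column):
    """Build an UPDATE template; the key column goes into the WHERE clause."""
    sets = ", ".join(f"{c} = @{c}" for c in columns if c != key_column)
    return f"UPDATE {table} SET {sets} WHERE {key_column} = @{key_column}"

sql = insert_template("authors", ["au_id", "au_lname", "au_fname"])
```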

System Implementation
Prerequisites for system implementation.

 .NET Framework 2.0


 SQL Server 2000

.NET Framework 2.0 Installation

Step 1: Insert CD 1 for .NET and click on "Windows Component".

Step 2: Then insert CD 5 for .NET; it will start installing the .NET Windows
Component, as shown in the fig below, when you click "Update Now".

Step 3: The Windows component includes “Microsoft .NET Framework” as shown in
the fig below.

Step 4: Installation of Windows Component including Microsoft .NET Framework.

SQL Server 2000 Installation:

Step 1: Insert CD of Microsoft SQL Server 2000 then select “SQL Server 2000
Components” as shown in fig below.

Step 2: Select “Install Database Server” as shown in fig below.

Step 3: Click "Next" on the Welcome screen as shown in fig below.

Step 4: Select the option “Local Computer” to install the SQL Server on the local
machine.

Step 5: On next screen for installation options select “Create a new instance of SQL
Server, or install Client Tools.”

Step 6: On Types of Installation select “Server and Client Tools” for installing a Server.

Step 7: On Service Accounts screen select the option “Use a Domain User account” and
enter the username, password and the machine name for the Windows user account.

Step 8: On “Authentication mode” screen select “Mixed Mode” which is used for both
Windows Authentication and SQL Server Authentication.

SQL Tuner Installation

Step 1: Open the SQL Tuner “Installation Package” and click on “SQL_Tuner.msi”.

Step 2: Click on the “Next” button on the Welcome screen to install the SQL Tuner.

Step 3: On Confirm Installation Screen click “Next” button to install SQL Tuner.

Step 4: Installation of SQL Tuner will be in progress.

Step 5: Installation is completed, click on “Close” to exit the setup.

Technical Specification

Hardware Requirements

Requirements Minimum Recommended

Processor 900 MHz 1.2 GHz

RAM 128 MB 512 MB

Disk Space 500 MB 500 MB

Operating System Windows 2000, XP Windows 2000, XP, 2003

Software Requirements

 .NET Framework 2.0


 SQL Server 2000

User Manual
Connect Form (frmConnect.cs)

This form is used to connect to a particular server for performing optimization or
executing a query.
It is the first form of the SQL Tuner tool that the user will see. This form decides which
server the new blank document belongs to.
Working:
 Select the server to connect to by selecting a particular SQL Server from the
combo box.
 Use the option buttons to choose whether to connect to the SQL Server using
"Windows Authentication" or "SQL Server Authentication".
 If "Windows Authentication" is selected, SQL Tuner connects to the SQL Server
using the logged-in user's username and password.
 If "SQL Server Authentication" is selected, SQL Tuner connects to the SQL
Server using the username and password of that SQL Server, as provided by the
user.
 The UserName and Password textboxes are enabled only if "SQL Server
Authentication" is selected.
 The "OK" button checks whether the specified SQL Server is present and
whether the username and password are valid. If the details provided are valid, a
new blank SQL query document is created; otherwise, the user is notified.

Main Form (frmMain.cs)

 
This form is the main form of the tool. It is an MDI form which contains all the forms
provided by SQL Tuner. By default, it contains the Analyzer form at startup.
This form contains all the important functionality provided by SQL Tuner.

Working
1. This form contains five main menus, which are as follows:

File
Connect : This sub menu is used to open a new SQL document for a particular SQL Server.
Disconnect : This sub menu is used to disconnect the SQL document from the SQL
Server it was connected to and then close the document.
New : This sub menu is used to open a new SQL document.
Open : This sub menu is used to open an existing SQL document which has been saved
before.
Save : This sub menu is used to save an SQL document.
Exit : This sub menu is used to exit from the form.

Edit
Undo : This sub menu is used to undo the changes made to the SQL document.
Cut : This sub menu is used to cut the selected text of the SQL document to the
Clipboard.
Copy : This sub menu is used to copy the selected text of the SQL document to the
Clipboard.
Paste : This sub menu is used to paste the text on the Clipboard into the SQL document.
Select All : This sub menu is used to select all the text present in the SQL document.

Query
Change Database : This sub menu is used to change the database on which the user will
prepare the query.
Parse : This sub menu is used to provide suggestions to the user regarding the syntax
and indexes of the query being parsed.
Execute : This sub menu is used to provide the output of the query executed by the user
in the SQL document.
Cancel Execution : This sub menu is used to cancel the execution of a query that has
been submitted to SQL Tuner for execution.

Window

Switch Pane : This sub menu is used to switch the Analyzer form's focus from one
control to another.
Hide Result Pane : This sub menu is used to toggle the result pane of the Analyzer form.
Utilities
Insert/Update Template : This sub menu is used to create an SQL statement for an Insert
or Update statement.

Analyzer Form(frmAnalyzer.cs)

 
This form is the actual SQL document on which the Main form's functionality performs
actions. It is a child form of the Main form, and the user can open more than one
Analyzer form in SQL Tuner.
  
Working
 The first textbox lets the user input the query.
 The data grid presents the user with the output of the query.
 The second textbox shows the user any recommendations or any errors in the
SQL query.

Change Database Form(frmChangeDB.cs)

This form is used by the user to change the database on which to run the SQL query
written in the SQL document.
The form lists all the databases present in the SQL Server to which the SQL document is
connected.
 
Working
 The grid presents all the databases provided by the SQL Server to which the
SQL document is connected.
 Select the database you want to work on, then click the "OK" button.

Insert/Update Template Form (frmTemplate.cs)

 
This form is used to create an Insert or Update statement for the selected table and its
corresponding columns.
The form creates the statement with complete syntax, with each value represented by a
variable.

Working
 The user selects the "Insert" or "Update" option depending on the type of
statement to generate.
 The combo box lists the tables provided by the database; the user selects one
table from it.
 The list box shows all the columns of the selected table, with a checkbox for
each column.
 The user can select all the columns by checking the "Check All Columns"
checkbox.
 Clicking the "Generate" button creates the query and displays it in the textbox.
 The user can then copy the textbox contents by clicking the "Copy" button on the
form.

Future Enhancements
This project was developed to understand how the SQL Server optimizer
optimizes queries and reduces a query's CPU time and input/output
requirements.

There are many possible future enhancements to this project, as follows:

 To optimize more complex queries, i.e. queries which include joins,
unions, sub-queries, etc.
 To study the database structure and provide the user with suggestions to
improve the database structure for best performance.
 To optimize queries embedded in an application, without requiring the
user or programmer to enter the query in SQL Tuner.

Optimizing more complex queries

SQL Tuner as of now tunes only simple queries. It cannot tune complex queries
that contain joins between two or more tables, subqueries (queries that act as a
WHERE condition for another query), or unions (combinations of two or more
queries). INSERT, UPDATE, and DELETE statements also cannot be tuned in the
current version of SQL Tuner.

All these limitations can be removed by using more sophisticated parsing
methodologies and a more detailed study of how the SQL optimizer works in more
complex situations.
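A first step toward this is simply recognizing which statements the current version can handle. The sketch below, a hypothetical keyword-based heuristic not taken from the project source, flags queries containing joins, unions, subqueries, or data-modification verbs as "complex":

```python
import re

# Keywords whose presence marks a query as "complex" for the current tuner;
# "(\s*SELECT" catches subqueries used inside another statement.
COMPLEX_MARKERS = re.compile(
    r"\b(JOIN|UNION|INSERT|UPDATE|DELETE)\b|\(\s*SELECT\b", re.IGNORECASE
)

def is_tunable(query: str) -> bool:
    """Return True only for the simple SELECT statements the tuner handles."""
    q = query.strip()
    return q.upper().startswith("SELECT") and not COMPLEX_MARKERS.search(q)

is_tunable("SELECT * FROM emp WHERE id = 1")         # simple: tunable
is_tunable("SELECT * FROM a JOIN b ON a.id = b.id")  # join: not tunable
```

A real parser (such as the gudusoft.gsqlparser component the project already uses) would replace this keyword scan with a proper syntax tree.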

Optimizing Database Structure


SQL Tuner can be made capable of tuning databases, i.e. it can provide
suggestions to improve database design by analyzing the databases and the
types of queries that are fired against them. Database design is important with
respect to performance because bad logical database design results in bad
physical database design, which generally results in poor database performance.
This can be implemented by following some standard rules, as specified below.

 Following standard database normalization recommendations when
designing OLTP databases can greatly improve a database's
performance.
 Consider denormalizing some of the tables in order to reduce the
number of required joins.
 If we are designing a database that could potentially be very large,
holding millions or billions of rows, we must consider horizontally
partitioning the large tables.
 To optimize SQL Server performance, we must design rows in such a way
as to maximize the number of rows that can fit into a single data page.
 TEXT, NTEXT, and IMAGE data should be stored separately from the
rest of the data in a table. The table itself (in the appropriate columns)
contains a 16-byte pointer that points to separate data pages containing the
TEXT, NTEXT, or IMAGE data. This should be done to enhance
performance.

Many other rules can be implemented in SQL Tuner by using procedures to
analyze the database or table design.

Optimizing Queries Embedded in the Applications

In SQL Tuner as of now, users have to write the query within the interface
provided by SQL Tuner before it can be tuned or executed. But SQL Tuner can
also be modified in such a way that it searches for the queries in the
application provided to it and automatically tunes them; this would impose less
load on the user.

There is also another way: it can be developed as a component with which
developers can provide tuning facilities to the users of their
applications.

All this is achievable by changing the interface and adding functionality to
accept the application's source code files, traverse the code for queries,
analyze the database, and perform the respective changes. We can also prepare
an assembly that contains all the functions and properties, which the developer
can use in his application to improve its performance.
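Traversing source code for embedded queries could start from something as simple as scanning string literals for SQL verbs. This is a naive Python sketch of that idea, not part of the project; the regex and function name are illustrative assumptions, and a production version would need to handle string concatenation, escapes, and multi-line literals:

```python
import re

# Naive heuristic: double-quoted string literals that begin with a SQL verb
SQL_LITERAL = re.compile(r'"\s*(SELECT|INSERT|UPDATE|DELETE)\b[^"]*"',
                         re.IGNORECASE)

def extract_queries(source_code: str):
    """Collect candidate SQL strings embedded in application source code."""
    return [m.group(0).strip('"') for m in SQL_LITERAL.finditer(source_code)]

code = 'cmd.CommandText = "SELECT name FROM emp WHERE id = @id";'
extract_queries(code)  # ['SELECT name FROM emp WHERE id = @id']
```

Each extracted string could then be fed through the existing tuning pipeline without the programmer pasting it into SQL Tuner by hand.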

Bibliography
Websites
http://www.google.com
http://www.sql-server-performance.com
http://www.sqlservercentral.com
http://www.sqlite.org
http://www.transactsql.com
http://www.iAnywhere.com
http://www.blogs.msdn.com/queryoptteam
http://www.informit.com
http://www.dotnetbips.com
yannis@cs.wisc.edu

Books

Microsoft SQL Server 2000 Performance Optimization and Tuning Handbook
- Ken England (Butterworth-Heinemann)

Microsoft T-SQL Performance Tuning
- Kevin Kline, Andrew Zanevsky, and Lee Gould
Applications and Database Management, Quest Software, Inc.

Query Optimization
- Yannis E. Ioannidis, University of Wisconsin

SQL Server Books Online
- Microsoft Corporation

Components Used

gudusoft.gsqlparser.dll

gudusoft.gsqlparser.yyrec.dll

