Sunteți pe pagina 1din 4

15 SQL Server Performance Counters to

Monitor
How do you know your SQL Server system is running well? Is it the simple method of
noting that no one is complaining? (Sadly, in many businesses, this is exactly how it's
done.) The fact is, if you're not monitoring SQL Server performance, you just don't
know.

It's important to understand what the various SQL Server performance counters mean,
and fortunately there are a lot of resources for this. Let's take it one counter at a time,
starting with the counters common to all Windows servers.

Processor(_Total)\% Processor Time


Percent Processor Time tells us how busy the server's CPUs are. It's a basic indicator
to help us know that a server is running well within acceptable operating parameters.
Normally I'd expect to see this counter in the 20 to 40 percent range. When it jumps
above 80% I get very nervous, as that means that activity that requires the processor is
probably waiting for resources, and thus is slowing down someone's vital work.

Memory\Available MBytes
The Available MBytes Memory counter helps me know if server memory is an issue. I
can set Max Server Memory settings in SQL Server, which will help SQL Server share
the memory nicely with the Windows OS, but there may be other processes on the
server besides SQL Server. Capturing this counter allows me to know if other processes
are taking memory SQL Server needs to perform well.

Paging File(_Total)\% Usage


When Windows runs out of memory it takes large chunks of memory and swaps it out to
disk, to the Paging File. Unfortunately, the slowest operation in all computing is writing
to disk, regardless of the physical media involved, so swapping memory to disk is
naturally going to slow down the performance of your system. Keeping an eye on this
counter will help you know when you are encountering memory issues, and you can
then take action to resolve the conflicts.

PhysicalDisk(_Total)\Avg. Disk sec/Read


PhysicalDisk(_Total)\Avg. Disk sec/Write
These two counters tell you how quickly your I/O subsystem is responding to requests
for data from the operating system; in other words, latency. The latency values returned
are valid regardless of the type of I/O subsystem you're using, whether it's local physical
magnetic disk, SAN drives, NAS drives, or solid state drives. Your latency values should
normally not be more than 20ms; if you're using SSD, probably not more than 5ms. If
you see latency values of a second or more, your I/O subsystem has issues that need
to be addressed to keep performance at an acceptable level.

System\Processor Queue Length


The Processor Queue Length counter tells you the number of threads that are waiting
for time on the system processor. If this number is greater than 0, that means that there
are more requests per core than the system can handle, and this can be a cause for
significant performance issues. I once had a client that had a month-end process that
had to be run during the business day, which would take 2.5 to 3 hours to run; when it
ran, performance for everyone else on that system would be horribly slow. I looked at
the Processor Queue Length counter - normally it would get to no higher than 3 or 4
during the day, but During month-end it jumped to somewhere between 30 and 50. The
client was running on a virtual machine with 4 processors, and I asked if they could
double that. They did, and the next month-end completed in 45 minutes.

Network interface\Bytes total/sec


This is another baseline indicator that tells you the rate at which the network adapter is
processing data. This number should remain lower than 60% of the maximum
sustainable bandwidth of the network.

In addition to the operating system counters, you'll also want to monitor the SQL Server
performance counters. If you're observing a default instance the first part of the counter
name (to the left of the colon) is SQLServer. If it's a named instance, like INST01, it's
MSSQL$ followed by the instance name, so in this case, MSSQL$INST01.

SQLServer:Access Methods\Forwarded Records/sec


Forwarded Records/sec helps you understand how fragmented your heaps are. A heap
is a SQL Server table without a clustered index, and SQL Server uses Row IDs to find
the data it's looking for. If it arrives at the page where the data should be, and that data
has been moved, SQL Server leaves a pointer there called a forwarding pointer, and the
process has to incur an additional I/O operation to get it from the new location. Each
time a search for data encounters a forwarding pointer it increments this counter. This
may be unavoidable, but if you track this over time and this number starts to increase,
you should think about ways to defragment your table (or stop using a heap).

SQLServer:Access Methods\Page Splits/sec


Page Splits/sec also helps you understand how fragmented your tables are. In this
case, even if your table is in good shape, when SQL Server adds pages to it, it'll
increment this counter. If, however, SQL Server needs to insert a row onto a page, and
there isn't room, SQL Server will split the page into multiple pages, move rows from one
page to another to balance the pages out, and then insert the row. This is a very
expensive and time-consuming process, and this counter will help you understand when
this is happening a lot. Properly configuring the free space on each page will help
minimize this activity, just note that there are "good" page splits and "bad" page splits,
and this counter doesn't differentiate (Jonathan Kehayias of SQLskills has an Extended
Events session you can use instead.)

SQLServer:Buffer Manager\Buffer cache hit ratio


On older SQL Server systems (think SQL Server 2000 and earlier) this counter would
help you understand the percentage of times the needed page was already in the buffer
pool when it was needed. On today's system it's mostly useless, but many older DBA
managers still ask for it, so it doesn't hurt to capture this counter.

SQLServer:Buffer Manager\Page life expectancy


Page Life Expectancy is an often mis-interpreted counter. Jonathan Kehayias wrote a
blog post called "Finding what queries in the plan cache use a specific index." Jump
down to the paragraph entitled "What’s Wrong about Page Life Expectancy >= 300?"
and read from there to the end of the post. In the SQL Sentry tool, we use Jonathan's
guidelines for indicating what the correct expectation should be. The thing I use this
counter for is to see when some process is causing the pages in the buffer pool to be
flushed out, and getting to the root of that problem. It may be that person that insists on
running the report that has to read all 30 million customer records during the business
day, which may require some behavior modification. It could also be that you just ran
DBCC CHECKDB, because that brings that counter's value to 0 as well. The important
thing is to know your workload and look for those events that are outside normal
expectations.
SQLServer:General Statistics\Processes blocked
In any multi-user application you're going to have blocked processes, and SQL Server
has mechanisms to handle blocked processes well, but when this counter goes outside
the normal range (for your system) you'll want to investigate and see what might be
causing the issue. There could be excessive blocking due to page escalation, for
example, where entire tables are getting locked instead of individual rows or pages.

SQLServer:SQL Statistics\Batch Requests/sec


This counter is there to help you understand how busy your SQL Server system is. By
capturing this counter, and using it in your baseline, you can identify variances easily -
they might be reported by users, or it might just be extra load on the system because
people are asking for more than they usually do.

SQLServer:SQL Statistics\SQL Compilations/sec


SQLServer:SQL Statistics\SQL Re-Compilations/sec
These counters will increment when SQL Server has to compile or recompile query
plans because either the plan in cache is no longer valid, or there's no plan in cache for
this query. SQL Server uses a cost-based optimizer that relies on statistics to choose a
good query plan, and when those statistics are out-of-date, additional compilations are
done unnecessarily. It can be useful to understand the source of this problem, if it is a
problem (this might be expected behavior, depending on the workload).

S-ar putea să vă placă și