Troubleshooting High CPU utilization issues in Exchange 2013 - Exchange Team Blog - Site Home - TechNet Blogs
Introduction
In Exchange support we see a wide range of support issues. Few of them are more difficult to troubleshoot than performance issues. Part of the reason is the ambiguity of the term "Performance Issue". It can manifest as anything from random client disconnects to database failovers or slow mobile device syncing. One of the most common performance issues we see is a CPU that is running higher than expected. "High CPU" is an ambiguous term as well. What exactly is high? How long does it occur? When does it occur? All of these questions have to be answered before you can really start getting to the cause of the issue. For example, say you consider high to be 75% CPU utilization during the day. Are you experiencing a problem, are databases inadequately balanced, or is the server just undersized? What about a 100% CPU condition? Does it happen for 10 seconds at a time or 10 minutes at a time? Does it only happen when clients first log on in the morning, or after a failover? In this article I'll go into some common causes of high CPU utilization issues in Exchange 2013 and how to troubleshoot them.
At this point I should note that this article is about Exchange 2013 specifically, not earlier versions. High CPU issues across versions do have some things in common; however, much of the data in this article is specific to Exchange 2013. There are some fairly significant differences between Exchange 2010 and Exchange 2013 that change the best practices and troubleshooting methodology. These include completely different megacycle requirements, different versions of the .NET Framework, and a different implementation of .NET Garbage Collection. Therefore, I will not be covering Exchange 2010 in this post.
http://blogs.technet.com/b/exchange/archive/2015/04/30/troubleshootinghighcpuutilizationissuesinexchange2013.aspx
10/12/2015
Health Checker
We've recently published a PowerShell script on the TechNet gallery that makes checking for common configuration issues easy. The script reports hardware/processor information, NIC settings, power plan, pagefile settings, .NET Framework version, and some other items. It also has a Client Access load balancing check (current connections per server) and a Mailbox Report (active/passive database and mailbox totals per server). It can be executed remotely and can run against all servers in the organization at once, which saves the trouble of checking each of these settings individually on each server. The TechNet gallery posting contains more details on the script as well as some common usage syntax.
Sizing
After we've ruled out the common causes from the previous section, we now have to move on to sizing. Perhaps the CPU is running high because the server doesn't have enough megacycles to keep up with the load being placed on it. Sizing Exchange 2013 is covered in multiple blog posts. If you want a good understanding of sizing, I suggest reading Jeff Mealiffe's post Ask the Perf Guy: Sizing Exchange 2013 Deployments. If you haven't done it already, you should also run through Ross Smith IV's sizing calculator. Most deployments have used the calculator for planning and sizing. I'm a support guy, so I'm approaching this topic from the angle of troubleshooting an existing environment. In the world of troubleshooting we don't need to size and plan a deployment, but we do need to know enough about it to tell whether a performance problem is simply a matter of being undersized. Troubleshooting a high CPU issue with no knowledge of sizing is difficult at best and many times just not possible. CPU sizing comes down to this question: do I have enough available megacycles to handle the load?
Easy enough, right? Not quite. How many available megacycles you have is fairly straightforward, although it does require a bit of math. The basic formula, taken directly from Jeff's sizing blog, is as follows:
Adjusted megacycles per core = (Target platform per-core score value x MHz per core of baseline platform) / Baseline per-core score value
Two of these numbers are already known. The MHz per core of the baseline platform is always 2000, and the baseline per-core score value is always 33.75. Again, this is specific to Exchange 2013 only. All you need now is your target platform's per-core score value. This value is the SPECInt 2006 rating of your server divided by the total number of physical cores. If you don't want to use the website, you can look up your server's rating with the Exchange Processor Query Tool. Say our SPECInt 2006 rating on a 12 core server is 430, giving us a per-core rating of 35.83 (430/12). The formula now looks like this:
(35.83 x 2000) / 33.75 = 2123.26
2123.26 megacycles per core, times 12 cores, gives you 25,479 total megacycles available. Now we have to find out the required megacycles. This is a bit more complicated. It depends on the number of active and passive mailboxes you have, along with the message profile (messages sent/received per day) and any multipliers that may be required by 3rd party products. Luckily, there is a script to help with this as well.
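For anyone who would rather not do the available-megacycles math by hand, here is a minimal Python sketch of the arithmetic above. The function names are made up for this example, and the per-core score is rounded to two decimals only so that the result lines up with the worked example (35.83).

```python
BASELINE_MHZ_PER_CORE = 2000      # Exchange 2013 baseline platform
BASELINE_PER_CORE_SCORE = 33.75   # Exchange 2013 baseline per-core score


def adjusted_megacycles_per_core(specint_rating, physical_cores):
    """Adjusted megacycles per core for the target platform."""
    # Assumption: round to two decimals to mirror the worked example.
    per_core_score = round(specint_rating / physical_cores, 2)
    return per_core_score * BASELINE_MHZ_PER_CORE / BASELINE_PER_CORE_SCORE


def available_megacycles(specint_rating, physical_cores):
    """Total megacycles available across all physical cores."""
    per_core = adjusted_megacycles_per_core(specint_rating, physical_cores)
    return per_core * physical_cores


if __name__ == "__main__":
    per_core = adjusted_megacycles_per_core(430, 12)
    total = available_megacycles(430, 12)
    print(f"{per_core:.2f} megacycles per core, {total:.0f} total")
```

Remember to pass in physical cores, not logical ones, since the per-core score is defined against physical cores.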
The Exchange 2013 CPU Sizing Checker will run these numbers for you. You can pass in all of the profile information but it is easier to just import the values
directly from your sizing calculator results. Syntax can be found on the download page.
Screenshot of the Sizing Checker:
Version 7.2 of the Sizing Calculator also allows us to get an idea of the expected CPU utilization. The difference is that it calculates expected CPU utilization based on the number of active and passive mailboxes planned, taking the values from the Input page of the spreadsheet as opposed to querying the mailbox server for a current total. The new features in version 7.2 let you know what to expect from a CPU utilization standpoint in several scenarios: Normal Runtime (no failures, evenly distributed databases), Single Failure (a single server in the datacenter has failed, resulting in database copy activation), Double Failure (two servers in the datacenter have failed, resulting in database copy activation), Site Failure (a datacenter has failed, requiring failover to another datacenter), and Worst Failure (the worst possible failure based on design requirements for the environment).
Message Profile and Multiplier
By now you're probably saying "this is nice, but how do I know my message profile and multiplier numbers?" Great question. The message profile numbers on
a live production deployment can actually be determined by yet another great script from Dan Sheehan called GenerateMessageProfiles.ps1, available on
TechNet Gallery. This script will parse your transport logs and give you an actual number of messages sent/received per day. In addition to publishing the
script, Dan has written a blog post that explains the script and its usage in detail.
That works for message profiles. What about the multiplier? This is the tough one. Some 3rd party vendors will actually give you a suggested multiplier for their software. Sometimes this information is not available. In that case you can use the previously referenced Exchange 2013 CPU Sizing Checker script to reverse engineer the multiplier. Let's say you run the script with a multiplier of 1.0. It gives you a CPU number of 50%, which is the average CPU usage you can expect from the Exchange specific processes during the busiest hours of the day. You, however, are seeing a value closer to 65%. You can run the script again, modifying the multiplier each time, until you get a result close to 65%. Once you do, that gives you an idea of what multiplier you should be using in your sizing plans.
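The "run it again until it matches" loop is really just a search, and it can be sketched as a bisection. To be clear about what is assumed here: predict_cpu is a hypothetical stand-in for rerunning the sizing checker with a given multiplier, and toy_predictor is a made-up linear model (50% at a multiplier of 1.0) used only so the example runs. The real script does not expose a function like this.

```python
def estimate_multiplier(predict_cpu, observed_cpu, low=1.0, high=3.0,
                        tolerance=0.5, max_steps=50):
    """Bisect for the multiplier whose predicted CPU % matches the observed CPU %.

    predict_cpu must rise as the multiplier rises; it stands in for a
    rerun of the sizing checker with that multiplier.
    """
    if not predict_cpu(low) <= observed_cpu <= predict_cpu(high):
        raise ValueError("observed CPU is outside the searched multiplier range")
    mid = (low + high) / 2
    for _ in range(max_steps):
        mid = (low + high) / 2
        predicted = predict_cpu(mid)
        if abs(predicted - observed_cpu) <= tolerance:
            break
        if predicted < observed_cpu:
            low = mid   # prediction too low: need a larger multiplier
        else:
            high = mid  # prediction too high: need a smaller multiplier
    return round(mid, 2)


def toy_predictor(multiplier):
    # Hypothetical model: 50% CPU at multiplier 1.0, scaling linearly.
    return 50.0 * multiplier


if __name__ == "__main__":
    print(estimate_multiplier(toy_predictor, observed_cpu=65.0))
```

With the toy model the search lands near 1.3. Against real numbers, treat the result as a starting point for your sizing plans and not as a measured value.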
As previously mentioned, version 7.2 of the sizing calculator can predict CPU values based on your planned deployment numbers. This means that you can modify the Megacycles Multiplication Factor in the profile settings on the calculator's Input tab and view the results in the CPU Utilization / DAG section on the Role Requirements tab to get an idea of which multiplier value suits your deployment best. In most cases this is preferable to using the script, as the calculator is faster and designed around helping you plan your deployment, whereas the script is geared more toward troubleshooting.
Oversizing
Contrary to what you may think, it is possible to oversize your servers from a CPU standpoint. This doesn't come down to raw processing power. It might be an inefficient use of hardware in some cases to deploy on servers with high core counts, but too much processing power isn't the problem. When I talk about oversizing, I'm talking less about the available megacycles than about the number of cores. Exchange 2013 was developed to run on commodity type servers. Testing is generally done on servers with processor specifications of 2 sockets and about 16-20 cores. This means that if you deploy on servers with a much larger core count you may run into scalability issues. Core count is used to determine settings at the application level that can make a difference in performance. For example, in processes that use Server mode Garbage Collection we create one managed heap per core (you can read in detail about Garbage Collection in .NET 4.5 here). This can significantly increase the memory footprint of the process, and it goes up the more cores you have. We also use core count to determine the minimum number of threads in the threadpool of many of our processes. The default is 9 per core. If you have a 32 core server, that's 288 threads. If, for example, there is a sudden burst of activity, you could have a lot of threads trying to do work concurrently. Some of the locking mechanisms for thread safety in Exchange 2013 were not designed to work as efficiently in high core count scenarios as they do in the recommended core count range. This means that under certain conditions, having too many cores can actually lead to a high CPU condition. HyperThreading can also have an effect here, since a 16 core HyperThreaded server will appear to Exchange as having 32 cores. This is one of several reasons why we recommend leaving HyperThreading disabled. These are just a few examples, but they show that staying within the recommendations made by the product group when it comes to server sizing is extremely important. Scaling out rather than up is better from a cost standpoint, a high availability standpoint, and a product design standpoint.
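To make the core count effect concrete, here is a small Python sketch of the two settings mentioned above. It only restates the arithmetic from this section (one Server GC heap per core, a minimum of 9 threadpool threads per core); the function and key names are invented for illustration.

```python
MIN_THREADS_PER_CORE = 9  # default quoted in this article


def cores_seen(physical_cores, hyperthreading=False):
    # With HyperThreading enabled, each physical core appears as two cores.
    return physical_cores * 2 if hyperthreading else physical_cores


def core_scaled_settings(physical_cores, hyperthreading=False):
    """Per-process values that scale with the core count Exchange sees."""
    cores = cores_seen(physical_cores, hyperthreading)
    return {
        "cores_seen": cores,
        "server_gc_heaps": cores,  # one managed heap per core
        "min_threadpool_threads": cores * MIN_THREADS_PER_CORE,
    }


if __name__ == "__main__":
    print(core_scaled_settings(16))
    print(core_scaled_settings(16, hyperthreading=True))
```

The same 16 core server doubles both numbers just by turning HyperThreading on, which is the point of the recommendation to leave it disabled.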
processor time of all processes including the Idle process, which means it will almost always be near the maximum value. If you are looking at a perfmon
capture and don't know the total number of cores, just look at the highest number in the instances window under the Processor counter. It is a zero based
collection, each number representing a core. If 23 is the highest number, you have 24 cores.
Now that you know that there was a high CPU condition and when it occurred, we can start narrowing down what caused it. The next thing to do is load all instances under "Process\% Processor Time". You can ignore "_Total" and "Idle". During this phase of troubleshooting it may be best to change the vertical scale of the perfmon window. To do this, right click in the window, choose Properties, go to the Graph tab, and change the maximum to core count x 100. In our 16 core example you would change it to 1600. Look for any specific process that takes up more CPU than the others and goes up in tandem with the overall CPU utilization. If there isn't one in particular, you don't have a single process causing the issue. This tends to point to some of the topics covered in the previous sections, such as sizing, load, and CPU throttling.
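If you have exported the counters from a capture, the eyeball test above can be roughed out in Python. This is a sketch under assumptions: the samples are plain lists you have already pulled out of the capture, "goes up in tandem" is approximated with a Pearson correlation, and the 0.8 cutoff is an arbitrary choice for this example, not a product recommendation.

```python
from math import sqrt


def core_count(processor_instances):
    """Processor instances are zero-based: highest number plus one."""
    numbered = [int(name) for name in processor_instances if name.isdigit()]
    return max(numbered) + 1


def graph_maximum(cores):
    """Vertical scale for per-process % Processor Time: core count x 100."""
    return cores * 100


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def suspects(total_cpu, process_cpu, min_correlation=0.8):
    """Processes that rise in tandem with overall CPU, busiest first."""
    found = []
    for name, samples in process_cpu.items():
        if name in ("_Total", "Idle"):
            continue  # these two instances are ignored
        if pearson(total_cpu, samples) >= min_correlation:
            found.append((name, sum(samples) / len(samples)))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

An empty result from suspects is the "no single process" case, which points back toward sizing, load, and CPU throttling.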
Mapping w3wp instances to application pools
Let's say you do find one particular process that is causing the high CPU condition. Suppose that the process has the name "w3wp#1". What exactly are you supposed to do with that? Exchange runs multiple application pools in IIS for the various protocols it supports. We need to find out which application pool "w3wp#1" maps to. Luckily perfmon has the information we need; you just need to know how to find it.
The first thing you want to do is load the counter "Process(w3wp#1)\ID Process". This will give you the process ID (PID) of that w3wp instance. Let's say it's 22480. With that information we go back to the counter load screen and look under "W3SVC_W3WP". Click on any of the counters. Below you will see a window that contains entries with the format PID_AppPool. In our example it says 22480_MSExchangeSyncAppPool. That tells us that w3wp#1 belongs to the Exchange ActiveSync application pool. Now we know that ActiveSync is the cause of our high CPU. At this point you can remove all of the counters from your view except for "Process(w3wp#1)\% Processor Time", as the extra clutter is no longer needed. You may also want to set the vertical scale back to 100, then right click on the counter and choose "Scale Selected Counters".
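The PID_AppPool lookup is easy to script once you have the list of W3SVC_W3WP instance names. A minimal Python sketch follows; the function name is made up, and the second instance name in the usage example is invented sample data.

```python
def parse_w3svc_instances(instance_names):
    """Map PID -> application pool from names in the PID_AppPool format."""
    pools = {}
    for name in instance_names:
        # Split on the first underscore only; pool names may contain more.
        pid, separator, pool = name.partition("_")
        if separator and pid.isdigit():
            pools[int(pid)] = pool
        # Anything else (for example "_Total") is skipped.
    return pools


if __name__ == "__main__":
    instances = ["_Total", "22480_MSExchangeSyncAppPool", "9120_SampleAppPool"]
    pools = parse_w3svc_instances(instances)
    print(pools.get(22480, "unknown"))
```

Feed it the PID you read from "ID Process" and you get the application pool back without clicking through the counter dialog.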
I should also note here that, due to managed availability health checks, an application pool is sometimes restarted. When this happens the PID and the w3wp instance may change. Pay attention to the "Process(w3wp*)\ID Process" counter for the worker process you are interested in. If this value changes, the process was recycled, the PID changed, and perhaps the w3wp instance as well. You will need to verify whether the instance changed after the process recycled to make sure you are still looking at the right information.
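Spotting a recycle in a long capture is a matter of finding where the "ID Process" value changes. A minimal sketch, assuming you have the samples for one w3wp instance as a simple list (the PIDs in the example are invented):

```python
def recycle_points(pid_samples):
    """Sample indexes where ID Process differs from the previous sample."""
    return [
        index
        for index in range(1, len(pid_samples))
        if pid_samples[index] != pid_samples[index - 1]
    ]


if __name__ == "__main__":
    samples = [22480, 22480, 22480, 31876, 31876]
    for index in recycle_points(samples):
        print(f"PID changed at sample {index}: "
              f"{samples[index - 1]} -> {samples[index]}")
```

Every index it returns is a spot where you should recheck which application pool the instance now maps to.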
What is the process doing?
Now that we've narrowed it down to w3wp#1 and know that ActiveSync is the cause of our issue, we can start to dig into troubleshooting it specifically. These methods can be used on multiple other application pools, but this example will be specific to ActiveSync. The most common thing to look for is a burst in activity. We can load up the counter "MSExchangeActiveSync\Requests/sec" to see if there was an increase in requests around the time of the problem. Either way, we now know whether increased request traffic led to the CPU increase. If it did, we need to find the cause of the traffic. It's a good idea to check the counter "MSExchange IS Mailbox(_Total)\Messages Delivered/sec". If this ticks up right before the CPU increase, it tells you that there was a burst of incoming messages that likely triggered it. You can then review the transport logs for clues. If it wasn't message delivery, it may have been some mobile device activity that caused it. In this case you can use Log Parser Studio to analyze the IIS logs for trends in ActiveSync traffic.
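The decision path above can be sketched as a small Python function over exported samples. Everything numeric here is an assumption made for the example: a "burst" is any sample above twice the median of its own series (which only works when the burst is a small part of the capture), and "high CPU" is 75% or more.

```python
from statistics import median


def first_burst(samples, factor=2.0):
    """Index of the first sample above factor x the series median, else None."""
    baseline = median(samples)
    if baseline <= 0:
        return None
    for index, value in enumerate(samples):
        if value > factor * baseline:
            return index
    return None


def first_above(samples, threshold):
    """Index of the first sample at or above threshold, else None."""
    for index, value in enumerate(samples):
        if value >= threshold:
            return index
    return None


def likely_trigger(cpu, requests, deliveries, cpu_threshold=75.0, factor=2.0):
    """Walk the checks in order: requests first, then message delivery."""
    cpu_rise = first_above(cpu, cpu_threshold)
    if cpu_rise is None:
        return "no high CPU condition in this capture"
    request_burst = first_burst(requests, factor)
    if request_burst is None or request_burst > cpu_rise:
        return "no request burst before the CPU increase; look inside the process"
    delivery_burst = first_burst(deliveries, factor)
    if delivery_burst is not None and delivery_burst <= cpu_rise:
        return "message delivery burst; review the transport logs"
    return "request burst without a delivery burst; review the IIS logs"
```

It is no substitute for looking at the graph, but it is handy when you have many captures to triage.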
Garbage Collection (GC)
If there was no noticeable increase in request traffic or message delivery before the increase, there may be something inside the process causing it. Garbage collection is a common trigger. You can look at ".NET CLR Memory(w3wp#1)\% Time in GC". If it sustains higher than 10% during the issue, it could trigger high CPU. If this is the case, also look at ".NET CLR Memory(w3wp#1)\Allocated Bytes/sec". If this counter sustains about 50,000,000 during the high CPU condition and is coupled with an increase in "% Time in GC", it means the Garbage Collector may not be able to keep up with the load being placed on it. I want to note very clearly here that if you encounter this, Garbage Collection throughput usually isn't the root of the problem. It is another symptom. Increases of this type usually indicate that abnormal load is being placed on the system. It is much better to find the root cause of this and eliminate it than to start changing the garbage collector settings to compensate.
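Here is a minimal Python sketch of that check against exported samples. The 10% and 50,000,000 figures come from the text above; reading "sustains" as four consecutive samples, and treating the allocation figure as a floor, are assumptions made for this example.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def gc_pressure(time_in_gc, allocated_bytes_sec, min_samples=4,
                gc_threshold=10.0, alloc_threshold=50_000_000):
    """True when both counters stay elevated for min_samples in a row."""
    both_high = [
        gc > gc_threshold and alloc >= alloc_threshold
        for gc, alloc in zip(time_in_gc, allocated_bytes_sec)
    ]
    return longest_run(both_high) >= min_samples
```

A True result still only tells you where to look next: find the abnormal load behind the allocations before touching any garbage collector settings.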
RPC Operations/sec
This is perhaps the best counter we have for mapping client activity to high CPU. You can load up "MSExchangeIS Client Type(*)\RPC Operations/sec" to get an idea of how many RPC requests are being issued against the Information Store by client type. Usually the highest offenders will be momt (requests from the RPC Client Access service, usually Outlook MAPI clients), contentindexing, webservices (EWS), and transport (mail delivery). You really need to have a baseline of your environment to know what "normal" is, but you can definitely compare this counter to the overall CPU utilization to see if client requests are causing a CPU utilization increase.
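If you do have a baseline, comparing against it is simple to script. This Python sketch ranks client types by how far their average RPC Operations/sec sits above your own baseline; the function name and the numbers in the usage example are invented.

```python
def rank_client_types(current_samples, baseline_average):
    """Rank client types by average RPC Operations/sec relative to baseline.

    current_samples: client type -> list of samples from the problem window
    baseline_average: client type -> normal average for your environment
    """
    ranked = []
    for client, samples in current_samples.items():
        current = sum(samples) / len(samples)
        normal = baseline_average.get(client)
        ratio = current / normal if normal else None
        ranked.append((client, current, ratio))
    # Highest ratio first; client types with no baseline sort last.
    return sorted(ranked, key=lambda row: (row[2] is None, -(row[2] or 0)))


if __name__ == "__main__":
    current = {"momt": [900, 1100], "webservices": [300, 300]}
    baseline = {"momt": 950, "webservices": 100}
    for client, average, ratio in rank_client_types(current, baseline):
        print(client, average, ratio)
```

Ranking by ratio and not by raw volume matters because momt is usually the busiest client type even on a healthy server.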
Log Parser Studio (LPS)
If I were stuck on a desert island and had to troubleshoot Exchange performance issues for food, and could only bring two tools, they would be perfmon and Log Parser Studio. LPS contains several built-in queries to help you easily analyze traffic for the various protocols used by Exchange. You can use it to get a view of the most ActiveSync hits per day by device, EWS requests by client type, RPC Client Access (MAPI) client version by percentage, and many others. The built-in queries are great for just about anything you'd need to find out. If you need more and know a bit of T-SQL, you can even write your own. LPS is covered in depth in Kary Wall's blog post. Once you have narrowed down the client type causing your issue, LPS is usually the next step.
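To give a feel for what an "ActiveSync hits by device" query does, here is a rough Python equivalent that reads IIS W3C log lines directly. It is not an LPS query and not how LPS works internally. It assumes the log includes the cs-uri-stem and cs-uri-query fields and that the device identifier appears as a plain DeviceId parameter in the query string, which will not hold for every client.

```python
from collections import Counter
from urllib.parse import parse_qs

ACTIVESYNC_STEM = "/microsoft-server-activesync"


def activesync_hits_by_device(lines):
    """Count ActiveSync requests per DeviceId from IIS W3C log lines."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("cs-uri-stem", "").lower() != ACTIVESYNC_STEM:
            continue
        query = parse_qs(row.get("cs-uri-query", ""))
        hits[query.get("DeviceId", ["unknown"])[0]] += 1
    return hits
```

Calling hits.most_common(10) on the result gives a top-talkers list, which is usually the first thing to look at when a single device is suspected.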
Conclusion
Performance is a vast topic and I don't expect this blog post will make you an expert immediately, but hopefully it has given you enough tips and tricks to start
tracking down Exchange 2013 high CPU issues on your own. If there are other topics you would like to see us blog about in the realm of Exchange
performance please leave feedback below. Happy troubleshooting!
Marc Nivens
Comments
Petri X
What about "System\Processor Queue Length", isn't that one of indicating if the CPU is the bottleneck? Also how all of these counters are acting when the
server is in virtual environment and you for example sharing the CPU with multiple other guests. Can we see when the oversubscription is causing the issue?
I know what is the best practices, but it does not decrease my interests :D
VM performance can be very tough to gauge using perfmon counters on the guest system. Perfmon data is useful on a guest for many things but not for
things like CPU throttling or sharing/oversubscribing. Some of that information just will not show up at the guest layer. For example, even if the CPU power
is being throttled, the "Processor Information(_Total)\% of Maximum Frequency" counter will show it as 100% on the guest all of the time. The only way to
accurately tell if oversubscribing or CPU sharing is the cause of your issue is to use performance data from the hypervisor host. What that data is and how
to interpret it will depend on the type of hypervisor you are running.
Siddarth Laxminarayanen
Does Forefront affect CPU Usage in Exchange?
Perf Counters within a virtualized guest are more an "indicator" for "something's going on". But it helps to correlate information from the guest OS with
performance data of the hypervisor platform.