
Microsoft

Operations
Manager 2005
Operations Guide

Optimize
Author: Dan Wesley
Program Manager: Gautam Bhatia
Published: December 2004
Applies To: Microsoft Operations Manager 2005
Document Version: Release 1.0
The information contained in this document represents the current view of Microsoft Corporation
on the issues discussed as of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the
date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting
the rights under copyright, no part of this document may be reproduced, stored in or introduced
into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written
permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail
addresses, logos, people, places, and events depicted herein are fictitious, and no association
with any real company, organization, product, domain name, e-mail address, logo, person, place,
or event is intended or should be inferred.
© 2004 Microsoft Corporation. All rights reserved.
Microsoft, MS-DOS, Windows, Windows NT, Windows Server, Active Directory, ActiveSync, and
Windows Mobile are either registered trademarks or trademarks of Microsoft Corporation in the
United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their
respective owners.

Acknowledgments
Primary Reviewers: Adam Stone, Gautam Bhatia, James Hedrick
Managing Editor: Sandra Faucett

Did you find this information useful? Please send your suggestions and comments about
the documentation to momdocs@microsoft.com.

Looking for more MOM information? Experience the power of customer communities!

MOM Community
Optimize

Chapter 6
This chapter provides guidance for analyzing Microsoft® Operations Manager 2005 (MOM)
performance, and for identifying potential and existing performance issues.
In This Chapter
• Introduction
• Characteristics of an Optimized MOM System
• Sudden Increases in Resource Usage
• General Indicators of a Performance Issue
• Lower the Risk of Performance Issues
• Assess MOM Database and Management Server Activity
• Assess Agent Activity
• Assess Console Activity

Introduction
Consider optimizing and tuning MOM to:
• Reduce or eliminate performance bottlenecks that are reducing overall effectiveness or
causing a system failure.
• Improve performance to increase operational effectiveness or to scale up to manage more
computers.
When you are dealing with performance issues, or you want to tune your system, it is
recommended that you:

• Clearly define goals and objectives for optimizing MOM, and confirm that the results of
optimizations can be quantified.
• Review capacity planning and sizing documents to ensure that computers are appropriately
sized for each component in the MOM system (for example, the server hosting the MOM
Database).
• Review the existing MOM architecture to ensure that it is adequate for supporting the
number of computers that you are managing, and that the load on architectural components
falls within supported limits (for example, the number of agents reporting to a Management
Server).
• Confirm that the MOM support team is aware of known hardware and networking issues that
may have existed before MOM was installed.
• Confirm that the MOM support team is familiar with the pre-defined performance thresholds
provided in the Management Packs that are installed.
• Make sure that historical data is available in order to identify trends and develop the
appropriate performance benchmarks. See also: Benchmarks.
• Take a systematic approach to identify performance issues and implement changes.
• Start with a system-wide view and identify key performance indicators.
• Isolate the part of the system that you want to focus on, and identify the appropriate
performance indicators.
• Implement changes, and monitor the system, to verify the results of the changes by
collecting historical data.
• Have a back out plan for reversing configuration changes that may create new
bottlenecks or further degrade performance.

Required Knowledge and Skills


It is recommended that staff members responsible for performance monitoring and analysis have
a working knowledge of the Microsoft Performance Monitor, Microsoft Network Monitor, and
SQL Profiler.
In all cases, they should know what the key performance indicators are, and be able to interpret
the results of the data that is collected.
Trends
One of the major techniques for analyzing performance is trend analysis. Support staff should be
able to identify trends in performance on all parts of the MOM system.

Benchmarks
Although Management Packs provide rules to generate alerts when certain thresholds are
exceeded, it is necessary to determine if the general performance of a computer is constant, or
changing, so potential issues can be addressed proactively.
Performance benchmarks, based on real time and historical data, are essential for determining
whether or not performance on a MOM component has changed. For example, unless there is a
baseline, you cannot conclude that Management Server performance is “slower” or “faster”.

Characteristics of an Optimized MOM System
Overall MOM performance is affected by many variables, and the impact of each variable varies
from organization to organization. For example, networks are a common element in most
organizations, but the extent of networking, network capacity and network reliability differs in
every organization.
In general, an optimized MOM system has the following characteristics:
• The operational database size remains below 15 GB.
• Only the Management Packs that are needed are installed, and only the rules that are
required are enabled.
• The volume of events and alerts that are generated does not overload the Management
Server or the MOM database.
• The volume of operational data from managed computers is evenly distributed among
Management Servers (in management groups that have more than one MOM server).
• Communication between MOM components is efficient and consistent, and alert latency
remains below 2 minutes, during normal operations.
• Database re-indexing and grooming jobs complete successfully, and in a timely manner.
• The data transformation services (DTS) job that is run against the operational database
completes successfully.
• Ongoing operations management tasks, such as computer and attribute discovery, agent
installs/uninstalls, and configuration changes complete successfully and do not overload the
Management Servers for an extended period of time.
• Resource utilization on all the servers, including the managed computers, stays within
acceptable ranges during normal operations.
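
The two numeric targets in the list above (database size below 15 GB, alert latency below 2 minutes) lend themselves to a simple automated check. The following Python sketch is illustrative only: the function name and the way the measurements are obtained are assumptions, and MOM does not provide this script.

```python
# Thresholds taken from the characteristics listed above.
MAX_DB_SIZE_GB = 15
MAX_ALERT_LATENCY_SECONDS = 120


def threshold_violations(db_size_gb, alert_latency_seconds):
    """Return a list of messages for each target that is not met.

    Both arguments are measurements that you collect yourself, for
    example from SQL Server and from MOM reports (hypothetical inputs).
    """
    problems = []
    if db_size_gb >= MAX_DB_SIZE_GB:
        problems.append(
            f"Operational database is {db_size_gb} GB (target: below {MAX_DB_SIZE_GB} GB)"
        )
    if alert_latency_seconds >= MAX_ALERT_LATENCY_SECONDS:
        problems.append(
            f"Alert latency is {alert_latency_seconds} s "
            f"(target: below {MAX_ALERT_LATENCY_SECONDS} s)"
        )
    return problems


print(threshold_violations(12.5, 45))   # []
print(len(threshold_violations(16, 180)))  # 2
```

An empty list means that both numeric targets are met; the remaining characteristics in the list still need to be reviewed manually.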


Sudden Increases in Resource Usage
It is normal for sudden increases (spikes) in resource usage to occur on all the MOM components
under certain conditions. These spikes are caused by different activities that take place during
normal MOM operations, and they can create temporary bottlenecks that increase alert
latency. Any optimizing activity should factor in these increases in resource usage. It is
recommended that you:
• Distinguish spikes in resource usage from ongoing resource utilization issues.
• Use performance counters and MOM reports to identify the cause and frequency of the
spikes.
Typically, surges in resource usage are not considered to be an ongoing performance issue, and
you can implement processes to minimize the impact of performance spikes as part of your
optimizing activities.

Known Causes of Increased Resource Usage in MOM


There are several situations that will cause a sudden increase in resource usage on the MOM
system.
Performance data bursts
Most management packs collect performance data for 15 minutes, and all of the agents send this
data in bursts. In a management group with a large number of agents, these data transmission
bursts saturate the database server disk and back up the Management Server queue until all of the
performance data is inserted, which temporarily increases alert latency.
Service discovery data from all the agents
CPU utilization can increase noticeably when a large number of agents simultaneously send
service discovery data to the Management Server. Depending on the volume of the data, the
server queue can fill, which contributes to alert latency.
This situation usually happens when:
• Service discovery is run after a new Management Pack is installed and targeted to a large
number of agents.
• The service discovery script is run after there has been a change in some service discovery
instance, or attribute, for a large number of agents.
SQL jobs


The re-index job, which runs every Sunday at 3 A.M., causes the database server disk to be
heavily utilized. This can cause some alert latency for the duration that it runs, which is usually
20-30 minutes.

Note
Other jobs, including grooming, update database, as well as
the Data Transformation Services (DTS) job, do not contribute
significantly to alert latency.

General Indicators of a Performance Issue
There are several key indicators that provide advance warning of potential performance issues.
Perform the following checks daily:
• The size of the MOM Database:
• Is increasing rapidly or filling quickly.
• Is increasing more than anticipated.
• MOM Database re-indexing and grooming jobs are taking longer.
• Performance on the database server or Management Server is consistently slow.
• There are alerts indicating that the Management Server queue is full.
• Resource utilization (for example, CPU or memory) is unusually high on any computer that
has a MOM component installed.
• Alert and event latency is higher than normal, or is increasing.
• Response time on the consoles is slow.

Lower the Risk of Performance Issues
There are steps that you can take to lower the risk of performance issues.


IT architecture and network topology


You can deploy MOM to monitor agents within a single management group. Additional
management groups can be added to accommodate a growing organization, scale across
geographic locations, and manage performance. Eventually, your organization might require
additional management groups for MOM to run optimally. For information about how to deploy
new management groups, see the Microsoft Operations Manager 2005 Deployment Guide.
Typically, all MOM components run within a management group. However, there are several
reasons why multiple management groups within the enterprise might benefit your organization:
• Monitoring across a firewall
If you have many agents spread across a firewall, it might be more efficient for you to add an
additional management group across the firewall to manage these agents. Alerts can be
forwarded to a management group outside the firewall.
• Slow links
If you have many agents spread across a slow link, you might be able to optimize MOM
performance and network performance by installing an additional management group, on the
other side of the slow link, to manage these agents. Alerts can be forwarded to a
management group on the other side of the slow link.
• Geographic location
If your enterprise consists of multiple physical locations, it might be more effective for
administrators in each location to keep their data local.
• Network bandwidth
If your enterprise consists of wide-area links, the most effective use of your network
bandwidth is to keep the majority of the traffic local, by configuring your Management
Servers to forward alerts, and their associated events, to the destination management group.

Load distribution and failover


Load distribution can be achieved by adding multiple Management Servers to a management
group, and configuring them to fail over if a Management Server fails.
In a configuration with two Management Servers, the apparent limit would be 4000 managed
computers, distributed between the two servers. However, this is not the case.
• If failover is configured, the apparent limit in this scenario is not accurate because when one
server fails, the total load on the remaining server would be 4000 managed computers. This
is double the maximum supported limit for a Management Server. The maximum number of
managed computers for a two-server management group should be 2000, distributed evenly
between the two servers.


• As a best practice, do not load a Management Server to the maximum supported limit for
managed computers.
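
The failover arithmetic described above can be sketched in a few lines of Python. This is an illustrative calculation only: the per-server limit of 2000 is the figure implied by the text, the function name is hypothetical, and extending the rule beyond two servers is an assumption rather than documented guidance.

```python
# Per-server supported limit implied by the text above (4000 is described as
# "double the maximum supported limit"); not a value read from MOM itself.
PER_SERVER_LIMIT = 2000


def max_agents_with_failover(server_count, tolerated_failures=1,
                             per_server_limit=PER_SERVER_LIMIT):
    """Largest agent count that the surviving servers can absorb.

    Assumes the whole load is redistributed to the servers that remain
    after `tolerated_failures` servers have failed.
    """
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    if not 0 <= tolerated_failures < server_count:
        raise ValueError("tolerated_failures must be between 0 and server_count - 1")
    surviving = server_count - tolerated_failures
    return surviving * per_server_limit


print(max_agents_with_failover(2))                        # 2000
print(max_agents_with_failover(2, tolerated_failures=0))  # 4000
```

The second call reproduces the "apparent limit" of 4000, which only holds if no server is ever expected to fail.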

Database Free Space


You should maintain 40 percent free space in your database. If you do not, certain maintenance
jobs, such as re-indexing, will fail. It is essential that you continually monitor your database for
free space to maintain a healthy database. MOM provides a rule for monitoring free space, and
will generate an alert when free space falls below this threshold. Because the database can
unexpectedly grow quickly, it is recommended that you monitor the free space closely.
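
As a rough illustration of the 40 percent guideline, the following Python sketch compares the used space with the allocated size. The function name and its inputs are hypothetical; how you obtain the sizes (for example, from SQL Server) is outside the scope of this sketch.

```python
REQUIRED_FREE_PERCENT = 40  # guideline from the paragraph above


def has_enough_free_space(db_size_mb, used_mb,
                          required_free_percent=REQUIRED_FREE_PERCENT):
    """Return True if at least `required_free_percent` of the database is free."""
    if db_size_mb <= 0:
        raise ValueError("db_size_mb must be positive")
    if not 0 <= used_mb <= db_size_mb:
        raise ValueError("used_mb must be between 0 and db_size_mb")
    free_mb = db_size_mb - used_mb
    # Compare without dividing so that exact boundary cases are handled cleanly.
    return free_mb * 100 >= required_free_percent * db_size_mb


print(has_enough_free_space(10000, 6000))  # True  (exactly 40 percent free)
print(has_enough_free_space(10000, 6500))  # False (35 percent free)
```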

Database is set to grow automatically


You should not configure the OnePoint database for automatic file growth. During the automatic
file growth process, all database operations are suspended. If a database operation, requiring
uninterrupted access to the database, attempts to write to the database during automatic file
growth, the operation will fail. This condition can prevent MOM from
functioning properly. It is important to continually monitor your database size.

Too many Management Packs are installed


Initially, install only the Management Packs that you plan to use, and install others
later if they are needed. This approach serves two purposes:
• Unnecessary data is not transmitted and stored.
• The agent memory footprint is smaller. For example, the MOM agent footprint (idle) is 3.5
MB; when managing Exchange Server, the footprint increases to 18 MB. Because the rules
for each Management Pack are held in memory on a managed computer, each Management
Pack that is installed increases the footprint.

Management Pack Tuning


Microsoft’s Management Packs for MOM 2005 are designed to reduce the generation of
unnecessary alerts. However, situations still occur in which certain rules generate a high volume
of alerts. Management Pack tuning is not necessarily a complex and time-consuming exercise.
Management Pack developers at Microsoft estimate that 90% of the time, reconfiguring 4-6 rules
resolves 95% of the unnecessary data traffic from the agents to the Management Server.
Identify the rules that are responsible for the most traffic, and focus on them.
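
One way to find the rules responsible for the most traffic is to count alerts per rule and rank the result. The Python sketch below assumes that you have already exported one rule name per alert into a list; the export itself, the function name, and the sample rule names are hypothetical.

```python
from collections import Counter


def noisiest_rules(alert_rule_names, top_n=6):
    """Rank rules by alert count.

    Returns (rule name, alert count, share of all alerts) tuples for the
    `top_n` rules that generated the most alerts.
    """
    counts = Counter(alert_rule_names)
    total = sum(counts.values())
    return [(name, count, count / total)
            for name, count in counts.most_common(top_n)]


# Hypothetical sample: one entry per alert, naming the rule that raised it.
sample = ["Disk space low"] * 6 + ["Service stopped"] * 3 + ["Logon failure"]
for name, count, share in noisiest_rules(sample, top_n=2):
    print(f"{name}: {count} alerts ({share:.0%} of total)")
```

The default of six rules follows the 4-6 rule estimate quoted above; adjust it to suit your environment.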


Note
The Alert Tuning Solution documented in Chapter 8, “Tools”,
provides extensive guidance for tuning Management Packs
that could be appropriate for this task.

Audit log is enabled


Audit logging is used to track configuration and rule changes. By default, it is not enabled. If
audit logging is enabled for troubleshooting purposes, you should disable it as soon as
troubleshooting is complete, because it can be CPU-intensive and it consumes a large amount
of space on the database. MOM has no
grooming job to groom the Auditing tables, AuditLogValues, or AuditLog. Even if you use very
aggressive grooming rates on the other data, the database might never be large enough to store
the additional data. If you must have auditing enabled, you can use the following procedure to
delete data from the Auditing tables, periodically, to free up space in the database.

Assess MOM Database and Management Server Activity
If there is a general alert latency problem, notifications that the server queue is full, or slow
performance on either the database or Management Server, conduct the following assessment.

Verify that the OnePoint database is available


Check the following on the database server:
• The Microsoft SQL Server™ instance for the OnePoint database is started.
• Look in the Application log for any database failures, such as:
• The OnePoint database is full.
• The OnePoint log is full.
• The tempDB is full.

Verify that the Management Server is functioning normally


Check the following on the Management Server:
• Look in the Application log to see if there are any errors that indicate that the Management
Server is not able to contact the database. Confirm that:
• The Management Server can communicate with the database server.


• There are no permission issues between the Data Access Server (DAS) and the database.
• See if the MOMService process is restarting frequently. This process restarts if the private
bytes exceed 300 MB on the server. This can occur when the number of agents approaches
the supported limit, or if there are several Management Packs installed. You can change this
limit by editing the value of the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint\MaxServerPrivateBytes
• See if the server queue is filling up frequently. If so, conduct the analysis in “Server queue
assessment”.
• See if other applications are consuming too many resources on the Management Server, or
on the database server that hosts the SQL Server instance for the OnePoint database.

Server queue assessment


Check the following indicators to assess the server queue.
High CPU utilization on the database server
If CPU utilization is above 80% on the database server:
• Check if the performance counter \MOM Server(*)\DB Disc Simple Count is greater than
zero for more than a few minutes. If it is, this indicates that service discovery data was
received from a large number of agents.
• Check your service discovery rules to see what scripts were synchronized to run recently, to
see if there is a valid reason for discovery data to change on all of the agents.

Note
If discovery data is filling the queue, this situation should
resolve itself within 1-4 hours. The length of time depends on
the number of agents on which data was changed, and what
Management Pack the data changed for. For example, the IIS
service discovery packet is larger than the Windows Base
Operating System service discovery packet.

• If the discovery simple count is 0, but the performance counter \MOM Server(*)\DB Alert
Simple Count is greater than zero for more than a few minutes, the system may be
experiencing an alert storm. Use the Operator console to view the alerts that are coming in,
to see if the number of alerts is much higher than usual.
• Check to see if there are several Operator consoles in the management group that are
currently refreshing views. If this is the case, check to see if any queries are taking longer
than expected to return data to the consoles.


• If the Reporting database is installed on the same computer as the operational database,
check to see if the reporting DTS job is running. If it is, investigate to see if the job is taking
longer than usual to complete.
• Check to see if there are other SQL Server jobs running at the time that CPU usage is high.
• If you still haven’t identified the issue, use the SQL Profiler to see what queries are running
either with high CPU usage, or for a long time.
Activity on the disk where the OnePoint database resides is high
If disk idle time is less than 20%:
• Check the size of the current SampledNumericData partition. Do this using Enterprise
Manager. Check the table name where Current=1 in the PartitionTables table, and then check
the size of that SND table. If the table holds more than 5 million rows for every 1 GB of
memory that SQL Server can use, check to see if it recovers soon after the next partitioning job.
If you see this pattern every night, you may have to add more memory to the database server,
and ensure that SQL Server is using the extra memory. Another option is to reduce your
performance data load.
• Repeat the preceding process with the current Event partition. If this is the problem area,
then you may have to add more memory to the database server and ensure that SQL Server
is using the extra memory. Another option is to reduce your performance data load.
• Check to see if there are any SQL jobs, in particular the Re-index job, running at the time
that disk activity is high.
• Check to see if the reporting DTS job is running. If it is, see if it is running for longer than
usual.
• If you still haven’t identified the issue, use the SQL Profiler to see what queries are running
with high disk activity.
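
The partition size guideline above reduces to a single comparison. The Python sketch below assumes that "5 million" refers to rows in the current partition table and that you already know how much memory SQL Server can use; the function name is hypothetical.

```python
ROWS_PER_GB_OF_SQL_MEMORY = 5_000_000  # guideline from the text above


def partition_exceeds_guideline(row_count, sql_memory_gb):
    """Return True if the current partition is larger than the guideline.

    row_count:     rows in the current partition table (assumed unit)
    sql_memory_gb: memory, in GB, that SQL Server is able to use
    """
    if sql_memory_gb <= 0:
        raise ValueError("sql_memory_gb must be positive")
    return row_count > ROWS_PER_GB_OF_SQL_MEMORY * sql_memory_gb


print(partition_exceeds_guideline(12_000_000, 2))  # True
print(partition_exceeds_guideline(9_000_000, 2))   # False
```

A single True result is not conclusive on its own; as noted above, the pattern matters only if the table does not recover after the next partitioning job.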
High CPU utilization by the MOMService process
If the MOMService process CPU usage is over 80% on the Management Server:
• Repeat the steps used to check the discovery simple count (“High CPU utilization on the
database server”).
Activity on the Management Server disk is high
If the idle time is less than 20% on the Management Server:
• Make sure MOM 2005 RTM is installed. MOM 2005 RC had an issue with disk utilization
on the Management Server.
Server queue filling up from time to time


If resource consumption is not high, but the server queue is filling up from time to time, check
the following patterns:
• If the server queue is filling up every 15 minutes, it could be the performance counter
collection. Check the database disk idle time to see if there is a spike that corresponds to the
queue filling up. If this is the case, there is a disk bottleneck and either faster or additional
disks are required.
• If the server queue is filling up towards the end of the day, and it recovers at midnight, there
is probably a high volume of performance data or events. There should be corresponding
high disk activity on the database server.
• If the pattern for the server queue filling up is periodic, look for SQL jobs running at those
times.
• If the server queue is constantly at 100%, see if the server queue simple count is constant. If
so, make sure MOM 2005 RTM is installed. MOM 2005 RC had an issue with the server
queue getting deadlocked.
• If there is no pattern for the server queue filling up, enable tracing and check mc8 logs on the
server. Look for errors that correspond to the queue filling up.
If there are no resource bottlenecks on the database and Management Servers, and the \MOM
Server(*)\Queue Space Percent Used never exceeds 10% for more than a few minutes, but alert
latency is still high, then the cause of the latency is likely the agent.
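
The first pattern in the list above (the queue filling up every 15 minutes) can be checked against a list of timestamps at which the queue was reported as full. The following Python sketch is one possible approach, not a MOM feature: the function name, the one-minute tolerance, and the requirement of at least three events are all assumptions.

```python
from datetime import datetime, timedelta


def looks_like_perf_burst(fill_times,
                          period=timedelta(minutes=15),
                          tolerance=timedelta(minutes=1)):
    """Return True if queue-full events are spaced about one period apart.

    fill_times: datetime values at which the server queue was full,
                for example taken from alert timestamps (hypothetical input).
    """
    ordered = sorted(fill_times)
    if len(ordered) < 3:
        return False  # too few events to call it a pattern
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return all(abs(gap - period) <= tolerance for gap in gaps)


start = datetime(2004, 12, 1, 9, 0)
regular = [start + i * timedelta(minutes=15) for i in range(5)]
print(looks_like_perf_burst(regular))  # True
```

If the function returns True, compare the same timestamps with the database disk idle time, as described in the first bullet.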

Assess Agent Activity


You need to assess activity on a specific agent when:
• There is high resource utilization on the agent.
• Alert latency for alerts from the agent is high, but the server queue is empty.
If the preceding conditions exist, you need to check:
• Events in the Windows Application log.
• Resource usage.
• Agent queue issues.

Events in the Windows Application log


Check the agent’s application log for the following events.
The agent cannot contact the Management Server


• Check communications from the agent to the Management Server by pinging the server, and
by using telnet to connect to port 1270 on the Management Server.
• Check the network bytes/second and bandwidth on the server and agent computers. Verify
that all of the available bandwidth is not being used, especially in low-bandwidth scenarios.
• If none of the preceding cases are true, turn on tracing, and check the agent’s mc8 logs for
error events indicating that the agent cannot connect to the Management Server.
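
Where telnet is not available on the agent computer, the port check above can be approximated with a short Python script. This is a minimal sketch: the function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that the agent can authenticate or exchange data.

```python
import socket

MOM_PORT = 1270  # port named in the check above


def can_connect(host, port=MOM_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the with-block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, time-outs, and name resolution failures.
        return False
```

For example, calling can_connect with the name of your Management Server returns False if the port is blocked by a firewall or the server cannot be resolved.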
The agent service is restarting
Check the private bytes performance counter for the MOMHost processes. The MOMService
restarts if the private bytes of any MOMHost process exceed 100 MB on the agent. This happens
when running responses or scripts consume large amounts of memory. You can adjust the
maximum private bytes limit by changing the registry settings shown in Table 6.1.
Table 6.1 Private bytes registry keys
Setting                      Key
Default host private bytes   HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint\MaxDefaultHostPrivateBytes
Script host private bytes    HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint\MaxScriptHostPrivateBytes

The agent queue is filling up


If there are events saying that the agent’s queue is getting full, refer to “Resource usage” and
“Agent queue issues”.

Resource usage
Check to see if high resource usage (CPU, memory, disk) is causing a bottleneck.
• Verify that there are no other applications with heavy resource usage, which might be
depriving the MOM service of the resources that it requires.
• If the MOMHost process is consuming too many resources, check to see what responses or
scripts are running at the time of heavy resource utilization.

Agent queue issues


If resource utilization is not high, but the agent queue is filling up periodically:


• If the agent queue is at 100%, check to see if the agent queue simple count is constant. If this
is the case, verify that MOM 2005 RTM is installed. There was a known issue with queue
deadlocking with the MOM 2005 Release Candidate (RC).
• If MOM 2005 RTM is installed, turn on tracing, and examine the agent’s mc8 logs to see if
there are errors that correspond to the times that the queue filled up.

Assess Console Activity


If the consoles are excessively slow to respond to specific views, verify that:
• The number of consoles and agents for the management group is within supported limits.
• There are no database server or Management Server performance problems.
After verifying that none of the preceding conditions exist, focus on the view that is performing
slowly and conduct the following assessment.
See how many rows are displayed in the view
It is recommended that a view contain one day of data, which is approximately 1 million rows. If
your row count is higher, you can decrease the count by editing the time view filter.
Determine how many other consoles are auto-refreshing the same view
It is recommended that you do not have more than 5 consoles open and auto-refreshing the same
view, particularly when working with the Events view. You can close the additional open
consoles or increase the auto-refresh interval to more than the default setting (one minute).
Check resource utilization on the console computer
Verify that other applications are not depriving the consoles of the resources that they require.
