Solving Network Problems Before They Occur
Article 1: How to Use SNMP in Network Problem Resolution
  SNMP, the Solution
  SNMP, Total Network Awareness
  SNMP, Disaster Protection
  SNMP, Easy Implementation
Article 2: How to Use WMI in Network Problem Resolution
  The Network Rosetta Stone
  WMI, Finger‐Pointer Preventer
  WMI, Keeping Email Operational
  WMI, Network Monitoring for Servers and Applications
Article 3: How Effective Configuration Management Aids in Network Problem Resolution
  Config Management, Little Problems with Big Impact
  Config Management, When the Fix Is Harder than the Problem
  Solving Network Problems Requires the Right Vision
Copyright Statement
© 2009 Realtime Publishers. All rights reserved. This site contains materials that have
been created, developed, or commissioned by, and published with the permission of,
Realtime Publishers (the “Materials”) and this site and any such Materials are protected
by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtime Publishers or its web site
sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for
technical or editorial errors or omissions contained in the Materials, including without
limitation, for any direct, indirect, incidental, special, exemplary or consequential
damages whatsoever resulting from the use of any information contained in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtime Publishers and the Realtime Publishers logo are registered in the US Patent &
Trademark Office. All other product or service names are the property of their respective
owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtime Publishers, please contact us via e-mail at
info@realtimepublishers.com.
Article 1: How to Use SNMP in Network Problem Resolution
I’ve spent almost 15 years of my life as an IT professional. In that time I’ve been a phone
support operator, field technician, systems administrator, consultant, and now an
independent technology author and presenter. Through those experiences, I’ve seen a wide
range of very different environments in very different businesses. Those IT environments
range from the exceptionally simple, installed into actual closets within small business
offices, all the way to multi‐enterprise, multi‐national collaborative networks.
What’s interesting about all of them is their similarity. Some networks have more
applications than others. Some have faster connections between sites. Some use more
remote applications. Yet there’s a common thread in all of them: from time to time, they all
have problems.
There’s also something remarkably strange about those networks I’ve seen. Even though
we can all agree that every network occasionally has its problems, relatively few have the
tools in place to find and fix them. For reasons of cost, or time, or lack of subject knowledge,
many IT organizations haven’t implemented unified and comprehensive network
monitoring solutions.
It is my goal in this Essentials Series to explain why you should. With the right platform in
place, you’ll experience less downtime, more customer satisfaction, and fewer late nights
tracking down the network problems of the day. Using a series of examples from my own
experience, I want to show you how effective network monitoring can help to solve
network problems before they occur.
SNMP, the Solution
Let’s start by looking at actual solutions to your network’s visibility problem. Networks are
by nature very opaque. You can’t simply peer through cables or into routers to see the
behaviors going on during their operation. To see what’s going on in your network, you
need tools that do the peering for you.
Those tools start with the individual devices themselves. For example, if you queried the
interface statistics on a Cisco router, you would be greeted with information about that
interface’s traffic:
router1#show int
Ethernet0 is up, line protocol is up
  […snip…]
     37592 packets input, 2859273 bytes, 0 no buffer
     Received 15938 broadcasts, 0 runts, 0 giants
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
     0 input packets with dribble condition detected
     15288 packets output, 1395393 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets, 0 restarts
     0 output buffer failures, 0 output buffers swapped out
That information describes the individual device you’ve logged into, but it stops there.
Today’s network devices natively include everything needed to gather and report
on their network traffic statistics. You can request this information from each device
and manually build a picture of how your network is operating. However, the complexity of
doing so rises dramatically as soon as your network grows much past a single
interconnected device.
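To make the manual approach concrete, here is a minimal Python sketch that pulls a few counters out of the `show int` text above. The pattern names and the `parse_counters` helper are illustrative assumptions rather than part of any Cisco tooling, and real output varies by platform and software version.

```python
import re

# Sample text copied from the `show int` output above.
SHOW_INT = """\
Ethernet0 is up, line protocol is up
37592 packets input, 2859273 bytes, 0 no buffer
Received 15938 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
15288 packets output, 1395393 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets, 0 restarts
"""

# Each counter appears as "<number> <label>"; the labels are matched literally.
PATTERNS = {
    "packets_in": r"(\d+) packets input",
    "bytes_in": r"packets input, (\d+) bytes",
    "input_errors": r"(\d+) input errors",
    "packets_out": r"(\d+) packets output",
    "bytes_out": r"packets output, (\d+) bytes",
    "output_errors": r"(\d+) output errors",
}


def parse_counters(text: str) -> dict:
    """Return the counters found in one interface's `show int` text."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counters[name] = int(match.group(1))
    return counters


counters = parse_counters(SHOW_INT)
print(counters)
```

Repeating this for every interface on every device, and keeping the patterns current as output formats change, is exactly the effort that grows out of hand as the device count rises.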
To combat these complexities, the Simple Network Management Protocol (SNMP) was
ratified in the early 1990s. This protocol enables a request‐response framework between
individual devices and a central Network Management Solution (NMS). Individual devices
can be polled for their information through a GET request by the NMS. Device information
is stored and can be addressed via its globally‐unique Management Information Base (MIB)
Object Identifier (OID). An OID’s long string of digits represents the “address” for the unit of
information being stored on that device. Information being stored can relate to network
statistics, details about that device’s configuration, performance and throughput metrics, or
really any information that the device’s manufacturer has enabled.
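The addressing idea can be sketched with a toy, in-memory stand-in for a device's MIB. The OIDs below are standard MIB‐II objects, but the values, the `AGENT_MIB` table, and the `snmp_get` and `snmp_walk` helpers are invented for illustration; a real NMS would send these requests over the network using an SNMP library.

```python
# Toy stand-in for one device's MIB: OID string -> stored value.
# The OIDs are standard MIB-II objects; the values are made up for this example.
AGENT_MIB = {
    "1.3.6.1.2.1.1.1.0": "Example router software",  # sysDescr.0
    "1.3.6.1.2.1.1.5.0": "router1",                  # sysName.0
    "1.3.6.1.2.1.2.2.1.10.1": 2859273,               # ifInOctets, interface index 1
    "1.3.6.1.2.1.2.2.1.16.1": 1395393,               # ifOutOctets, interface index 1
}


def oid_key(oid):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so OIDs compare numerically."""
    return tuple(int(part) for part in oid.split("."))


def snmp_get(mib, oid):
    """Hypothetical GET: return the value stored at exactly this OID, or None."""
    return mib.get(oid)


def snmp_walk(mib, prefix):
    """Return (oid, value) pairs beneath a prefix, in numeric OID order."""
    wanted = oid_key(prefix)
    hits = [(oid, value) for oid, value in mib.items()
            if oid_key(oid)[:len(wanted)] == wanted]
    return sorted(hits, key=lambda item: oid_key(item[0]))


print(snmp_get(AGENT_MIB, "1.3.6.1.2.1.1.5.0"))
for oid, value in snmp_walk(AGENT_MIB, "1.3.6.1.2.1.2"):
    print(oid, value)
```

The point of the sketch is that the OID is nothing more than a path into a tree: a GET names one leaf, and a walk lists everything beneath a branch.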
SNMP’s poll‐based nature means that information must be requested if it is to
be sent back to the NMS. For this reason, SNMP also has a unidirectional alert component.
An SNMP “trap” represents a preconfigured alert from a device back to its NMS, reporting
on conditions that the NMS should know about. This setup enables managed devices to rapidly
notify the NMS when problems exist, without waiting for the next poll.
SNMP also comes in several versions, with later versions adding desired
features over their predecessors. SNMP v3 is the version commonly used
today because it adds a suite of critical security features that protect its data in
transit and authenticate the communicating parties. Its encryption ensures that data
sent in clear text by earlier versions is protected from prying eyes, while its authentication
requires senders to prove their identity before their messages are accepted.
You’ll probably recognize that this information on SNMP is neither new nor revolutionary.
With SNMP rapidly approaching its 20th birthday, its protocol is mature
and its capabilities are well known. Yet if that’s the case, why are so many IT
organizations still not using it? Perhaps they don’t understand its true power in solving
network problems before they occur. Consider a few examples…
SNMP, Total Network Awareness
Recognizing how SNMP does its job is far less exciting than realizing how it can spot and
solve network problems. The information gained through SNMP connections and stored in
a central NMS enables a situational awareness of your network. This awareness illuminates
the behaviors on all devices through a single console, providing you a single heads‐up
display of your network’s health.
As an example of this, I used to work for a company that built satellite ground stations. This
company’s complex development activity required the cooperation of multiple business
units and even multiple companies, all in different locations. To ensure that everyone was
working on the same page, we architected a centralized collaboration environment that
brought all parties together to the same set of applications. This remote application
infrastructure was a perfect solution for its users, enabling them to share documents and
work together whether they were in Colorado, California, Massachusetts, or anywhere.
Perfect, that is, until the network began experiencing problems. Remote application
infrastructures, such as Microsoft Terminal Services or Citrix XenApp, by nature perform
well over low‐bandwidth connections. They enable users to work on remote applications
as if they were installed locally, even over the slowest of network lines. Yet although they
do well in low‐bandwidth situations, the streaming nature of their protocols means they do
not do well across connections with high latency.
In this environment, it was well known that certain WAN connections to certain sites would
experience latency from time to time. This project’s network traffic was only a portion of
the traffic originating from each site. Rather than waiting for administrators to get phone calls
when users’ experience degraded, this environment elected instead to configure SNMP
on each remote device. Each device was configured to report to a central NMS. That
NMS queried each device for its interface utilization and ping latency statistics on a regular
basis. Traps were additionally set up to notify the central NMS, which in turn alerted
administrators, when metrics crossed acceptable thresholds.
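The polling arithmetic behind such a setup can be sketched as follows. Interface utilization is commonly derived from two successive readings of an octet counter (such as MIB‐II's 32‐bit `ifInOctets`) and the interface speed in bits per second (`ifSpeed`). The threshold values and function names here are assumptions chosen for illustration, not settings from the environment described above.

```python
COUNTER32_MAX = 2**32  # a 32-bit SNMP counter wraps back to zero at this value


def octet_delta(previous, current):
    """Octets counted between two polls, allowing for a single counter wrap."""
    if current >= previous:
        return current - previous
    return COUNTER32_MAX - previous + current


def utilization_percent(previous, current, interval_s, if_speed_bps):
    """Average utilization in one direction over the polling interval."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


def check_thresholds(utilization, latency_ms,
                     max_utilization=80.0, max_latency_ms=150.0):
    """Return alert messages for metrics beyond the (assumed) limits."""
    alerts = []
    if utilization > max_utilization:
        alerts.append(f"utilization {utilization:.1f}% exceeds {max_utilization}%")
    if latency_ms > max_latency_ms:
        alerts.append(f"latency {latency_ms} ms exceeds {max_latency_ms} ms")
    return alerts


# Two polls taken 60 seconds apart on a 10 Mb/s link.
usage = utilization_percent(1_000, 37_501_000, interval_s=60, if_speed_bps=10_000_000)
print(usage, check_thresholds(usage, latency_ms=220))
```

Handling the counter wrap matters on busy links: without it, a wrapped counter produces a negative delta and a nonsensical utilization figure.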
Figure 1: SNMP enables the creation of ping latency graphs across multiple devices.
The result was a real‐time graph similar to that shown in Figure 1. There,
you can see ping latency information graphed across devices, giving
administrators information about the health of each connection. Because the right people
were also alerted as conditions crossed thresholds, they were able to compensate as
necessary to maintain their users’ experience.
SNMP, Disaster Protection
Although SNMP is most commonly associated with gathering network statistics and
configurations, it extends to non‐network devices as well. SNMP was originally
developed as a communications framework between all kinds of networked devices. Thus,
any device with a network connection can potentially receive and respond to SNMP
requests or send its own traps.
Nowhere is this more valuable than with the environmental sensors used in many data
centers today. These environmental sensors regularly check the temperature, humidity,
and (in the case of accidental flooding) water level present in the data center room. The
installation and use of these sensors is critical to ensuring that your expensive IT
investment doesn’t melt down if your data center air conditioning stops functioning.
That exact situation happened to me at another former client. I had the lucky
privilege of stepping into their data center on the very day their air conditioning unit
experienced a massive, yet unnoticed, failure. As I walked into that data center, the
outpouring of heat made me immediately recognize that something was terribly wrong. I
looked over to the room’s temperature sensor (a cheap model more often found attached
to the outside of your bedroom window) to discover that the temperature had crossed the
80° threshold and was increasing at a rate of 1° every 10 minutes. Humidity was similarly
affected.
Although the problem was quickly resolved through the forced shutdown of non‐essential
equipment and the introduction of backup air conditioning, the problem could have been
dramatically worse had my timing been different. The network‐enabling of data center
sensors using protocols such as SNMP illuminates another of this protocol’s key value
propositions.
With the right tools in place, an alert could have notified administrators immediately when
temperature conditions in the data center started their deviation. Consolidating SNMP’s
data into a unified network management solution enables the real‐time alerting of
problems directly to network administrators.
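As a rough sketch of what such an alert could compute, the following Python estimates how long remains before a temperature limit is reached, using the rate of rise from the story above (1° every 10 minutes at 80°). The 95° shutdown limit and the function name are assumptions made for the example; a real sensor and NMS would supply their own readings and thresholds.

```python
def minutes_until_limit(readings, limit):
    """Estimate minutes until `limit` is reached, from the first and last readings.

    `readings` is a list of (minute, temperature) pairs in time order.
    Returns None when the temperature is not rising.
    """
    (first_minute, first_temp), (last_minute, last_temp) = readings[0], readings[-1]
    rise = last_temp - first_temp
    elapsed = last_minute - first_minute
    if rise <= 0 or elapsed <= 0:
        return None
    remaining = limit - last_temp
    if remaining <= 0:
        return 0.0
    return remaining * elapsed / rise


READINGS = [(0, 79.0), (10, 80.0)]  # rising 1 degree every 10 minutes
SHUTDOWN_LIMIT = 95.0               # assumed limit, for illustration only

print(minutes_until_limit(READINGS, SHUTDOWN_LIMIT))
```

An alert that carries this kind of estimate tells an administrator not only that something is wrong but roughly how long there is to react.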
SNMP, Easy Implementation
As I travel across IT environments, I find that a common hurdle to implementing
comprehensive monitoring is its perceived difficulty. Although
numerous enterprise‐scale monitoring solutions are available today, their implementation
often installs little more than an empty shell to be later populated by dedicated monitoring
administrators.
Environments that aren’t necessarily “enterprise” need cost‐effective solutions
that can be implemented quickly and without specialized knowledge. The right
solutions for your environment will immediately begin gathering useful data with a
minimum of daily maintenance. As you’ll discover in the next article of this series, such
solutions integrate with servers and applications as well as networking devices to provide
a complete view into your network.
Article 2: How to Use WMI in Network Problem Resolution
I’ve found myself constantly amazed at the language barrier we experience in the world of
IT. I’m not talking here about the barrier between the technologists and the non‐
technologists, the geek and non‐geek. I’m speaking about the language barrier we’ve all
experienced between an organization’s “server” administrators and their “network”
administrators.
You’ve probably been in the same situation as you’ve been called in to work together on a
big problem. Your network team sits at one side of the conference table, while the server
admins take over the other. Although some problem is preventing your users from getting
their job done, the two opposing teams pull out domain‐specific vocabulary the other
doesn’t understand in an effort to prove that the problem isn’t their fault.
This circle‐the‐wagons approach to solving IT problems has been around as long as the
problems themselves. When a major problem occurs in the environment, a common
approach is to gather everyone that is potentially involved and focus them on today’s
firefight. Yet solving problems in this way is expensive in terms of people’s time and in cost
to your business. There’s got to be a better way.
The Network Rosetta Stone
With the right Network Management Solution (NMS) in place, it is possible to
improve your resolution of large‐scale problems without the finger pointing. The right
solutions leverage integration to servers and applications as well as network components
to provide complete visibility into your operating environment. The result is that your NMS
becomes a kind of Rosetta Stone or universal translation device between IT teams; the NMS
helps the network team understand the impact of servers and applications, while giving
systems administrators a perspective on the network infrastructure.
One way in which an NMS, acting as a Rosetta Stone, translates your Microsoft Windows
computers is through Windows Management Instrumentation (WMI) integration.
Microsoft’s WMI is a platform‐specific service that enables third‐party devices to query the
Microsoft OS for details about its behaviors. As a rough analogy, if you consider the Simple
Network Management Protocol (SNMP) the request/response tool for network device
monitoring, WMI performs the same actions within the Windows OS. A typical WMI query
might look like this:
Select FreeSpace from Win32_LogicalDisk
In this query, the targeted machine is asked to provide the amount of free space on its
installed volumes. This process looks different than SNMP’s numerical Object Identifier
(OID) approach, but the result is the same. An NMS queries for information across multiple
Windows machines, storing the results into its local database and reporting them along its
management consoles. Because multiple Windows machines can be queried by a central
monitoring server, that server becomes the locus of analysis for behaviors across servers
and network devices (see Figure 1). As a result, it becomes dramatically easier to locate or
prevent network problems because their root cause can be tracked to very specific
endpoints and behaviors.
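To show what an NMS might do with the answers to such a query, here is a small Python sketch. `Win32_LogicalDisk` does expose `DeviceID`, `FreeSpace`, and `Size` properties, and `DriveType = 3` selects local disks, but the sample rows, the 5 percent threshold, and the helper names are assumptions for illustration. Actually running the query requires a WMI client on or against a Windows machine, which this sketch deliberately leaves out.

```python
# The query an NMS might issue; shown here only as text.
WQL = "SELECT DeviceID, FreeSpace, Size FROM Win32_LogicalDisk WHERE DriveType = 3"

# Invented rows standing in for one server's answer. Values are in bytes;
# some WMI clients return 64-bit values as strings, hence the int() calls below.
SAMPLE_ROWS = [
    {"DeviceID": "C:", "FreeSpace": str(40 * 2**30), "Size": str(100 * 2**30)},
    {"DeviceID": "E:", "FreeSpace": str(3 * 2**30), "Size": str(100 * 2**30)},
]


def percent_free(row):
    """Free space on one volume as a percentage of its total size."""
    return 100.0 * int(row["FreeSpace"]) / int(row["Size"])


def low_disks(rows, min_percent_free=5.0):
    """Return the DeviceIDs of volumes whose free space is below the threshold."""
    return [row["DeviceID"] for row in rows
            if int(row["Size"]) > 0 and percent_free(row) < min_percent_free]


print(low_disks(SAMPLE_ROWS))
```

Storing each poll's result alongside the SNMP data from network devices is what lets a single console relate a server's condition to what the network is doing at the same moment.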
Figure 1: A unified dashboard that displays information about servers, applications,
and network devices in one place.
WMI, Finger‐Pointer Preventer
It is the intersection of WMI and SNMP monitoring where an NMS provides great value. It
also helps out with the historical problem of teams pointing fingers at each other. Consider
another situation I experienced not long ago with one of my consulting clients. At this
client, a particular Windows virtual machine was experiencing an intermittent problem
with its network connection. That network problem would occur only irregularly; however,
when it did occur, it impacted a large number of users. Thus, resolving this problem was
extremely important for this client.
The client was very focused on the perceived network source for this problem, pointing
their attention and resources to the network and its behaviors. “There must be something
wrong with the network cards, their drivers, or their firmware,” they would tell me.
Yet what they did not recognize was how virtualization tends to significantly increase the
complexity of troubleshooting these types of problems. With multiple virtual machines co‐
located atop a single virtual host, a simple network problem’s root cause can be something
as seemingly‐unrelated as a shortage of system memory or too much consumption of
processor cycles.
To resolve the situation, a unified NMS was implemented that collected and
reported on metrics gathered through SNMP and WMI. This same solution integrated with
the virtualization platform to provide additional data about its processing as well (see
Figure 2). The cause of the problem was discovered the very next time it
occurred. WMI queries to the virtual host showed that its processor
utilization spiked dramatically at the very moment the networking
problem occurred. The solution was to offload some of that virtual host’s workload to other
servers to prevent the resource‐overuse situation.
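The correlation that exposed the cause can be sketched in a few lines of Python. All of the data, the 90 percent threshold, and the 60‐second window below are invented for illustration; a real NMS would draw both series from its own database.

```python
# Invented sample data: (seconds since start, CPU percent) from periodic polls,
# and the times at which the network problem was observed.
CPU_SAMPLES = [(0, 22.0), (60, 25.0), (120, 97.0), (180, 30.0), (240, 95.0)]
INCIDENT_TIMES = [125, 400]


def spikes_near_incidents(cpu_samples, incident_times,
                          cpu_threshold=90.0, window_s=60):
    """Map each incident time to the CPU samples above the threshold within the window."""
    matches = {}
    for incident in incident_times:
        matches[incident] = [
            (t, cpu) for t, cpu in cpu_samples
            if cpu >= cpu_threshold and abs(t - incident) <= window_s
        ]
    return matches


matches = spikes_near_incidents(CPU_SAMPLES, INCIDENT_TIMES)
for incident, spikes in matches.items():
    print(incident, spikes)
```

An incident with a matching spike points toward the host's resources; an incident with none suggests the cause lies elsewhere and keeps both teams looking at the same evidence.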
Figure 2: A single view with SNMP, WMI, and even virtualization counters provides a
holistic view of the entire environment.
WMI, Keeping Email Operational
Another averted crisis that may strike home in your own network environment has to do
with keeping email servers up and running. Although most businesses can endure the loss
of file servers for a day, or even a few databases for a few hours, the loss of the email
system usually sends a business’ executives into orbit.
That’s why in organizations both large and small, the email system is often considered one
of the most important services to remain up and operational. Email at the same time can be
one of the most dynamic data processing systems in your data center. Handling thousands
of messages a day in even the smallest of environments, email systems must effortlessly
deal with large attachments, malware, and addressing failures while preserving the users’
experience within their desktop email clients.
I was once called in to architect a monitoring solution for a company in the financial
services industry. Although this client needed the monitoring solution for their entire
multi‐site infrastructure, the real reason for its implementation was a series of regular and
painful problems with the email server.
Implementing the right kind of tools for this small business of fewer than 100 employees was
trivial. Connecting the monitoring system to network devices, identified servers, and even a few
clients was not difficult because the system included preconfigured templates for each type
of device. We completed the installation and initial configuration in less than a day.
The next morning, I returned to the client to find an extremely tired but extremely happy
systems administrator sitting at his desk. It turns out that the majority of the problems
with the email system were related to users overfilling it with data to the point where it
would consume all its available disk space. That very night after the installation of this
monitoring system, the administrator received an alert notifying him that the email
server’s disk drive was within a few percentage points of full consumption. Unlike in each
of the previous incidents, this administrator was able to add the necessary disk space prior
to the email server’s database shutting down. The right level of monitoring across network,
server, and even application facets of the IT environment prevented the problem from ever
occurring.
WMI, Network Monitoring for Servers and Applications
As with the previous article in this series, these stories are told to explain why effective
monitoring goes far in preventing problems. With the right monitoring that spans every
part of an IT environment, you gain much‐needed visibility into areas where you otherwise
would have none. By integrating the network focus traditionally associated with SNMP
with the server and application focus commonly used with WMI, that vision spans the
entire environment. In the end, it may bring your network and server teams closer together
as a cohesive unit for better managing your IT infrastructure.
Article 3: How Effective Configuration Management Aids in Network Problem Resolution
The third focus of this Essentials Series is on the need for effective configuration
management, a common feature across many Network Management Solutions (NMSs) but
one that sometimes gets missed. In this instance, what do I mean by configuration
management? I mean the unified storage and uniform distribution of configurations to each
of the devices on your network.
There is a certain brilliance in the way that most network devices can be, and are, configured.
Using little more than text files, a smart administrator can set up their interfaces, ACLs, and
essentially every other setting within these devices. Because they are text files, one
device’s configuration can very easily be replicated on another device through a file copy.
Editing them is also trivial, accomplished with a simple text editor or SSH application. As an
example, the following snippet shows the simplicity of a Cisco device’s initial
configuration:
no service password-encryption
!
hostname Router
!
enable secret 5 $2m$FJdHx53V$t7rQJop3jjbXIB7n3
!
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex auto
 speed auto
!
interface FastEthernet0/1
 no ip address
 duplex auto
 speed auto
 shutdown
!
interface Vlan1
 no ip address
 shutdown
Yet there’s a certain level of pain that comes with this simplicity. That pain grows as the
number of devices and their individual configurations increases. Managing the
configuration of just a few devices means that you’re responsible for just a few text files
and their individual settings. But as your network grows in size and complexity, your
number of elements under management grows geometrically. At some point, no one person
can safely handle the sheer volume of text files and their settings that are required by a
production network.
It is in just this situation where the configuration management elements of an effective
NMS grow extremely valuable to the IT organization. An effective NMS will include the
database storage of configurations, versioning and version control of individual config files,
analysis tools for comparing those files, and the ability to rapidly deploy changes to devices
all across the network. In much the same way that most people program their favorite
phone numbers into their cellular phones, managing a network through an NMS ensures
you don’t accidentally call the wrong person, forget a phone number, or misconfigure a
device in such a way that brings down the LAN.
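The comparison step can be sketched with Python's standard `difflib` module. The two configuration fragments and the `config_diff` helper are illustrative assumptions; a real NMS would compare the stored version of a file against the proposed one.

```python
import difflib

# Stored (known-good) version of one interface's settings.
RUNNING = """\
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex auto
 speed auto
""".splitlines()

# Proposed version containing a hand-made change.
CANDIDATE = """\
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex half
 speed 100
""".splitlines()


def config_diff(old, new, old_name="running", new_name="candidate"):
    """Return unified-diff lines describing how `new` differs from `old`."""
    return list(difflib.unified_diff(old, new, fromfile=old_name,
                                     tofile=new_name, lineterm=""))


diff = config_diff(RUNNING, CANDIDATE)
for line in diff:
    print(line)
```

Lines beginning with `-` were removed and lines beginning with `+` were added, so a reviewer sees only what changed rather than rereading the whole file.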
Config Management, Little Problems with Big Impact
The workflow an NMS provides wraps around the traditional actions associated with changing a device
config and adds a lot of value to the process. Consider a situation I experienced a number of
years ago in the network of a major governmental defense contractor. There, a network
condition began occurring where some servers intermittently lost their connection with
the network. When those servers could talk to the network, their connection speeds were
dramatically lower than expected. Network bandwidth rates were so slow that network
applications began to suffer, users began calling into the Help desk, and fellow
administrators started contacting loved ones to report they’d be spending the night.
In this situation, the entire staff of systems administrators was tasked with resolving the
problem. As the problem affected a large percentage of servers on the network, every eye
was needed on the problem.
After a full day of troubleshooting by the entire staff, the problem was eventually tracked to
an incorrect configuration on a particular switch in the data center. That configuration
mismatched the duplex settings between the switch and its connected servers, with one
side inexplicably reset to 100/Half duplex and the other at Auto/Auto. As a result, the two
sides found themselves repeatedly renegotiating their communication channel, with a
resulting loss in service and performance.
In the end, a half‐dozen systems administrators lost a full day of productive work as a
result of a very simple misconfiguration. This misconfiguration was set into place by a well‐
meaning network engineer, who manually made a small change to a config file and
accidentally introduced the error. Because the engineer completed the change using a
traditional SSH connection directly to the device, the change wasn’t logged into any change
management system. No one knew about the change, and so no one was looking in that
location for the problem. Conversely, had the engineer made the change using an NMS’
change control engine, the error would have been found before it was released into
production.
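One kind of check a change control process could run before release is sketched below in Python: it scans a Cisco‐style configuration for interfaces whose duplex is hard‐coded rather than left at auto. The rule itself, the sample text, and the function name are assumptions for illustration, not a description of any particular NMS product.

```python
def find_forced_duplex(config_text):
    """Return (interface, setting) pairs for interfaces with hard-coded duplex."""
    findings = []
    interface = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("interface "):
            interface = stripped.split(None, 1)[1]
        elif stripped == "!":
            interface = None  # "!" ends the current interface section
        elif interface and stripped.startswith("duplex ") and stripped != "duplex auto":
            findings.append((interface, stripped))
    return findings


PROPOSED = """\
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 duplex half
 speed 100
!
interface FastEthernet0/1
 no ip address
 duplex auto
 speed auto
"""

findings = find_forced_duplex(PROPOSED)
print(findings)
```

A check like this does not decide whether a forced setting is wrong; it simply makes the change visible to a second pair of eyes before it reaches production.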
Config Management, When the Fix Is Harder than the Problem
Another story that is relatively common with network engineers involves an enterprise
client of mine and their massively distributed network. This client was a single business
unit of a much larger corporate network, responsible for the network traffic for many
thousands of people across dozens of sites. As you can imagine, the level of networking
equipment required to support the infrastructure was large and exceedingly complex.
This client and I were working on a widespread network slowdown situation. This
situation was not necessarily that the network had gotten slow, or for some reason stopped
operating at its expected level. In this environment, the network was slow, had been slow,
and its users had grown to accept its slowness as baseline. The network engineer and I
recognized that its baseline performance simply did not make sense based on the kinds of
equipment in the infrastructure and the bandwidth rates between sites. In this
environment, even the intra‐LAN traffic itself was slow beyond comprehension.
After a substantial amount of time peering through reports and looking through device
statistics, we realized that a small but important misconfiguration had been propagated
into the config files of each and every device on the network and across every site. The
specific misconfiguration is less important than the realization that the scope of the fix was
far greater than our group of individuals could take on. With literally thousands of devices
spanning dozens of sites, the steps needed to locate each device, log in, make and confirm
the change, and move on to the next device were anticipated to take between 5 and 10
minutes per device. Multiplying that number across every device meant that the fix
could take literally months of constant manual effort.
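The arithmetic is easy to reproduce. The device count of 3,000 below is an assumption standing in for "thousands"; only the 5 to 10 minutes per device comes from the account above.

```python
def effort_hours(device_count, minutes_per_device):
    """Total hands-on hours to change every device one at a time."""
    return device_count * minutes_per_device / 60


DEVICE_COUNT = 3000   # assumed figure standing in for "thousands of devices"
HOURS_PER_DAY = 8     # one person's working day

low_hours = effort_hours(DEVICE_COUNT, 5)
high_hours = effort_hours(DEVICE_COUNT, 10)
low_days = low_hours / HOURS_PER_DAY
high_days = high_hours / HOURS_PER_DAY

print(f"{low_hours:.0f} to {high_hours:.0f} hours")
print(f"{low_days:.1f} to {high_days:.1f} working days for one person")
```

Under these assumptions the work comes to 250 to 500 hours, or roughly 31 to 63 eight‐hour days of uninterrupted effort for one person, before allowing for travel between sites, mistakes, or rework.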
Adding to the complexity of the resolution was the nature of the fix itself. Due to the
specific change required, a rapid fix was necessary to preserve network connections
between sites. Although the fix was trivial, the network engineers were baffled as to how to
implement it.
The solution arrived with the implementation of an NMS not unlike those discussed in this
Essentials Series. By adding the NMS to the environment and instructing it to automatically
discover and map the network infrastructure (see Figure 1), the organization was able to
very quickly bring each individual device under centralized management. The NMS’
bulk change feature enabled the team to quickly implement and distribute the
change across the infrastructure. The result was a massive improvement in performance
across the business unit, and a promotion for the engineer.
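The idea behind a bulk change can be sketched in Python with the standard `concurrent.futures` module. Everything specific here is hypothetical: the device names, the commands, and the `fake_push` function, which stands in for whatever transport a real NMS uses to deliver a change to a device.

```python
from concurrent.futures import ThreadPoolExecutor


def attempt(push, device, commands):
    """Apply the change to one device and report the outcome instead of raising."""
    try:
        push(device, commands)
        return "ok"
    except Exception as exc:  # one bad device should not stop the whole rollout
        return f"failed: {exc}"


def bulk_change(devices, commands, push, max_workers=20):
    """Push the same commands to every device in parallel; return {device: outcome}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(lambda device: attempt(push, device, commands), devices)
        return dict(zip(devices, outcomes))


def fake_push(device, commands):
    """Hypothetical transport: pretends one device cannot be reached."""
    if device == "sw-branch-07":
        raise ConnectionError("unreachable")


DEVICES = ["sw-core-01", "sw-branch-07", "rtr-edge-02"]
COMMANDS = ["interface FastEthernet0/0", " duplex auto"]  # illustrative only

results = bulk_change(DEVICES, COMMANDS, fake_push)
for device, outcome in results.items():
    print(device, outcome)
```

Recording a per‐device outcome is the design choice that matters: when thousands of devices are changed at once, the list of failures is what tells the team where manual follow‐up is still needed.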
Figure 1: An NMS’ automated discovery and mapping features can quickly bring a
large network infrastructure under management.
Solving Network Problems Requires the Right Vision
As stated earlier, the goal of this Essentials Series has been to illustrate why effective
monitoring and management is necessary for a healthy network. That need is the case
irrespective of the size of your network. Whether you’re a small business with a few
devices or a large enterprise with many thousands, not having this vision prevents you
from actually understanding what’s going on inside your network.
As this article has shown, not having these tools also inhibits you from cohesively managing
the configuration of your network devices when you need those tools the most. When looking for
an NMS, look for one that is scoped to the needs of your environment, with the right
features and integrations you require for a complete situational awareness across the IT
landscape.