
6

Troubleshooting

The world of software has its own set of problems. A few years
ago, using Windows was a frustrating experience with frequent
crashes. Today, Windows is very stable and handles anomalies
much more gracefully. However, getting a specific combination
of software, networking, and security to work together can still be
problematic at times. Modern automation systems have a large
software content and are, therefore, subject to some of these IT
problems. Fortunately, some tools are available for diagnostics.

DCOM Troubleshooting
IP (Internet Protocol), combined with DCOM, is the software
communication platform upon which the automation system is
built. Software tools exist that check the computers on the network
to see if DCOM is configured so that remote OPC works properly
(Figure 6-1). The same tool contains basic network utilities
such as PING to check that basic communication is established.

In a network environment, it is always a good idea to first use the PING
command to ascertain that computers have been networked correctly.
Next, it is a good idea to check the “Network Neighborhood” to make sure
the client computer can locate the server computer, and vice versa.
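
PING only proves that a machine answers ICMP echo requests. A complementary check, sketched below in Python, is to try opening a TCP connection to port 135, where the RPC endpoint mapper used by DCOM listens. The function name, the default time-out, and the injectable `connect` parameter are assumptions made for this illustration; they are not part of any tool described in this chapter.

```python
import socket

# TCP port of the RPC endpoint mapper that DCOM relies on.
DCOM_ENDPOINT_MAPPER_PORT = 135


def can_reach(host, port=DCOM_ENDPOINT_MAPPER_PORT, timeout=2.0,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable (an assumption of this sketch) so the
    function can be exercised without a real network.
    """
    try:
        # The connection is closed again as soon as it succeeds.
        with connect((host, port), timeout):
            return True
    except OSError:
        # Covers refused connections, time-outs, and name-resolution errors.
        return False
```

A result of False does not prove that DCOM is misconfigured, since a firewall may simply block the port; it only shows that basic TCP communication to that port is not established.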

DCOM troubleshooting tools connect to the OPCEnum component
delivered with all OPC version 2 servers. It is also possible to
install an agent component on all computers, to which a DCOM
troubleshooting tool can connect to get information about OPC
version 1 servers as well. The tool can also help you make
changes to DCOM security settings to enable remote access.
184 Software for Automation

Figure 6-1. DCOM Troubleshooting Tool (Screenshot: SMAR SYSTEM302)

A DCOM troubleshooting tool locates OPC servers from any
vendor. If the computers and OPC servers are detected, and the
OPC servers pass the DCOM test, then any remaining problem is
not likely to be caused by DCOM or basic networking. The test
results can be exported as a text file to be printed or emailed to
technical support. This makes it easier for a technical support
group to assist in troubleshooting. A non-expert can perform the
testing while leaving the diagnosis to the experts.

DCOM Security
DCOM provides the ability for an application on one computer to
start and stop applications running on another machine. This is
clearly a critical function and is, therefore, heavily secured.
Using DCOM, such as when an OPC server or OLE DB
database exists on a computer different from the display console,
requires proper configuration of the Windows DCOM security
settings. If the settings are incorrect, the connection will
not work. In order not to reveal progress to a potentially
malicious intruder, no error messages reveal what the
fault is when a connection cannot be established. This is a security
feature, but it makes legitimate troubleshooting difficult.
It is therefore important to follow the DCOM setup procedure
systematically. The procedure is explained in Chapter 3 and usually
in the manuals for OPC servers and clients.

The proper way to configure DCOM is to restrict access and
launch permissions to selected user groups and users, and to
deny access to everyone else. However, when starting integration
and test, it may be a good idea to set the permissions to “Everyone.”
This effectively disables the security, eliminating it as a potential
problem. This can be used effectively during troubleshooting to
determine whether problems are caused by DCOM security or
elsewhere. Once you have proven connectivity, you can configure the
actual security settings.

DCOM troubleshooting tools contain wizards to help you set up
DCOM security step-by-step, such as the access and launch
permissions you would normally set up with the DCOMCNFG utility.
Verify that you have not forgotten to configure the client security
to permit servers to “call back.”

When connecting to remote servers, it may be a good idea to install the
server application on the clients as well. Although the server will never
be used locally, installing the server application will usually configure
the client machines’ DCOM settings appropriately. Some OPC servers
actually give you the option of a client-side installation, which installs
the required files and makes the appropriate registry settings.

No Data Updates
An easy three-step test of DCOM settings is to browse the OPC
server from a remote OPC client. If you can see the list of remote
OPC servers, this means you have access to read the registry.
Further, if you can see the tags, this means you have access to the
application. If you see the server and the tags but get no updates,
this means the OPC server does not have sufficient rights to call
back to the client with the subscribed values.
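
The three observations above can be written down as a small decision function. The following Python sketch only encodes the reasoning of this section; the function name and the wording of the returned messages are illustrative assumptions.

```python
def diagnose_remote_browse(sees_servers, sees_tags, gets_updates):
    """Map the three-step browse test to the most likely DCOM cause."""
    if not sees_servers:
        # Step 1 failed: the list of remote OPC servers is not visible.
        return "no access to read the registry on the server machine"
    if not sees_tags:
        # Step 2 failed: servers are listed but the tags cannot be browsed.
        return "no access to the OPC server application"
    if not gets_updates:
        # Step 3 failed: tags are visible but values never arrive.
        return "server lacks rights to call back to the client"
    return "DCOM settings appear to be working"
```

For example, `diagnose_remote_browse(True, True, False)` points at the callback rights on the client machine.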

Multiple Copies of the OPC Server


If multiple copies of the remote OPC server start, it is most likely
a DCOM security problem. For example, if the Identity tab is set
to “launching user,” every new client may start a new instance of
the OPC server. It is better to run the OPC server as a service
under the system account. Alternatively, if running as a service is
not supported, use “This user” with a named user account.

Remote OPC Server Doesn’t Start


If the remote OPC server does not start, it also is most likely a
DCOM security problem. For example, if the Identity tab is set to
“interactive user,” and nobody is logged in on the machine, it
cannot start. It is better to run the OPC server as a service under
the system account. Alternatively, if running as a service is not
supported, use “This user” with a named user account.
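
The two Identity pitfalls just described, and the recommended alternatives, can be summarized in a short lookup. This is a minimal Python sketch; the function name, the lower-case setting strings used as keys, and the message texts are assumptions for illustration only.

```python
# Known risks of the two Identity settings discussed in this section.
IDENTITY_RISKS = {
    "launching user": "every new client may start a new server instance",
    "interactive user": "the server cannot start when nobody is logged in",
}


def review_identity(setting, service_supported):
    """Return a (risk, recommendation) pair for a DCOM Identity setting."""
    risk = IDENTITY_RISKS.get(setting.strip().lower())
    if risk is None:
        # Settings not covered by the pitfalls above are left alone.
        return None, "no change suggested by this check"
    if service_supported:
        return risk, "run the OPC server as a service under the system account"
    return risk, "use 'This user' with a named user account"
```

Calling `review_identity("Interactive user", service_supported=False)` returns the risk text together with the fallback recommendation of a named account.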

DCOM Inter-domain
Try as far as possible to have OPC servers and OPC clients in the
same Windows domain. If this is not possible, it will be necessary
to employ some additional tricks: the user account under
which the OPC server runs should also be created on the client
machine, and the user account under which the client runs should
also be created on the server machine. It is important that each
account has the identical user name and password on both the client
and server machines. The client account and the server account can,
and preferably should, use passwords that differ from each other.
Verify that the accounts have been set up properly by trying to
connect to the server machine from the client machine’s “Network
Neighborhood” and vice versa. Additionally, DCOM access, launch
permissions, etc., must also be set. Talk to your network
administrator or read further about Domain Trust Relationships in
just about any book on Windows NT/2000/XP. Another alternative
for dealing with inter-domain DCOM is to use Web tunnelling, as
explained in Chapter 3.

DCOM Time-out
If the network is slow and unreliable, such as across the public
Internet, you may experience time-out problems. In this case, you
may need to use Web tunnelling techniques explained in Chapter 3.

The Windows DCOM protocol has a built-in six-minute time-out,
which is fixed. The purpose of this DCOM time-out is simply to
close unused connections to remote servers. Its purpose is not to
detect communication errors; it is far too slow for that. Most OPC
clients, but not all, probe the servers every few seconds to detect
failure of the communication or of the OPC server itself. OPC-DA
version 3 implements an even more sophisticated mechanism in which
the server continuously sends a “heartbeat” to the client to signal
that it is OK.

Be sure to use OPC clients that probe servers for status and report problems.
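
The idea of probing every few seconds, rather than waiting for the six-minute DCOM time-out, can be illustrated with a small watchdog. This Python sketch is not tied to any OPC library: the class name, the ten-second default, and the injectable clock are assumptions. A real client would call `record_ok()` whenever a status probe or heartbeat succeeds.

```python
import time


class ServerWatchdog:
    """Flags a server as unhealthy when it has been silent for too long."""

    def __init__(self, max_silence_s=10.0, clock=time.monotonic):
        self._max_silence_s = max_silence_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_ok = None         # time of the last successful probe

    def record_ok(self):
        """Call after each successful status probe or received heartbeat."""
        self._last_ok = self._clock()

    def is_healthy(self):
        """True only if a probe succeeded within the allowed silence."""
        if self._last_ok is None:
            return False
        return self._clock() - self._last_ok <= self._max_silence_s
```

With a silence limit of a few seconds, a failed server or broken connection is reported long before the DCOM time-out would close the connection.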

OPC Troubleshooting
OPC reports several error codes for problems with the OPC server
as a whole, as well as different status for each item (tag). It is also
possible to check the OPC server state. When an OPC client is
unable to display values from an OPC server, a simple test is to
see if the OPC client is able to get data from another OPC server,
and if another OPC client is able to get data from the OPC server.
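
The cross-test described above narrows the fault down to one side of the connection. A minimal Python sketch of that reasoning follows; the function name and the returned labels are illustrative assumptions.

```python
def isolate_fault(client_works_with_other_server,
                  other_client_works_with_server):
    """Interpret the two swap tests for a client/server pair without data."""
    if client_works_with_other_server and not other_client_works_with_server:
        # The client is fine elsewhere; nobody gets data from this server.
        return "suspect the OPC server"
    if other_client_works_with_server and not client_works_with_other_server:
        # The server serves others; this client fails everywhere.
        return "suspect the OPC client"
    if client_works_with_other_server and other_client_works_with_server:
        # Each side works on its own, so the problem lies between them.
        return "suspect the connection or compatibility between this pair"
    return "suspect both sides or a common cause such as the network"
```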

OPC Server Error Messages


There are more than twenty different error codes defined for an
OPC server to indicate problems with the server as a whole to an
OPC client, plus more than half a dozen DCOM error codes. You
are unlikely to see most of the possible OPC server error messages.
A good OPC client will show the error messages to the user, so the
user knows something is wrong with the OPC server, rather than
leaving the user in the dark. Similarly the OPC client must show
*** or another invalid character instead of the value so the user
does not mistake the value as valid (see Figure 6-2).

Figure 6-2. The OPC Client Shows Server Error Messages (Screenshot:
SMAR SYSTEM302)

It is a good idea to use an OPC client that automatically detects and
displays invalid characters in place of invalid values.

By using the error messages, troubleshooting is simplified. Most
error messages are self-explanatory. Tables 6-1 and 6-2 list the errors
with an additional description.

Tag Does Not Exist


The most common error message is “Unknown Item ID.” It
usually means the tag has not been configured in the OPC server.
In a manually configured server, check for omissions or typing
errors. In an automatically configured server, make sure the server
has been updated since the latest changes were made in the
system engineering tool.

Table 6-1. OPC Error Codes of Type Designated “Failure”


Error                       Description
Invalid Handle              Bug in the client or possibly in the server.
Bad Type                    Conversion to the requested data type is not supported.
Public                      Public groups do not support the requested operation.
Bad Rights                  Insufficient rights to perform this operation.
Unknown Item ID             The tag does not exist in the server.
Invalid Item ID             Wrong syntax for the tag.
Invalid Filter              The filter string is not valid.
Unknown Path                The access path for the tag is unknown.
Range                       The value exceeds the valid range.
Duplicate Name              Duplicate name not permitted.
Invalid Configuration File  The configuration file for the server is invalid.
Not Found                   The requested object cannot be found.
                            Applies, for example, to a public group.
Invalid Property ID         The property ID for the tag is invalid.
Deadband Not Set            The deadband has not been set for the tag.
Deadband Not Supported      Deadband is not supported for the tag.
Rate Not Set                Sampling rate not set for the tag.
                            Update rate of the group is used.

Table 6-2. OPC Error Codes of Type Designated “Partial Success”


Error                 Description
Unsupported Rate      Requested rate not supported by server.
                      Closest possible rate used.
Clamp                 Value was clamped before writing.
In Use                The operation cannot be performed because the
                      object is being referenced.
Quality               Value written but not the quality.
Time                  Value written but not the time-stamp.
Data Queue Overflow   Server buffer overflowed and the oldest data was
                      purged. Therefore, not every detected change has
                      been returned for this tag.
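
The split between “Failure” and “Partial Success” codes follows the general COM convention for result codes (HRESULT): the highest bit of the 32-bit code is set for failures and clear for successes, and zero means plain success. The Python sketch below classifies a code on that basis. The two numeric constants are given as commonly listed for OPC Data Access and should be treated as assumptions to verify against the OPC error header of your toolkit; the function name is illustrative.

```python
# Values as commonly listed for OPC Data Access; verify before relying on them.
OPC_E_UNKNOWNITEMID = 0xC0040007    # "Unknown Item ID" (failure)
OPC_S_UNSUPPORTEDRATE = 0x0004000D  # "Unsupported Rate" (partial success)


def classify_hresult(code):
    """Classify a COM result code as success, partial success, or failure."""
    # Some COM wrappers return signed 32-bit integers; normalize first.
    code &= 0xFFFFFFFF
    if code == 0:
        return "success"
    if code & 0x80000000:
        # Severity bit set: the call failed.
        return "failure"
    # Non-zero code with the severity bit clear: succeeded with a caveat.
    return "partial success"
```

A client can use such a test to decide whether to show an error to the user or merely log a warning, for example when the server substitutes the closest supported update rate.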

OPC Item Quality


The OPC status indicates a problem with an individual item (tag)
to the OPC client. The status consists of quality, sub-status, and
limit condition. The quality gives an overall idea of the validity of
the value, for example whether the server is unable to communicate
with the underlying network. A good OPC client will show the
tag quality to the user, so the user knows something is wrong
with the tag. Similarly, the OPC client must show *** or another
invalid character instead of the value when the quality is bad, so
the user does not mistake the value for a valid one (see Figure 6-3).

Figure 6-3. OPC Client Indicates the Quality (Screenshot: SMAR SYSTEM302)

The concept of status used in OPC is a subset of the status in the
FOUNDATION™ Fieldbus programming language. It may be a good
idea to refer to Fieldbuses for Process Control: Engineering, Operation
and Maintenance [1].

The OPC specification does not make clear if the OPC status for
an OPC item (tag) should reflect the status of the value as it exists
in the underlying hardware, or if it should reflect the status of the
OPC communication itself. In other words, should the status
indicate the health of the underlying device and sensor, or the
network communication and the OPC server? Both implementations
may exist in different OPC servers used in the same system.
It is a good idea to find out exactly what the status in the OPC
server indicates. Since it is necessary from a troubleshooting point
of view to know if a fault is due to the communication and OPC
server, or due to the underlying device and sensors, most OPC
servers have been implemented in such a way that the OPC item
status indicates the health of the networking and OPC server. For
those parameters that have an associated status in the device, an
additional OPC item is created for this status.

For example, all function block inputs and outputs in the
FOUNDATION™ Fieldbus programming language are data structures
consisting of two simple parameters: the value and its status. An OPC
server can thus provide these as two separate items, each having
its own OPC status (see Figure 6-4). This scheme makes it
possible to see, for example, that the OPC communication is
working but the device sensor has failed. In other words, the OPC
communication that reports the sensor failure is itself working. This
is better than having a single status that could mean either that the
sensor has failed or that the communication for this value has failed.
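
The benefit of keeping the two kinds of status apart can be shown with a small data structure. In this Python sketch, one flag holds the OPC status of the items (health of communication and server) and another holds the parameter status reported by the device. The class, field, and function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParameterReading:
    value: float
    opc_status_good: bool     # health of networking and the OPC server
    device_status_good: bool  # content of the separate status item


def interpret(reading):
    """Say where the fault lies, checking communication first."""
    if not reading.opc_status_good:
        # Without working communication the device status cannot be trusted.
        return "communication or OPC server fault"
    if not reading.device_status_good:
        return "communication OK; device or sensor fault"
    return "communication OK; value valid"
```

For instance, `interpret(ParameterReading(0.0, True, False))` reports a device or sensor fault while confirming that the communication itself works.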

Figure 6-4. Advanced OPC Clients Distinguish between OPC Status and
Parameter Status (Screenshot: SMAR SYSTEM302)

The most useful piece of information in the OPC status is the
quality. The limit condition is of limited use in an OPC server,
although some generic servers that do scaling may use high and
low limits to indicate the value is clamped at the end of the set
scale.

Bad Quality
The “Bad” quality generally indicates the OPC server is not able
to communicate with underlying hardware. It may be a sign of a
network problem or complete device failure.

Table 6-3. Sub-Status for “Bad” Quality


Sub-status                Description
Non-specific              No specific reason is given.
Configuration Error       Server configuration error. For example, the tag
                          may have been deleted.
Not Connected             The tag is not connected. The value is not being
                          provided by the data source.
Device Failure            The underlying device has failed, most likely the
                          output, but is still able to communicate. This is not
                          really related to the health of the communication or
                          OPC server and thus this sub-status is rarely used.
Sensor Failure            The underlying sensor has failed. This is not really
                          related to the health of the communication or OPC
                          server and thus this sub-status is rarely used.
Last Known Value          Communications failure. The last polled value is used.
Communications Failure    Communications failure. There is no last polled
                          value available.
Out of Service            The OPC item (tag) or OPC group is inactive.
Waiting for Initial Data  Seen only on newly added tags before the first
                          value has been read from the underlying network.

Uncertain Quality
The “Uncertain” quality, in most implementations, only indicates
a communications failure.

Table 6-4. Sub-Status for “Uncertain” Quality


Sub-status                  Description
Non-specific                No specific reason is given.
Last Usable Value           Communications failure. Publications are not
                            being received and the value is therefore
                            “stale.” The last published value received by
                            the subscriber is used.
Sensor Not Accurate         The underlying sensor is inaccurate. This is
                            not really related to the health of the
                            communication or OPC server and thus this
                            sub-status is rarely used.
Engineering Units Exceeded  The measurement is outside the range of
                            the underlying sensor. This is not really
                            related to the health of the communication
                            or OPC server and thus this sub-status is
                            rarely used.
Sub-Normal                  The value is derived from multiple
                            underlying sources, of which not all are
                            “Good.” This is not really related to the
                            health of the communication or OPC server
                            and thus this sub-status is rarely used.

Good Quality
The “Good” quality indicates that OPC communication is fine.

Table 6-5. Sub-Status for “Good” Quality


Sub-status      Description
Non-specific    No specific reason is given.
Local Override  The value has been overridden. Typically this means
                the OPC server is in a simulation mode and the value is
                being “forced.”
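
The quality, sub-status, and limit condition listed in Tables 6-3 to 6-5 travel together in a single quality word. The Python sketch below decodes it, assuming the bit layout used by OPC Data Access: in the low byte, bits 7-6 hold the quality, bits 5-2 the sub-status, and bits 1-0 the limit. The layout and the sub-status numbers are not stated in this chapter and should be checked against the OPC specification; the function names are illustrative.

```python
QUALITY_NAMES = {0b00: "Bad", 0b01: "Uncertain", 0b11: "Good"}
LIMIT_NAMES = {0: "Not limited", 1: "Low limited", 2: "High limited", 3: "Constant"}
SUB_STATUS_NAMES = {
    "Bad": {0: "Non-specific", 1: "Configuration Error", 2: "Not Connected",
            3: "Device Failure", 4: "Sensor Failure", 5: "Last Known Value",
            6: "Communications Failure", 7: "Out of Service",
            8: "Waiting for Initial Data"},
    "Uncertain": {0: "Non-specific", 1: "Last Usable Value",
                  4: "Sensor Not Accurate", 5: "Engineering Units Exceeded",
                  6: "Sub-Normal"},
    "Good": {0: "Non-specific", 6: "Local Override"},
}


def decode_quality(word):
    """Split a quality word into (quality, sub-status, limit) names."""
    low = word & 0xFF                       # only the low byte is standardized
    quality = QUALITY_NAMES.get(low >> 6, "Reserved")
    sub = SUB_STATUS_NAMES.get(quality, {}).get((low >> 2) & 0x0F, "Unknown")
    return quality, sub, LIMIT_NAMES[low & 0x03]


def display_text(value, word):
    """Show *** instead of the value when the quality is bad."""
    return "***" if decode_quality(word)[0] == "Bad" else str(value)
```

For example, the word 0x18 decodes to Bad / Communications Failure, so `display_text(12.5, 0x18)` yields `***` rather than a stale number.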

OPC Compatibility
All OPC specifications define some mandatory and some optional
features; in fact, these are individual interfaces. Therefore, not all
OPC servers and clients are the same. From time to time you will
find that an OPC client that could perform a particular function
with one OPC server is unable to do it with another, or that one
client does not have a particular OPC feature supported in
another. Note that an OPC client should never require an optional
OPC feature in order to work. Any OPC client should be able to
work using only the mandatory OPC features.

The OPC Foundation provides a test kit that OPC server vendors
use to test and certify their OPC products. This tester ensures all
mandatory features are supported and the specification has been
implemented correctly, thus ensuring compatibility between
different OPC products.

It may be a good idea to use only certified OPC products to ensure they
support the required features (interfaces) and that these work correctly.

An OPC troubleshooting tool locates OPC servers from any
vendor. OPC troubleshooting tools can be used to check which
features (interfaces) an OPC server supports (see Figure 6-5).

Figure 6-5. OPC Troubleshooting Tool Reveals Supported Features in OPC
Servers (Screenshot: SMAR SYSTEM302)

OPC troubleshooting tools include simple OPC clients that permit
the user to explore servers, create groups, and add items (tags) just
like a “real” OPC client does. It is possible to adjust settings like
sampling interval and deadband, then monitor and write the
values. Thus it is possible to test the functionality of a server
step-by-step in a controlled manner with full visibility. The log records
all call activities with the server as success or failure. This makes it
easier to pinpoint compatibility problems. The log can be
exported as a text file to be sent to the technical support team for
a diagnosis. Thus, a non-expert can perform the testing while
leaving the diagnosis to the experts. If an OPC troubleshooting
tool is not available, some simple server testing can be done using
a simple OPC client. Some OPC troubleshooting tools are able to
“eavesdrop” and log call activities between any client and server.

OPC Server State


The troubleshooting tool reveals the complete OPC server state: start
time, current time, last update time, number of groups, deadband,
update rate, and current state of the server (see Table 6-6). The start
time is ideal for revealing whether the OPC server has failed and
then been automatically restarted by the clients (see Figure 6-6).

Figure 6-6. OPC Server State and Other Diagnostics (Screenshot: SMAR
OPC DataSpy)

The server state can reveal what is wrong with the OPC server.

Table 6-6. OPC Server State


State                Description
Running              Running normally. This is the usual state for a server.
Failed               The server had a fatal error and is no longer functioning.
No Configuration     Configuration information has not been loaded and
                     thus the server cannot function. Make sure to load
                     the configuration from the generic server
                     configuration tool, or export configuration from the
                     associated system configuration tool.
Suspended            The server has been stopped and is not providing
                     data. Start the server from the associated server
                     management tool.
Test                 The server is in Test Mode. The values do not come
                     from the underlying hardware, but are instead
                     simulated. Turn off the simulation mode from the
                     server manager.
Communication Fault  The server is unable to communicate with the
                     underlying hardware. The extent of the problem is
                     not indicated. It may be a single parameter, a device,
                     or the entire network depending on the implementation.
                     Since parameter and device communication errors
                     are best indicated by individual parameter status, it is
                     most likely that this status indicates a complete ...

It is also possible to drill down into individual groups to see the
group name, scan rate, and deadband.
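
A client or script that reads the server status can apply the same checks automatically. The Python sketch below models the states of Table 6-6 and the start-time check for silent restarts. The numeric values follow the server state enumeration of OPC Data Access as commonly documented and are assumptions to verify against the specification; the function names are illustrative.

```python
from datetime import datetime
from enum import IntEnum


class ServerState(IntEnum):
    """States of Table 6-6; numeric values assumed from OPC Data Access."""
    RUNNING = 1
    FAILED = 2
    NO_CONFIGURATION = 3
    SUSPENDED = 4
    TEST = 5
    COMMUNICATION_FAULT = 6


def needs_attention(state):
    """Every state other than Running calls for some action."""
    return state != ServerState.RUNNING


def restarted_since(previous_start, current_start):
    """A later start time than last observed means the server restarted."""
    return current_start > previous_start
```

Recording the start time at each check and comparing it with the previous one reveals a server that failed and was restarted by its clients, even if it reports Running at both moments.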

Internet Explorer 6.0


Installing Internet Explorer version 6.0 can cause a problem with
the OPCEnum component that results in various errors and
associated messages. Many OPC servers install corrected files to
fix this problem. The files are also available for download
from the OPC Foundation site.

OPC Server Shuts Down


The standard behavior is that the OPC server shuts down when
no clients are connected. This behavior may be undesirable in
some applications since restarting the server and re-establishing
the underlying communication can take a very long time. Many
servers come with management applications through which you
can ensure that the server never shuts down. Another alternative
is to run a simple OPC client permanently on, continuously
polling a value from the server. This way the server will always
have to remain on.

NetDDE
The most common problem when doing remote DDE using the
NetDDE functionality is that the required network services have not
been started. Make sure the Windows Network DDE and
Network DDE DSDM services are running. Configure DSDM to
start manually and Network DDE to start automatically. This is done
from the Windows Control Panel/Administrative Tools/Services.

NetDDE shares must be “trusted” for a remote client to connect;
this is set using DDESHARE. You need administrator rights. Also, on the
server, enable network access in the local security policy.

DLL Hell
These days we have learned to live with software as a “living
organism.” Applications require what all too often feels like
endless patches, service packs, and upgrades. When applications
share DLLs, a phenomenon known as “DLL Hell” may occur.
When a new application is installed, or an existing application or
the operating system is upgraded, one or more shared DLL files
are replaced with a new version. Because an existing DLL is
replaced, some of the existing applications stop functioning or
cause a computer crash. Reinstalling the broken application
can often solve the problem. In the early days of OPC this
was a problem with the proxy/stub DLL. Newer OPC
applications don’t cause this problem because all modern
applications now install a file from the OPC Foundation, but some
older applications may fail or cause a computer crash. Consider
running special applications on dedicated machines to make sure
the upgrade of one application does not affect the others.

Memory Leak
A memory leak is a bug characterized by a continuing loss of
available memory caused when an application does not free up
unused memory. Eventually, all memory is used up, and the
application fails. The memory leak problem is especially severe
for automation applications because the programs often run for
years, and components can be started and stopped thousands of
times without restarting the operating system. This is a
hard-to-detect bug, even with special software tools. Users generally
cannot do anything other than report it to the supplier.
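
A report to the supplier is more useful when it includes evidence. One simple form of evidence is the memory use of the application sampled at regular intervals, with the average growth per sample estimated by a least-squares line. The Python sketch below shows the arithmetic only; how the samples are collected is left open, and the function names and thresholds are assumptions for illustration.

```python
def growth_per_sample(samples):
    """Least-squares slope of memory use over equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples, min_samples=5, min_growth=0.5):
    """Flag steady growth; thresholds are arbitrary and need tuning."""
    return len(samples) >= min_samples and growth_per_sample(samples) > min_growth
```

A steadily positive slope over days of operation is a strong hint, but not proof, of a leak; caches and buffers that grow and then level off can look similar over a short observation period.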

License Restrictions
License restrictions can cause a number of different problems.
License issues include such concerns as number of tags and
number of users. OPC servers may, for example, stop updating
after an evaluation period, such as a few hours to a month, has
expired. Similarly, additional tags or clients may not be updated
when the licensed number has been reached.

ActiveX Registry Issues


In order for components such as ActiveX to function properly, they
must first be registered in Windows. Typically the installation
wizard handles this. If not, refer to product information on how to
do it.

Exercises
1. How long does it take for an OPC client to detect that an
OPC server has failed?

2. While browsing a remote server, if you can see the list of
OPC servers installed on the server node but can’t browse
the tags in the server, what is the possible cause?

3. What does the error OPC_E_UNKNOWNITEMID likely
mean?

References and Bibliography

1. Berge, Jonas. Fieldbuses for Process Control: Engineering,
Operation and Maintenance. ISA – The Instrumentation, Systems,
and Automation Society, 2002.
