D - ABB-System Health PDF

System 800xA Health Check
Doc. Id. PA-SE-XA-006561 Rev. Gd4 Mar 2019

Copyright 2018 ABB. All rights reserved.
Contents
1 Introduction
1.1 ABB Service Products Data Collector (SPDC)
1.2 Scripted tests (legacy)
1.3 Executing the scripts (legacy)
2 Software version check
2.1 Microsoft software check
2.2 ABB software check
2.3 System Extensions check
2.4 Configure System task check (only applicable in SV 6.0)
3 Computer hardware check
4 Network hardware check
4.1 RNRP check
4.1.1 RNRP Network Status Tool (SV 6.0)
4.1.2 RNRP Monitor
4.1.3 RNRP Fault Tracer
4.1.4 RNRP Log
4.1.5 The “hosts” file
4.1.6 RNRP Response Time check
4.2 System 800xA System Network Settings check
4.3 Network Adapter Bind Order vs Network Metric
4.4 Network Management check
4.4.1 Port speed and duplex
4.4.2 Port statistics
4.4.3 Uptime
4.4.4 IGMP
4.4.5 Spanning Tree (STP/RSTP)
4.4.6 Ring redundancy (or Layer 2 redundancy) active by default
4.5 Simple network bandwidth and roundtrip time check
5 Domain Controller nodes
5.1 Active Directory and domain controller redundancy health check
5.2 Flexible Single Master Operation (FSMO) roles check
5.3 Global Catalog check
5.4 Active Directory replication test
5.5 DNS check
5.5.1 Windows Event Log → DNS
5.5.2 Zone type and deployment
5.5.3 Forward zone
5.5.4 Reverse zone, primary plant network
5.5.5 Reverse zone, secondary plant network
5.5.6 Standalone domain controllers
5.5.7 Forwarders
5.5.8 DNS self-test
5.6 Windows Event Log check
5.7 Hard disk check
5.8 Login environment
5.9 Network Adapter Bind Order, DNS settings and name lookup
6 Computers with System 800xA
6.1 Aspect Servers
6.1.1 Windows Event Log check
6.1.2 Hard disk check
6.1.3 User and System locales (Regional Settings)
6.1.4 Network Adapter Bind Order, DNS settings and name lookup
6.2 Connectivity Servers
6.2.2 Hard disk check
6.2.5 Connected controllers
6.3 Application Servers
6.3.2 Hard disk check
6.3.5 Application Server specific tasks
6.4 Clients
6.4.2 Hard disk check
7 Application check
7.1 Process Portal A – System Event List
7.2 Process Portal A – Operator Message List
7.3 Service Structure
7.4 Data Source Definition aspects
7.5 Graphic Performance
7.5.1 Diagnostics Window
7.5.2 New Graphics (PG2) graphics checks
7.5.3 Visual Basic (VBPG) graphics checks
7.5.4 Workplace memory usage
7.6 User roles and security permissions
7.7 Objects in Lost and Found
7.8 Consistency Check
7.8.1 Consistency Check – User defined object type libraries
7.8.2 Consistency Check – Library Aspect check
7.8.3 Consistency Check – Control Structure
7.8.4 Consistency Check – other structures
7.8.5 Consistency Check – internal data structures
7.9 System NLS check
7.9.1 Test of Control Builder Name aspects
7.9.2 Test of Plant Explorer Name aspects
8 Other tests
8.1 Affinity
8.2 Aspect Directory service health
8.2.1 Aspect Directory synchronization
8.2.2 Master vs slave
8.2.3 Frequency of transactions
8.3 Alarm Manager service health
8.4 OPC DA Connector service health
8.4.1 OPC DA analysis
8.4.2 Recovery Items
8.5 Basic History service health
8.6 Event Collector service health
8.6.1 Discarded alarm/events
8.6.2 Source Object Handling
8.7 General process health
8.8 Windows Firewall
8.9 File fragmenting
8.10 Time Synchronization
8.11 PNSM Basic Computer Monitoring
8.12 Anti-virus software
9 Backup strategy
9.1 Drive Images
9.2 Microsoft Windows Domain
9.3 System 800xA
9.3.1 Aspect Directory
9.3.2 External Services
9.3.3 Manual Exports
9.4 Application Servers
10 Installation, environment, etc.
11 AC 800 Connect
11.1 AC800 OPC Server
11.1.1 Setup Wizard
11.1.2 Settings in OPC Server panel
11.1.3 Tools in OPC panel
11.1.4 Log files
11.2 Control Builder M
11.2.1 Project settings (Right click on Project icon → Settings)
11.2.2 Tools
11.2.3 Controller hardware object (Hardware AC 800M)
11.2.4 Hardware configuration editor (PM Type)
11.2.5 Setup Wizard - Heap setting (In systems prior to version 5.1)
11.2.6 Status for the controller, CEX-modules and IO-modules
11.2.7 Log files
11.3 AC 800M Controller
11.3.1 Remote System dialog
11.3.2 Firmware Information
11.3.3 Controller log files
11.3.4 MMS Connections
11.3.5 Controller Analysis
11.3.6 Diagnostic for Communication Variables (IAC)
11.3.7 Tasks
11.3.8 SystemDiagnostics Function Block
11.3.9 CPU Load
11.3.10 Memory Consumption
11.3.11 Modulebus scan time
11.4 CEX modules
11.4.1 CI854 Profibus
11.4.2 CI867 TCP IP
11.4.3 CI868 IEC 61850
12 800xA for Advant Master (AC 400 Connect)
12.1 System messages at RTA boards
12.2 System messages in Advant/Master controllers
12.3 System and channel load at RTA boards
12.4 System and channel load in Advant/Master controllers
12.5 RTA Board communication statistics
12.6 MB300 OPC Server (MasterAdapter) health
12.7 Clock synchronization
13 PLC Connect
13.1 Collect statistics with AppLog
13.1.1 Communication Server – GetUpdateStatistics
13.1.2 Communication Server – ItemInfo
13.1.3 Communication Server – DriverInfo
13.1.4 Communication Server – RunningMode
13.1.5 Select Event Server – Alarmlist
13.2 Check logfiles
13.3 Measure time needed for “Full Deploy”
13.4 CPU load and memory used by PLC Connect processes
14 Information Manager, IM
14.1 System Messages from IM
14.2 Oracle database instance health check
14.3 History configuration
14.3.1 System 800xA → IM synchronization test
14.3.2 IM log database consistency test #1
14.3.3 IM log database consistency test #2
14.3.4 Entry Tables report
14.3.5 Collection performance check and tuning
14.3.6 History Backup
15 VMware ESX - Virtual Environment
15.1 Software version
15.2 VMware Tools
15.3 CPU count
15.4 RAM size
15.5 Network
15.6 Virtual Network Adapter Types
15.7 Time Synchronization
15.8 Automatic shutdown / startup of guests
15.9 Snapshots
Batch Management
16 Asset Optimization
17 800xA for Harmony
18 800xA for Melody
19 800xA for MOD 300
20 800xA for IEC61850
21 800xA for DCI
22 800xA for Freelance
23 Script reference
23.1 Run all scripts
23.2 AC800M network throughput test
23.3 Aspect Directory synchronization and structure consistency test
23.3.1 Aspect Directory checksum calculation using “afwsysinfo.exe –csd”
23.3.2 Structure Consistency check using “afwsct.exe”
23.4 File system integrity test (CHKDSK.EXE)
23.5 File system test
23.6 Device driver version test
23.7 Melody Connect log file test
23.8 Computer memory test
23.9 Microsoft network setting integrity test (NETDIAG.EXE)
23.10 Network settings test
23.11 Network bandwidth test
23.12 System locale setting test
23.13 User locale setting test
23.14 DNS check using NSLOOKUP.EXE
23.15 Time Synchronization configuration test
23.16 Running processes test
23.17 Registry size test
23.18 Running services test
23.19 System Identifier (SID) test
23.20 Automatically started programs test
23.21 Mandatory System 800xA Third Party Software test
23.22 Time synchronization test
23.23 Computer uptime test
23.24 Windows Event Log test
23.25 Conversion tool .CSV → .XLS
23.26 AC800 MMS statistics test
1 Introduction
The System 800xA Health Check is a procedure developed to detect, and to
some extent also correct, problems in System 800xA installations.
This document can be used in several ways, e.g.
• As a TODO-list when troubleshooting a system.
• As a procedure to document the health of a System 800xA installation.
Depending on the reason for the Health Check, the findings may be recorded in the
System 800xA Health Check Test record.
It is recommended to decide on the “mission” before starting the check:
are problems only to be reported, or are they also to be resolved?
Although this document does not require any special training apart from general skills in
Microsoft Windows and basic knowledge of System 800xA, it is recommended to attend
the ConsultIT Expert Workshop – E144 System 800xA Health Check to be able to make
the best use of the Health Check document.
1.1 ABB Service Products Data Collector (SPDC)

The data collection can be automated using the SPDC tool. Viewing of data is done via
the Service Application (a valid license may be required).
1.2 Scripted tests (legacy)

Parts of the System Health Check can be executed by scripts.
Note: The scripts are no longer maintained since the release of 800xA version 6.0.
If the installation and the system owner accept scripting, it is recommended to start the
health check by running the scripts. Most do, but there are exceptions, e.g. where
security measures block the scripts by disabling certain ports or file sharing.
The health check scripts are versioned. For 64-bit based systems version 1.15 is required.
Script name              Description                                          Comment
AvailabilityCheck        Node reachable test
Show_AC800MSpeed         RNRP Utility throughput test with AC800M nodes
Show_AdConsistency       Statistics from AfwSysInfo.exe + AfwSCT.exe check
Show_AdReplication       Additional Aspect Directory statistics
Show_Checkdisk           Automated CHKDSK.EXE (check mode only)
Show_CpuLoad             List CPU load and processor core count in all nodes
Show_DcDiag              Performs built-in diagnostics on domain controllers
Show_Disks               Disk size and usage. File fragmentation check.
Show_Drivers             List of all drivers and versions                     Obsoleted in Win 2008 and Win7
Show_MelodySysErrLog     Collection of Melody Connect system error log files
Show_Memory              Microsoft Windows Memory statistics
Show_NetDiag             Automated NETDIAG.EXE test                           Obsoleted in Win 2008 and Win7
Show_NetSettings         Network settings check
Show_NetSpeed            Network bandwidth check
Show_NlsSystemInfo       System NLS check
Show_NlsUserInfo         User account NLS check                               Unreliable results until v.1.16
Show_NsLookup            Automated NSLOOKUP.exe test
Show_Ntp                 Time synchronization setup check
Show_Processes           List of running processes and their resource usage
Show_RegistrySizes       Check of Windows registry size
Show_Services            List of Windows services and their states
Show_Sid                 SID check
Show_Startup             Autostarted programs check
Show_ThirdPartySW        Check for mandatory hotfixes
Show_TimeRead            Time comparison
Show_UpTime              System uptime
Show_VgaDrivers          Lists display driver and graphic board memory size.
Show_WinEvents           Collection of Windows Event Logs
Utility_Convert-Csv-Xls  Utility to convert .CSV → .XLS afterwards, in case MS Excel
                         was not available on the node where the scripts were
                         run (requires MS Excel)
Utility_OPC_MMS          Utility to convert AC800 OPC Server statistics into
                         Excel format.
Utility_RunAllScripts    Utility to run all other scripts one after another
1.3 Executing the scripts (legacy)

Refer to Section 23 Script Reference for details about scripts and prerequisites.
• Unzip a distribution of the E144 scripts to a temporary folder on a suitable node, e.g.
an engineering station.
• Decide if some script(s) should be excluded from being run
(edit the Utility_RunAllScripts file)
• Start the Utility_RunAllScripts script
It is recommended not to interfere or use other instances of Excel while the scripts
are running.
• Monitor the output
Script execution can be disturbed by a number of factors:
o Insufficient bandwidth between script and target computers
o Target computer “hung” or unreachable
o Lack of administrative privileges on script or target computers
o Anti-virus software blocking script execution or network traffic
• When finished, continue with the following health check chapters (and use script
output as input for decisions and remarks)
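
Two of the disturbance factors listed above (unreachable target computers, blocked ports or file sharing) can be screened for before the scripts are started. The following is a minimal sketch in Python, not part of the E144 script distribution. It assumes that a successful TCP connection to port 445 (Windows file sharing) is an acceptable proxy for "node reachable"; the port choice, the function names and the node names in the usage comment are illustrative assumptions.

```python
import socket


def is_reachable(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: port 445 (SMB file sharing) is used as the probe port.
    Name resolution failures, refusals and timeouts all count as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_nodes(nodes, port=445, timeout=2.0):
    """Return the subset of nodes that could not be reached, in input order."""
    return [node for node in nodes if not is_reachable(node, port, timeout)]


# Usage (hypothetical node names):
#   problems = unreachable_nodes(["AS1", "CS1", "CLIENT01"])
#   if problems:
#       print("Check these nodes before running the scripts:", problems)
```

A node reported here is only a candidate for closer inspection; a closed port may be an intentional security measure rather than a fault.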

2 Software version check

Tools: Use either one of the software version checks included in the
Diagnostic Collection Tool (DCT) or use the System Checker Tool (also
available as a standalone tool) to retrieve a complete list of installed software
from all nodes within the domain or workgroup hosting the 800xA System.
DCT → Analyze Data → Analyze Software → Collect! (all nodes) → Next
[Figure: DCT software check]
[Figure: System Checker Tool software check]
The standalone version of System Checker Tool is compatible with all
versions of System 800xA and is located on the SV5 DVD – it can also be
downloaded from ABB Library:
Industrial IT, 800xA System Checker Tool, 3BSE041308.

When using the System Checker Tool, fast and accurate comparisons can
be made from Microsoft Excel (using macros) to spot incorrect
configurations (e.g. missing or conflicting versions of software).
[Figure: Version comparison made with the System Checker Tool]
[Figure: Version comparison made with DCT]
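
The kind of comparison described above (spotting missing or conflicting software versions across nodes) can be illustrated with a short Python sketch. It is not the Excel macro shipped with the System Checker Tool. It assumes the collected data has already been reduced to a mapping of node name to installed products and versions, and that the compared nodes are expected to carry the same software; all node names, product names and version strings in the usage comment are made up for illustration.

```python
from collections import defaultdict


def find_version_mismatches(inventory):
    """Find products whose version differs between nodes or is missing on some.

    inventory: {node_name: {product_name: version_string}}
    Returns {product_name: {version_or_None: [node_names]}} containing only
    products that are not identical on every node. A key of None means the
    product was not found on the listed nodes.
    """
    products = set()
    for installed in inventory.values():
        products.update(installed)

    mismatches = {}
    for product in sorted(products):
        nodes_by_version = defaultdict(list)
        for node in sorted(inventory):
            nodes_by_version[inventory[node].get(product)].append(node)
        if len(nodes_by_version) > 1:
            mismatches[product] = dict(nodes_by_version)
    return mismatches


# Usage (hypothetical data):
#   inventory = {
#       "AS1": {"Base System": "6.0.3", "AC 800M Connect": "6.0.0"},
#       "AS2": {"Base System": "6.0.3", "AC 800M Connect": "6.0.1"},
#   }
#   find_version_mismatches(inventory)
#   reports "AC 800M Connect" with one node list per version found
```

Nodes with different roles legitimately carry different software, so in practice the comparison would be run per node type (e.g. all Aspect Servers, all clients).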

Various other tools (SIW, DCT, etc.) can also be used to read software versions;
as a last resort, use Control Panel → Add/Remove Programs.
2.1 Microsoft software check

Expected: Correct version of operating system and service packs.
Expected: Mandatory 3rd party software and Microsoft hotfixes according to System
800xA Third Party Software, 3BUA000500 are installed.
The remaining optional Microsoft hotfixes, although tested and approved by ABB, are not
formally included in the System Health Check. A list of optional and certified hotfixes can
be found in the following document: System 800xA - Third Party Security Updates
Validation Status, 3BSE041902.

2.2 ABB software check

Expected: Relevant service packs and rollups available are properly
installed in all nodes where they should be installed.
Tools: Download the appropriate System 800xA System Software Versions
document from ABB Library.
Do not download only the most recent version: health checking an
older system requires a version of this document from the same
period. ABB Library shows only the latest version by default, but
this can be overridden in the Advanced Search settings.
Expected: The AfwConfigWizard.log in the node(s) used to create the system and
load system extensions does not show any problems creating system and
loading extensions.
2.3 System Extensions check

Open the System Extensions aspect located on the domain object in the Admin Structure.
Enlarge the window and/or rearrange the columns so that the important
columns Installed on this node and Successfully loaded are clearly visible.
Expected: All loaded system extensions should be installed and marked as loaded.
If an “x” is missing in the left column, software is missing or incorrectly installed in the local
node. If an “x” is missing in the right column, System Extension Maintenance has not been
carried out, e.g. following an installation of a rollup or service pack.
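
The interpretation rule above can be written down as a small decision function. This is a sketch only; it assumes that the left column is Installed on this node and the right column is Successfully loaded, and that the column contents have been transcribed by hand into booleans. The function name and the extension name in the usage comment are illustrative.

```python
def diagnose_extension(name, installed_on_node, successfully_loaded):
    """Translate the two "x" columns of one System Extension row into findings.

    installed_on_node:   True if the left column shows an "x" (assumed mapping)
    successfully_loaded: True if the right column shows an "x" (assumed mapping)
    Returns a list of finding strings; an empty list means the row is as expected.
    """
    findings = []
    if not installed_on_node:
        findings.append(
            f"{name}: software missing or incorrectly installed in the local node"
        )
    if not successfully_loaded:
        findings.append(
            f"{name}: System Extension Maintenance not carried out "
            "(e.g. after installing a rollup or service pack)"
        )
    return findings


# Usage (hypothetical extension name):
#   diagnose_extension("Example Extension", True, False)
#   returns one finding about System Extension Maintenance
```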

2.4 Configure System task check (only applicable in SV 6.0)
Open the System Configuration Console, then open System Setup → Configure System
Expected: All functions are reported with Status = Deployed
Expected: All nodes are listed with Installation Status = Deployed
Expected: No information button icon (i) indicating a lower level installation problem.
Hover over the node to display a tooltip with more details.
Click on the Deployed text (link) of the node concerned, then on the View
details… button to see additional details about the problem.
The Configure System’s Diagnostics button or the View logs… command
on the System Installer Agent menu in the System Tray may retrieve logs
with additional information about what has gone wrong.
Alternatively logs can be manually retrieved from this folder and subfolders:
C:\ProgramData\ABB\800xA\SystemInstaller\...

3 Computer hardware check

Expected: All computers fulfill the PC requirements as stated in the
Product and System Guides (RAM, disk, CPU, Hyper-threading, etc.)
Expected: The hardware is approved for the selected operating system.
E.g. Windows 20xx Server on workstation hardware is not always
supported by the hardware vendor.
Expected: Operator clients running PG2 graphics should be equipped with a GPU
(graphics accelerator) to offload the main CPU. A multi-monitor client
lacking a GPU will typically use much more CPU, or even run at 100%
CPU load during navigation and callups of new graphics.
Expected: For systems making use of VMware, it is now recommended to perform the
checks listed in Chapter 15 - VMware ESX - Virtual Environment.
Tools: System Checker Tool
4 Network hardware check

Expected: The network infrastructure should comply with the installation rules set
out in the System 800xA Network Configuration, 3BSE034463Rxxxx.
In most cases, auto-negotiate is the preferred setting for all ports.
Exceptions could be made when auto-negotiation fails (often resulting in
half-duplex communication) or when the hardware run with a hardcoded
setting – e.g. the 800xA for Advant Master PU410 “RTA Board”
communication only run 100 mbit/second full-duplex.
A bandwidth measurement is recommended, problems and suboptimal
settings are likely to show up as a reduced throughput.
In doubt, perform bandwidth measurement, e.g. file transfer tests with all
possible combinations of speed and duplex – select the combination giving
the best results (considering both throughput and errors)
Document port statistics, reset the counters and revisit the port statistics at
a later time. Take action if counters are increasing too much.
Some collisions and CRC faults are normal on half-duplex
links, but the error rate should not exceed 5% of the total number of
packets transmitted or received.
In hyperthreaded / multicore systems equipped with network interface
cards supporting Receive Side Scaling (RSS), the Redundant Network
Routing Protocol (RNRP) may detect network loops incorrectly.
If RNRP has detected network loops, first ensure that RSS is disabled
(where enabled), empty the RNRP error buffers and test again. Network
loops should be investigated.
Contact your regional ABB Support Center if assistance is required.

4.1 RNRP check
4.1.1 RNRP Network Status Tool (SV 6.0)

Launch the RNRP Network Status Tool from the Start Menu or by clicking the [R] icon in
the System Tray.
Expected: All nodes visible (“Up”) on all configured paths

No errors or warnings.
Diagnostic counters should not list too many “messages lost”,
as this indicates network traffic losses which drive “path switchover” (but
only if a redundant path exists).
It may be advisable to clear the diagnostic counters in the early stages of a health check
(to initiate a measurement period). After some time, revisit the tool and check the counters
again, after which an assessment is made on how to proceed with fault finding & repairs.

4.1.2 RNRP Monitor

Launch the RNRP Monitor from the Start Menu or by clicking the [R] icon in the System
Tray. In version 6 the new tool is launched instead – the legacy monitor can be started
from the new tool by clicking the RnrpMonitor button on the top row.
The RNRP Monitor icon in the System Tray
The RNRP Monitor

Expected: All nodes visible (“up”) on all configured paths.
No errors
A network storm may have caused ports in AC 800M with redundant network connection
to go to permanent “down/blocked” state. This is resolved in controller firmware 5.1.1-3.
Prior to that version, blocked ports can be forced open by downloading an RNRP
parameter change, e.g. Max number of remote areas. Important: this workaround must
be applied only from engineering stations connected locally to the control network –
download from a routed (e.g. client server) network may fail and require a controller reset.
4.1.3 RNRP Fault Tracer

Launch the RNRP Fault Tracer from the Start menu or by double right clicking the [R] icon
in the System Tray. Execute “1 – Search own networks for nodes with configuration or
network errors”.
The RNRP Fault Tracer tool

In collapsed network configurations (using a single RNRP area) the test can be performed
in any node. If more than one RNRP area exist, perform the test at one of the RNRP
router nodes for each area (e.g. the Connectivity Servers when using AC 800M).
Expected: No errors reported by the Fault Tracer.
If errors are found, attempt to repair them, reset the error
log buffer with “4 – Change log conditions in one node (for
experts) → Clear log buffers” and perform the test again
some time later. The log file (see next step) can indicate when the
errors emerged.
To easily reset error logs in a complete subnet, use the command “4 – Change log
conditions in one node (for experts) → Clear log buffers” with the network’s RNRP
multicast address, e.g. 239.239.239.x
The last octet (x) is calculated as follows: x = (RNRP Area × 4) + path (0 or 1). E.g.:
239.239.239.4 for RNRP Area 1, primary network path
239.239.239.5 for RNRP Area 1, secondary network path
239.239.239.80 for RNRP Area 20, primary network path
239.239.239.81 for RNRP Area 20, secondary network path
The “reset using multicast address trick” will only work on locally connected RNRP areas.
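The address rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `rnrp_multicast_address` is hypothetical and not part of any ABB tool.

```python
def rnrp_multicast_address(area: int, path: int) -> str:
    """Return the RNRP multicast address for an area and a path.

    path 0 = primary network path, path 1 = secondary network path.
    """
    if path not in (0, 1):
        raise ValueError("path must be 0 (primary) or 1 (secondary)")
    last_octet = area * 4 + path
    if not 0 <= last_octet <= 255:
        raise ValueError("area is out of range for a single octet")
    return f"239.239.239.{last_octet}"


# The examples listed in the text above:
print(rnrp_multicast_address(1, 0))   # 239.239.239.4
print(rnrp_multicast_address(20, 1))  # 239.239.239.81
```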
4.1.4 RNRP Log

The RNRP log file may contain vital information about old errors and warnings. Unlike
the regular desktop RNRP Monitor tool, the RNRP log file output contains time stamps
that can prove valuable during troubleshooting.
By comparing the log files between different computers in a system, it may be possible to
draw additional conclusions (which are difficult when only viewing one node’s log file
and/or behavior).
RNRP Log file shortcuts in RNRP Wizard and Network Status Tool
The RNRP log file paths:
Ver.        Path
3.0         No log file available.
4.0 to 5.0  C:\Program Files\Common Files\ABB Industrial IT\rnrp\log\RnrpEvent.log
5.1         C:\ProgramData\RnrpEvent.log
            C:\ProgramData\RnrpEventOld.log
Note: Prior to version 6.0, the RNRP log must be initiated by installing the RNRP icon in the
System Tray. Multiple login sessions (e.g. a server or terminal server) may interrupt
logging. One steady logged-in user offers the most reliable logging, e.g. an operation
client/workstation.
The RNRP Create Icon tool

4.1.5 The “hosts” file

Path: C:\Windows\system32\drivers\etc\hosts
If enabled and correctly configured, RNRP will, as of SV 5.1, maintain the hosts file without
any need for periodic maintenance. If network settings change, or nodes are moved or
removed, it may become necessary to clean obsolete records from the hosts file.
The Refresh hosts file button of the RNRP Setup Wizard tool
Local customizations made to the hosts file may over time become obsolete and cause
problems, e.g. a custom entry conflicting with an automated entry by name or by IP
address.
Note: RNRP will only register addresses if the Register this connection in DNS setting in
NIC Advanced settings is enabled. This setting is local, per computer, per NIC.
The RNRP Monitor displays the setting: h, = DNS registration enabled on NIC
Sometimes it can be useful to compare the hosts file side-by-side with the DNS contents
to identify obsolete or incorrect records.
The DNS and hosts file is compared “side-by-side”
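As an aid when reviewing a hosts file for obsolete or conflicting records, the following minimal Python sketch lists every host name that appears with more than one IP address. The helper names and the sample entries are hypothetical. Note that on a redundant network a node legitimately has one address per path, so the result is a list of candidates for review, not a list of errors.

```python
from collections import defaultdict


def parse_hosts(text):
    """Map each host name (lower-cased) to the set of IP addresses listed for it."""
    names = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *hostnames = line.split()
        for name in hostnames:
            names[name.lower()].add(ip)
    return names


def names_with_multiple_addresses(text):
    """Return host names that are listed with more than one IP address."""
    return {n: sorted(ips) for n, ips in parse_hosts(text).items() if len(ips) > 1}


# Hypothetical entries, for illustration only
SAMPLE = """
172.16.4.11  as1
172.16.4.12  cs1
172.16.4.99  as1   # possibly stale custom entry
"""
print(names_with_multiple_addresses(SAMPLE))
```

To review a real node, the text of C:\Windows\system32\drivers\etc\hosts can be read and passed to the same function.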

4.1.6 RNRP Response Time check

Use the RNRP Fault Tracer to execute an RNRP Response Time check against each AC
800M CPU. The test verifies that the controller can respond without too much delay. E.g. an
AC 800M controller that is not load balanced (tasks executing consecutively due to the lack of
an adequately sized Task Offset) may show a slow response time in this test. Task Offset Tuning
is then recommended.
Tools: Use the RNRP Fault Tracer (double right click the RNRP monitor icon) and
select 5 – Test rnrp response time from one node from the menu.
Expected: No timeouts (marked as + characters during the test).
4.2 System 800xA System Network Settings check

Verify that the 800xA system has been configured for the correct RNRP area. The
Client/Server network area should be entered at: Configuration Wizard → System
Administration → RNRP (SV4) / System Network (SV5)
Configuration Wizard in SV 5 – System Network settings
Use RNRP filtering – one (1) area Network address(es) for the Client/Server network
Note: other networks, such as the Control Network, shall not be included in the count.

4.3 Network Adapter Bind Order vs Network Metric

Starting with Windows Server 2016 and Windows 10 (build 1709 and later), the Network
Adapter Bind Order setting has been removed. For these versions, skip to Interface Metric below.
Prior to these versions, the order must be carefully set when configuring multiple
network adapters, or else name resolution may not work as intended. The domain network
adapter(s) must be listed first.
The Advanced Settings… menu might not show until Organize → Layout → Menu Bar has
been enabled. Once called, either of the two dialogs below will be shown:
Server 2016 and (later) Windows 10      Previous versions

No order of network adapters            Each network adapter is listed in an order
→ No action required                    → Domain network must come first in list
In Windows 10 (build 1709 and later) and Server 2016 the Interface Metric number
decides the order in which a name resolution request is sent out over multiple network adapters.
RNRP automatically assigns Interface Metric 1 to the network adapter with the lowest path
and area. All others are set to Automatic Metric. The behavior can be adjusted on
the RNRP Wizard’s Base Parameter tab, e.g. if the domain is on a higher area number.
To view the current Interface Metric values, navigate to Network Connections (or use
Start → Run… ncpa.cpl) → Network Adapter → Properties → IPv4 → Advanced → IP
Settings
Manual metric (governed by RNRP) Automatic metric (default)

Interface Metric influences the routing table’s Metric value which can be viewed by
executing the route print command in a Command Prompt. The routing Metric is on
the far right. Lower Metric value = higher priority.
No actions are required as long as the lowest numbered RNRP area is the network area
where the domain controllers are connected.

4.4 Network Management check

If managed switches are used, connect to each of them (using the web interface,
Java-plugin, proprietary tool, etc.) and examine the switch.
Verify that recommended settings are properly set (IGMP, Spanning Tree, etc.) and that
the switch statistics do not contain any abnormal values, e.g. excessive CRC errors,
collisions, fragments, unexpectedly short uptime, etc.
Most managed switches have a log file that should be examined.
Note down the firmware version – some early firmware versions have known
problems and issues → check the switch’s home page on the Internet. E.g.
http://www.hirschmann.com/, http://www.moxa.com, http://www.cisco.com, etc.
The following sections list some popular switch brands and items to check:

4.4.1 Port speed and duplex

Check that active ports run at maximum speed and duplex. With few exceptions, all
ports should run at full duplex (some PLCs and media converters only support half duplex).
Uplinks, downlinks and ports used to connect computers should all have full duplex.
In general, auto-negotiation is preferred and recommended as a starting point for all ports.
If links known to support full duplex negotiate to half duplex or less than maximum speed,
it may become necessary to take control by turning off auto-negotiation and trying a
different setting to produce the expected link results.
In most cases, use identical settings at both ends of any given communication link.
Note: Always follow up any change with checking the port statistics/counters.

4.4.2 Port statistics

Check port statistics for all ports connecting vital system equipment (servers, clients,
controllers, etc).
Port Statistics for a Hirschmann RS-30 (sorted on Detected Collisions, descending)

Ports connecting on half duplex may show some degree of errors without it being a
concern (up to 5-10% is considered “normal” at Half Duplex).
% of CRC errors = Detected CRC errors / Received Packets
% of collisions = Detected Collisions / Transmitted Packets
E.g. port 1.6 has detected 3585939 collisions, which constitutes merely 1% when
comparing with the total number of transmitted packets for the same port.
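The two ratios above can be computed with a small Python helper. This is a minimal sketch; the function name and the counter values are hypothetical and are not taken from any real switch.

```python
def error_percentage(errors: int, packets: int) -> float:
    """Return errors as a percentage of packets (0.0 if no packets were counted)."""
    if packets <= 0:
        return 0.0
    return 100.0 * errors / packets


# Hypothetical counters read from a switch port
crc_pct = error_percentage(errors=120, packets=48_000)             # CRC errors / received packets
collision_pct = error_percentage(errors=2_500, packets=250_000)    # collisions / transmitted packets

print(f"CRC errors: {crc_pct:.2f}%  collisions: {collision_pct:.2f}%")
```

Comparing the same percentages at two points in time (after resetting the counters) shows whether the error rate is still growing.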
Port Configuration for a Hirschmann RS-30 (half duplex ports have manual configuration)

4.4.3 Uptime
If network problems are suspected, it is recommended to check the switch uptime. Is the
reported uptime period expected, or has the switch crashed/lost power and rebooted?
Switch Uptime in a Moxa EDS-408A
4.4.4 IGMP
The RNRP protocol does not work in networks with IGMP. Verify that IGMP is disabled.
IGMP setting in Cisco

4.4.5 Spanning Tree (STP/RSTP)

In most Ethernet networks for industrial applications STP/RSTP is not suitable due to the
relatively long time the network tree needs to stabilize following a path break/closure.
During the spanning tree stabilizing period broadcast and multicast packets may loop and
flood sensitive nodes, e.g. a controller. Some brands of controllers will shut down/safety
halt if a network interface becomes flooded.
Spanning Tree setting in Dell PowerConnect

In some cases, the effects of a flood may be decreased or even circumvented by using a
port rate limiting function, “Storm Control” or “Port Based Traffic Control” available in some
switches.
STP/RSTP ports are often slow to transition from blocking to forwarding following a link up
event. This switching time can be improved by configuring a port as an “edge port” (Cisco:
“port fast”).
AdminEdge (Edge Port) setting in the ABB NeCo / Westermo WeConfig management
interface
Note: ports used for infrastructure (up/down) links should not be set as edge ports.
4.4.6 Ring redundancy (or Layer 2 redundancy) active by default

Some switch brands come delivered with ring redundancy enabled as factory default. If
the actual configuration is not a ring the ring redundancy protocol must be disabled to
allow regular use of the ports that otherwise are reserved for the ring.

Ring Redundancy setting in Hirschmann

In some Hirschmann models, the GUI settings can be overridden by hardware DIP
switches (software settings are shown but do not have any effect!).
Refer to the switch User’s Guide on how to properly disable the Ring Redundancy.
With Hirschmann, the preferred method is to use the Delete ring configuration and Delete
coupling configuration buttons.
Remember to save (Basic Settings → Load/Save) to make these changes persistent.
4.5 Simple network bandwidth and roundtrip time check

Expected: By measuring the transfer and ping response times during a large (> 100
MB) file transfer between two nodes it is often possible to detect network
problems or insufficiencies.
If possible, perform this check for each node.
If the network is redundant and the available time allows, disconnect the
primary network cable and repeat the tests for the secondary network.
Tools: The Show_NetSpeed script can be used to perform system wide tests on
multiple nodes in a network and gather the results into Excel. For more
information see Chapter 10, Scripted system wide tests.
Because the test involves reading from and writing to hard disk, the network
bandwidth performance figure may be influenced by the write cache policy
setting on the hard disk.
Domain controllers typically have write caching disabled and will hence
produce a slower result.
Write cache enabled in Disk Management (diskmgmt.msc).
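For a quick manual measurement between two nodes, the transfer time of a large file can be converted into a throughput figure. The Python sketch below is an illustration only and is not the Show_NetSpeed script; the function names are hypothetical, and the destination path passed to `timed_copy` is assumed to be a network share on the remote node.

```python
import os
import shutil
import time


def throughput_mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a transferred byte count and a duration into Mbit/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst and return the observed throughput in Mbit/s."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    return throughput_mbit_per_s(os.path.getsize(dst), elapsed)


# A 100 MB file transferred in 10 seconds corresponds to 80 Mbit/s.
print(throughput_mbit_per_s(100_000_000, 10.0))
```

As with the scripted test, the result includes disk read and write time, so it should be read as a lower bound on the network bandwidth.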


5 Domain Controller nodes

The following tests are to be performed on all nodes acting as Domain Controllers.
On Server 2000 and Server 2003, the dcdiag.exe and netdom.exe tools are not
preinstalled. Windows Server Support Tools must be downloaded or installed from the
operating system CDROM (suptools.msi).
5.1 Active Directory and domain controller redundancy health check

Expected: No errors found by dcdiag.exe
Use the Windows Support Tools command dcdiag.exe to examine the local domain
controller. To run the default analysis covering all known domain controllers within a
domain (site), run dcdiag.exe with the /a parameter.
Additionally, append “ > dcdiag.log“ to save the output to file.
C:\> dcdiag /a > dcdiag.log
One of the tests may fail if the Windows Time service (W32Time) is stopped; either enable it temporarily
or disregard those (expected) errors.
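A saved dcdiag.log can be long, so a short script can help to list only the failed tests. The Python sketch below assumes the English-language result lines of the form "......... SERVER passed test NAME" and "......... SERVER failed test NAME"; the function name and the sample text are hypothetical.

```python
import re

# Assumed line format: dots, server name, "passed"/"failed", "test", test name
RESULT = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(log_text):
    """Return (server, test name) pairs for every test reported as failed."""
    return [
        (m.group(1), m.group(3))
        for m in RESULT.finditer(log_text)
        if m.group(2) == "failed"
    ]


# Hypothetical excerpt, for illustration only
SAMPLE = """
         ......................... DC1 passed test Connectivity
         ......................... DC1 failed test SystemLog
"""
print(failed_tests(SAMPLE))
```

Each failed test still needs to be read in context in the log, since some failures (such as those caused by a stopped time service) may be expected.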
5.2 Flexible Single Master Operation (FSMO) roles check

Use the Windows Support Tools command netdom.exe to list the servers holding the
five (5) different FSMO roles.
C:\> netdom query fsmo
Schema owner            dc1.industrial.local
Domain role owner       dc1.industrial.local
PDC role                dc1.industrial.local
RID pool manager        dc1.industrial.local
Infrastructure owner    dc1.industrial.local
The command completed successfully.
Expected: All roles shall be held by an existing & running server
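When the check is repeated on several systems, the output can be verified with a short script. The Python sketch below assumes the role labels shown in the sample output above (other Windows versions may word the labels differently); the function names are hypothetical. It only confirms that each role has a holder listed, not that the server is running.

```python
# Role labels as printed in the sample output of this document (assumption)
ROLE_LABELS = (
    "Schema owner",
    "Domain role owner",
    "PDC role",
    "RID pool manager",
    "Infrastructure owner",
)


def parse_fsmo(output):
    """Map each FSMO role label found in the output to the server holding it."""
    holders = {}
    for line in output.splitlines():
        line = line.strip()
        for label in ROLE_LABELS:
            if line.startswith(label):
                holders[label] = line[len(label):].strip()
    return holders


def missing_roles(output):
    """Return the role labels for which no holder is listed."""
    holders = parse_fsmo(output)
    return [label for label in ROLE_LABELS if not holders.get(label)]


SAMPLE = """Schema owner dc1.industrial.local
Domain role owner dc1.industrial.local
PDC role dc1.industrial.local
RID pool manager dc1.industrial.local
Infrastructure owner dc1.industrial.local
The command completed successfully."""
print(missing_roles(SAMPLE))
```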
5.3 Global Catalog check

Use the Control Panel → Administrative Tools → Active Directory Sites and Services tool
to verify that at least one domain controller is configured with Global Catalog. If redundant
domain controllers are configured, more than one Global Catalog server is recommended.
The Global Catalog setting in the Active Directory Sites and Services tool
Expected: At least one Global Catalog server exists (more than one is recommended)

5.4 Active Directory replication test

To ensure that the domain controllers can replicate, use the Control Panel →
Administrative Tools → Active Directory Sites and Services tool to force a manual
replication between the redundant domain controllers.
Expand the tree view until the NTDS Settings object(s) are shown. Right click the
<automatically generated> item and select Replicate Now.
Force Active Directory replication using the Active Directory Sites and Services tool
Expected: No errors reported by the Active Directory Sites and Services tool
5.5 DNS check

From the DNS management console, verify all DNS server(s) contents – normally one per
domain controller.
As of System 800xA version 5.1 the DNS function is bypassed by the hosts
file (%SystemRoot%\system32\drivers\etc\hosts). However, DNS is still
mandatory to enable a domain environment.
5.5.1 Windows Event Log → DNS

Verify that no unexplained errors or warnings are reported.
5.5.2 Zone type and deployment

Expected: All zones:
• are of the type “Active Directory-integrated”
• are visible in all DNS servers
• have dynamic update enabled with “secure updates”

5.5.3 Forward zone

Expected: One forward zone, listing all computers in the domain with
a correct set of IP addresses (A-records).
Up to v5.0: Nodes should only be listed with their primary network
connection’s IP address.
From v5.1: Multiple IP addresses (A-records) are no longer an issue.
For older systems a single IP address per node is required; read more in
Microsoft Knowledge Base article: KB246804 - Dynamic DNS registrations
5.5.4 Reverse zone, primary plant network

Note: Reverse zones are not mandatory in System 800xA version 5.1 and later.
Expected: The reverse entries (PTR records) are listed with the correct Fully Qualified
Domain Name (FQDN) for all nodes.
5.5.5 Reverse zone, secondary plant network

Note: Reverse zones are not mandatory in System 800xA version 5.1 and later.
Expected: The reverse entries (PTR records) are listed with the correct Fully Qualified
Domain Name (FQDN) for all nodes.
5.5.6 Standalone domain controllers

Domain controllers that run isolated domains should be configured with the following:
5.5.6.1 DNS is running as “root server”

Expected: If the domain is isolated from other networks, the DNS shall be configured
as a root server (i.e. contain an empty forward zone named “.”) to prevent
delays and timeouts when trying to resolve unavailable root hints (top level
DNS servers on the Internet).
A computer may contain various software trying to reach the Internet to
perform updates, checks, etc. A root zone will quickly terminate any external
name resolution and return a “No such zone” DNS response.
5.5.6.2 Missing (domain-server-self) PTR records may cause an error in nslookup and other places.
C:\> nslookup
Default Server: UnKnown
Address: 172.16.4.1
To prevent this error, add a reverse zone for the client/server-network and execute
“ipconfig /registerdns” on each domain controller. If no records appear
automatically, insert PTR records for the domain controller addresses. Verify that
nslookup launches without error.
C:\> nslookup
Default Server: servername.domainname.topleveldomainname
Address: 172.16.4.1
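The same reverse lookup can be scripted for a list of domain controller addresses. The Python sketch below uses the standard `socket.gethostbyaddr` call, which goes through the operating system resolver (and may therefore also be answered from the hosts file, not only from DNS). The function name and the example names are hypothetical; the `resolver` parameter exists only so that the logic can be exercised without a network.

```python
import socket


def reverse_lookup_ok(ip, expected_fqdn, resolver=socket.gethostbyaddr):
    """Return True if the address resolves back to the expected FQDN."""
    try:
        name, _aliases, _addresses = resolver(ip)
    except OSError:  # raised (as socket.herror) when no reverse record is found
        return False
    return name.lower() == expected_fqdn.lower()


# Offline demonstration with a stand-in resolver (hypothetical names)
def fake_resolver(ip):
    return ("dc1.industrial.local", [], [ip])


print(reverse_lookup_ok("172.16.4.1", "DC1.industrial.local", fake_resolver))
```

On a real domain controller, calling `reverse_lookup_ok("172.16.4.1", "servername.domainname.topleveldomainname")` without the third argument performs the actual lookup.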
5.5.7 Forwarders
If name translation towards external networks is desired, one or more Forwarders should
be configured to “escalate” queries to a DNS with knowledge about these names.

Example of a forwarder to a DNS server on an external network, 192.168.1.254

Note: Do not forward between redundant domain controllers – this is not necessary as
they share the same database, the Active Directory.
5.5.8 DNS self-test

For each DNS server, perform the built-in self test from the DNS Management Console.
Expected: All tests shall pass.

The recursive test is not applicable for DNS that are configured as root
servers.
5.6 Windows Event Log check

Expected: No errors or warnings without a reasonable explanation.

5.7 Hard disk check

Expected: The disk shall not be full, or close to full. No partition should have more
than 75% used disk space. Use the disk defragmenter to verify the file
fragmentation status (SSD disks are to be exempted).
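The 75% limit can be checked with a few lines of Python using the standard `shutil.disk_usage` call. This is a minimal sketch; the function names are hypothetical and the path to check (for example "C:\\") is chosen by the user.

```python
import shutil


def used_percent(path: str) -> float:
    """Return the used disk space of the partition holding path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_limit(path: str, limit: float = 75.0) -> bool:
    """Return True if the partition holding path exceeds the used-space limit."""
    return used_percent(path) > limit


# Check the partition that holds the current working directory
print(f"{used_percent('.'):.1f}% used, over limit: {over_limit('.')}")
```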
5.8 Login environment

Verify that the correct time zone is selected.
5.9 Network Adapter Bind Order, DNS settings and name lookup
Expected: Note: this item does not apply to Server 2016 and later Win10 builds.
Refer to chapter 4.3 - Network Adapter Bind Order vs Network Metric
The network adapter connecting the system’s DNS should be listed first.
External networks should be listed last.
Expected: Preferred and Alternate DNS server specified as recommended in the
Automation System Network Design and Configuration User’s Guide.
Singular domain controller nodes should have the Preferred DNS server
setting pointing to themselves, either directly or via the loopback address
(127.0.0.1).
Redundant domain controllers may have the Preferred DNS server setting
pointing at the redundant peer – this often makes the domain controller
services start up faster since the peer’s DNS can be queried immediately.
Expected: Unless special conditions apply, no DNS suffixes, etc. should be
configured in the Advanced DNS configuration.
Advanced DNS Configuration - Default settings

Expected: DNS queries are answered immediately without time-out when attempted
with the nslookup.exe tool.
In older 800xA versions, the order of preferred and alternate DNS servers on secondary network
cards was to be swapped to speed up name resolution during abnormal
running situations. The current recommendation is to not swap them.

If time allows or there are suspicions that DNS may not work correctly, probe all
available DNS servers on the network (i.e. also the alternate servers) by using the
“server <IP address>” command inside nslookup to override the default Bind Order.
It is recommended (in at least one of the DNS client nodes) to verify the availability of all
zone types: forward, primary reverse and secondary reverse (if redundant network is
used).
In Microsoft Windows XP / Server 2003 and older, additional checks can (if networking
problems are suspected) be performed with the netdiag.exe tool that is part of the
Windows Support Tools (SUPTOOLS.MSI) on the operating system CDROM.
In more recent versions, the factory default nltest.exe tool can be used to query
various statuses, e.g. the computer’s DC Secure Channel status from an elevated (run as
Administrator) Command Prompt:
C:\Windows\system32> nltest /sc_verify:domain.tld
Flags: 90 HAS_IP
Trusted DC Name \\dc.domain.tld
Trusted DC Connection Status Status = 0 0x0 NERR_Success
Trust Verification Status = 0 0x0 NERR_Success
The command completed successfully
For more info issue the command: “nltest /?”
6 Computers with System 800xA
6.1 Aspect Servers

The following tests are to be performed on all nodes acting as Aspect Servers.
6.1.1 Windows Event Log check

6.1.2 Hard disk check

For performance reasons, the OperateITData and/or OperateITTemp folders could be
relocated to a drive other than C:\.
As of SV 5.1 the data folders can be managed & relocated by the System Configuration
Console tool → Clients and Servers → System Directory Configuration.
Before SV 5.1 it is also possible to move the folders after the 800xA processes
have been stopped (Configuration Wizard → Maintenance → Stop all processes…).
The new path must be updated in Windows Registry:
HKLM\Software\ABB\AFW\Systems\{System GUID}
After moving the 800xA data files, the FSD Cache folder (\OperateITData\temp\FsdCache or
\OperateITTemp\FsdCache) must be removed to force a re-registration of all Visual Basic
related graphics at the new location on disk.

6.1.3 User and System locales (Regional Settings)

Expected: All accounts except operators must use English (United States) region.
Expected: All accounts must use dot (.) as decimal symbol.
Windows 10
Windows Server 2016
In older versions, there was sometimes a need to make sure internal accounts were
aligned by clicking a button “Copy to reserved accounts” present in the above dialog.

6.2 Connectivity Servers

The following tests are to be performed on all nodes acting as Connectivity Servers.


6.2.5 Connected controllers

Each controller family may have a separate check procedure later in this document.
Expected: All controllers are properly connected.
Expected: Redundancy, if enabled – the service groups for OPC DA, Event Collector,
Basic History, etc. must be configured with one service provider for each
Connectivity Server.
6.3 Application Servers

The following tests are to be performed on all Application Server nodes (Information
Management, Batch- and Asset Optimization Servers, etc.)


6.3.5 Application Server specific tasks

Each type of Application Server may have a separate check procedure later in this
document.

6.4 Clients


7 Application check
The application check is to be performed once for each system. The node used for the
tests can be chosen arbitrarily.
7.1 Process Portal A – System Event List

List location: [Workplace Structure]Web System Workplace:System Event List
This check requires some knowledge about warning messages from the
800xA platform.
Common problems:
• Status Poll timeout indicating a hung/frozen service.
• Guest logins (services or computers accessing the system from unauthorized user
accounts). This is not desired, not even if guest logins are enabled.
• Restarting services, or services not configured properly
• “Installation Problem in …” – a System Extension loaded into the system has not
been installed to one or more nodes.
7.2 Process Portal A – Operator Message List

List location: [Workplace Structure]Web System Workplace:Operator Message List
Common problems:
• Incorrect graphics causing run-time errors
• OPC DA property write failures due to too many or too frequent writes from third-party
clients or the Property Transfer service, etc.
7.3 Service Structure

Expected: All services configured and running without disturbances.
The Start Time column can be used to identify services that have restarted
due to a fault (normally, all services within a node should have a similar
Start Time).
Redundant service providers shall share the same service group.
Unused Service Groups can be deleted to simplify the system setup (Alarm
Logger, External Alarm, Property Transfer, etc).
Multiple service groups within the same service should be renamed from
the default name “Basic” to something more readable, e.g. AC 800M
Control Network 1, MB300 Network 11, etc.
Tools: AfwServiceStatus.exe

7.4 Data Source Definition aspects

Expected: Each Data Source definition should point to the correct
service group in the Service Structure.
In configurations hosting multiple connectivity server pairs it is recommended to check
that e.g. the History Source configuration is made “straight” and not “crossed”. An example of a
crossed configuration: the connectivity server pair CS1A and CS1B is logging the CS2A and CS2B pair’s objects
because their History Source is incorrectly pointing to the CS1x pair’s Basic History
Service Group.
The number of data source definitions depends on which system extensions are
loaded and used:
System Extensions and their data source aspects (example):
System Extension      Data Source Definition Aspect name
Basic History         History Source
AC 800 Connect        OPC Data Source Definition
AC 400 Connect        Adapter Data Source Definition
                      TTD Source
                      Quick List Data Source
SoftPoint Server      Source Definition
                      Adapter Data Source Definition
PLC Connect           Source Definition
                      Adapter Data Source Definition
Asset Optimization    AM Service Data Source Definition
PNSM                  OPC Data Source Definition
PPA                   OPC Data Source Definition
Profibus / HART       FBB-OPC Data Source Definition
Fieldbus Foundation   OPC Data Source Definition
Hint: Use the Find Tool to search for these aspects. Sometimes, additional
(erroneous) Data Source Definition aspects have been created and
prevent data subscription, history logging, etc.

7.5 Graphic Performance
7.5.1 Diagnostics Window

Errors in the configuration (e.g. broken reference) or during runtime (e.g. property value
out-of-range or of incorrect type) have a negative impact on performance. The errors can
be examined in the Process Graphics Diagnostics window.
The Diagnostics window is available for all graphic aspects by right clicking them. Even
faceplates can be examined in the same way by calling the Diagnostics window from the
individual faceplate elements instead of from the faceplate itself.
Diagnostics command at the context menu of a graphic aspect
7.5.1.1 New Graphics (PG2) Diagnostic Window
The New Graphics (PG2) Diagnostics window

Expected: Zero errors in the Errors & Warnings section.
Hint: Each error is listed with more details in the Errors & Warnings tab
Expected: “Acceptable” figures in the Timing section (typically less than 5 seconds)

7.5.1.2 Visual Basic (VBPG) Diagnostic Window
The PG1 (Visual Basic) Diagnostics window

Expected: Zero errors in the Error Message Overview.
Hint: Each error is listed with more details in the Error messages tab.
Expected: “Acceptable” figures in the timing summary (typically less than 5 seconds)
7.5.2 New Graphics (PG2) graphics checks
7.5.2.1 Late Binding

Late Binding makes it possible to evaluate and redefine data subscriptions during runtime.
E.g. a graphic display or element can subscribe to different properties depending on e.g.
batch state, sequence step, equipment mode, etc.
However, Late Binding comes with a significant increase in display exchange overhead
since no subscription caching can be made in between callups. Hence, the use of Late
Binding should be kept to a minimum and reserved to situations where it is really required.
Example of Timing tab when using Late Binding
7.5.2.2 Rendering Tier

Operator clients running PG2 graphics should be equipped with a GPU (graphics
accelerator) to offload the main CPU. A multi-monitor client lacking a GPU will typically
use much more CPU time, or even run at 100% CPU load during navigation and callups of
new graphics. A saturated CPU results in reduced performance in process graphics.
In contrast to Visual Basic 6, PG2 will make use of hardware graphic accelerators when
available. One occasion where acceleration is disabled is in Remote Desktop sessions
(used by Thin Clients). If workplace performance is reported as bad it is recommended to
check if hardware acceleration is enabled.

Graphics hardware acceleration is unavailable (e.g. via Remote Desktop)

If possible, call up the same graphics on a “thick” client equipped with a graphics card
enabling hardware acceleration.
Graphics hardware acceleration enabled to Rendering Tier level 2

Graphic displays making heavy use of animated controls or effects, embedded trends, etc.
may show a significant increase in processor load for the AfwWorkplaceApplication.exe
process when run without hardware acceleration.
Hence, it is recommended to adhere to a “minimalistic” graphic design when the target
operator workplace is to be run in thin clients or via Remote Desktop where hardware
acceleration is unavailable.
7.5.3 Visual Basic (VBPG) graphics checks
7.5.3.1 Graphics are sized to match current screen resolution

For best performance, the VB graphic displays should be deployed at the actual size
at which they will be viewed. Scaling and resizing during runtime significantly reduces display
exchange performance.
Some Windows desktop settings have impact on the number of pixels made available to
the workplace:
• Auto Hide of the Task Bar

• Classic or “XP style” Start Menu
• Workplace Mode
To make best use of the available screen area, it’s recommended to enable Full
Screen or Operator Workplace Mode for the operator users. Operator Workplace
mode will imply Full Screen and prevent minimizing, stacking and off screen
placement of overlapping windows.
When the desktop environment has been configured (taskbar, start menu, etc.) it’s time to
measure the effective size of the workplace graphics panel.
Start a workplace (preferably in Operator Mode) and call up the Size Display (it’s placed
on the “Special” object in the Graphics Structure).
Then re-deploy all graphics with the correct size set.
7.5.3.2 Settings with influence on performance

To reduce the CPU load created in the workplace, it’s recommended to configure the
displays with the following settings:

• EnableBlink = False
(or use an expression that enables blinking only when needed)
• Backstyle = Transparent
Windowless = True
- or -
Backstyle = Opaque
Windowless = False
• EnableInput = False
(or use an expression that enables input only when needed)
More information is to be found in the Graphic Engineering User’s Guide and some FAQ
documents on ABB Library, e.g. How to decrease Graphic Display call up time
(3BSE034711Rxxxx).
Lowering the CPU load will increase performance and cut back on display exchange
times.
7.5.3.3 Search for graphics with unresolved dependencies

Use the Start → Programs → ABB Industrial IT 800xA → Engineering → Utilities → Display Tool
to search for graphics that need deployment.
The search can be made in a number of ways: all graphics, per library, per structure, etc.
As a minimum effort, it is suggested to search the displays used by the operators; they are
usually located in the Functional Structure.
The Display Tool has found one display in need of a “Deploy” in the Functional Structure
Displays requiring deploy may show incorrect values, indications, etc. and should be
reported in the test protocol as a possible problem.
7.5.4 Workplace memory usage

Use Windows Performance Monitor, PowerShell or the Show_Processes script (see
chapter 23.16 Running processes test) to measure the memory usage of each workplace
process (afwworkplaceapplication.exe).

An advanced option (requiring the AO_NET_MON license) is to set up WMI counters to track
key performance counters in the workplaces (CPU, memory, etc.). The counters can then
be logged with a Log Configuration and/or used to drive Alarm Expressions.
Expected:
• 32-bit operating system: At no time shall 1500 MB virtual bytes be exceeded
• 64-bit operating system: At no time shall 3500 MB virtual bytes be exceeded
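Once the virtual-bytes figures have been collected, the limits above can be checked mechanically. The following is a minimal Python sketch, assuming the readings have already been exported as (node, megabytes) pairs. The function name, input format and sample node names are hypothetical and not part of any ABB tool.

```python
# Limits taken from the "Expected" values above (virtual bytes, in MB).
VIRTUAL_BYTES_LIMIT_MB = {32: 1500, 64: 3500}


def workplaces_over_limit(readings, os_bits):
    """Return the (node, MB) pairs whose virtual bytes exceed the limit.

    readings: iterable of (node_name, virtual_bytes_mb) pairs, one per
              afwworkplaceapplication.exe process (hypothetical input format).
    os_bits:  32 or 64, the operating system architecture of the nodes.
    """
    limit = VIRTUAL_BYTES_LIMIT_MB[os_bits]
    return [(node, mb) for node, mb in readings if mb > limit]


# Hypothetical sample readings from three 64-bit client nodes.
sample = [("client1", 820), ("client2", 3620), ("client3", 1410)]
print(workplaces_over_limit(sample, os_bits=64))
```

Since the requirement is "at no time", a single reading is not sufficient. The check is best repeated on samples logged over a longer period.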
7.6 User roles and security permissions

Open the [Admin Structure]Administrative Objects/Domains/<system name>:Security
Report aspect, click Update. Copy & store the result in an .RTF file for archival.
Expected: (Preferably) all engineering and operation should be made from non-
administrator accounts – only a very few accounts should have to be
member of the Administrators group in the 800xA System and Microsoft
Windows Domain/Workgroup since most of the work in the system does
not require administrative privileges.
Expected: All user accounts have been assigned to appropriate user groups in the
800xA System - mainly they are divided into three categories: engineers,
operators and administrators.
Hint: being member of the Administrators group in the 800xA System’s
User Structure disables all security permissions.
Expected: No Guest account
The Guest account is in most cases not a necessary user. It is
recommended that it is removed to improve system hardening.
Expected: The system security has been configured as required by the application
(this requires both in-depth understanding of the application and
knowledge about how the installed object type libraries are handling
security).
Generally, it is better practice to use “partial allow” rather than “global allow” + “local
deny”. This method prevents the security setup from becoming cluttered with multiple “denies”
in multiple locations, which would make the security concept harder to understand.
E.g. if “operate” is required by some users on a few branches in the Control Structure, it is
better to remove “operate” from the Default Security Configuration Aspect in the Admin
Structure and to create/place local “allows” per user on lower levels where the access is
required.
Property Attribute Override aspects may override the default OPC DA property security
requirements that usually are defined on the object type. Hence, it may be a good idea to
search for Property Attribute Override aspects placed in the Object Type and Control
Structures. If any are found, it is recommended that the location of them is included in the
security check. Bulk Data Manager can be used to create a list of these aspects.

7.7 Objects in Lost and Found

Objects in Lost and Found indicate a mismatch between the controller configuration and
the Control Structure. If an alarm or event is received and no match is found, an object is
created in Lost and Found which will be associated with the alarm or event.
Expected: No objects in Lost and Found.
Alarm and event lists filtering on Objects and Descendants will in most
cases exclude objects in Lost and Found which can lead to lack of alarm
presentation, etc.
During troubleshooting, pay attention to the Creation Time of the Lost & Found objects. It
indicates when in time there was a need to create that particular object.
In most cases a controller “upload” or “download” is required to synchronize the HMI and
PLC environments. After the Control Structure has been synchronized with the controller,
the Lost and Found objects should be deleted.
Instead of deleting the Lost and Found object it is possible to remove the unwanted child
objects by exporting the Lost and Found object without children, delete it (incl. children)
and finally import Lost and Found again.
SV5.x hint: All L&F objects can be deleted with the AfwSCT.exe tool:
C:\> afwsct -lf -q
If L&F objects return, the root cause should be located and corrected, e.g.:
1. Application not uploaded or downloaded properly…
→ Upload/Download
2. Obsolete alarms lingering in the Alarm & Event (SV3) or Event Collector services
(SV4)
→ Restart Alarm & Event or Event Collector services
3. Obsolete alarms lingering in the OPC AE server…
→ Restart OPC AE Server
4. Obsolete alarms lingering in the controller…
→ Restart controller (first warm, then possibly also cold)
The choice of Source Object Interceptor (see chapter 8.6.2) determines whether a restart of
the Event Collector is required after removal of Lost and Found objects.

7.8 Consistency Check

To avoid corrupt backups and ease application transfer and upgrades it is recommended
to regularly perform consistency checks on user defined object type libraries and control
applications.
In versions prior to System 800xA version 5, the Consistency Check is carried out from
the Consistency Checker aspect. The aspect can be created on an arbitrary object and
configured to check any object or tree of objects.
The SV4 Consistency Checker aspect

As of System 800xA version 5 and forward Consistency Check is carried out via a
dedicated tool.
The SV5 Consistency Check tool launch button
SV5 Consistency Check tool

Note: For large configurations, it is recommended to check consistency in several smaller
& separate steps (one library at a time, one application at a time, etc.).
For more information refer to the Consistency Check Guideline documents at ABB Library
(be sure to download the document matching the concerned 800xA system version)

7.8.1 Consistency Check – User defined object type libraries

To be performed for each user defined library that is in use. The version(s) currently in
use can be examined at the Control IT Project or Application aspects.
Run the check from the consistency tab of the library’s Library Version Definition aspect or
from the Consistency Check tool (available in SV5 and forward). Be sure to enable all
checkboxes except for “Verbose” if using the aspect checker variant.
Expected: No consistency errors
SV4.0: Check failed. The library has consistency issues, some of which may be
repairable by the tool itself
SV5.x: Check failed. The library has consistency issues, some of which may be
repairable by the tool itself

7.8.2 Consistency Check – Library Aspect check

This check is suitable for systems where application library versions are created and
maintained and less suitable for systems where no library development takes place.
Search for unassigned aspects from the Aspects tab of the library’s Library Version
Definition aspect.
Expected: No unassigned aspects = nothing in the right list box after the
search with Aspects not included in any library checked is finished.
The above is valid in most configurations, but exceptions do exist – e.g. in
systems using Function Designer some aspects may be kept unassigned.
7.8.3 Consistency Check – Control Structure

After the libraries have been checked for consistency (and any corruptions have been
repaired), it is time to check the Control Structure.
In most cases it is advisable to split the check so that each Control Project, OPC network,
etc. is checked individually (or else the output may not be easy to overview).
In the following example of a Control Structure, the check could be split into several
smaller parts (marked with color):
Example of Control Structure

Adding parts of Control Structure to the Consistency Check tool

7.8.4 Consistency Check – other structures

After the Object Type and Control Structures have been examined it is time to check the
other structures where an application has been developed. E.g. the Functional Structure
usually contains the process graphics.
7.8.5 Consistency Check – internal data structures

The internal data structures of System 800xA should be examined with a dedicated
Structure Consistency Check tool: afwsct.exe
The tool can be accessed from the Command Prompt.
The Structure Consistency Tool will lookup misplaced objects. It can also attempt to repair
missing or corrupted structure references (advanced usage after receiving instructions
from an ABB Support Center).
SV3 & SV4: By default, only the Control Structure is checked. To check another
structure, use the -s <structure name> option. A minimal test should
at least include the Object Type, Control and Node Administration
Structures:
C:\> afwsct -s "Object Type Structure"
SCT succeeded
C:\> afwsct -s "Control Structure"
SCT succeeded
C:\> afwsct -s "Node Administration Structure"
SCT succeeded
SV5: By default, all structures are iterated and tested.
C:\> afwsct
Checking structure 'Workplace Structure'
…
Checking structure 'Admin Structure'
SCT succeeded
Contact an ABB Support Center for further assistance if the afwsct.exe tool reports any
errors.

7.9 System NLS check

Use the NLS check tool to retrieve a list of objects and aspects having ambiguous names.
“Ambiguous” usually means that a name has multiple translations with different “strings”
per language. This is OK for some aspects and objects (e.g. the name of a container
object in the Functional or Location Structures), but is not OK with aspects and objects
associated with engineering, e.g. a Control Module instance inside a CBM project.
The NLS check tool can be downloaded from ABB Library (be sure to install the correct
version).
800xA Operations SV 3.x, 4.x, 5.x Clean up of unintended Multi-Lingual
Engineering using the AfwNlsCorrection tool, 3BSE042291
Note: at the time of writing, SV 5.1 does not (yet) have any tool made for it.
In normal cases two checks are recommended:
7.9.1 Test of Control Builder Name aspects

C:\TEMP\> afwnlscorrection
Total Number of Control Builder Name Aspects: 1414
Total Number of Control Builder Name Aspects with Single Language: 1409
Total Number of Control Builder Name Aspects with Multiple Languages: 0(same:0)(not same:0)
Total Number of Control Builder Name Aspects with Neutral Language: 441
Languages used in system:
lcid: 0
lcid: 1033
Expected: Zero Control Builder Name Aspects with multiple languages.
7.9.2 Test of Plant Explorer Name aspects

C:\TEMP\> afwnlscorrection bpn
Will make a report of the basic property name aspect type (i.e. all name aspect categories)
…
Total Number of basic property name Aspects: 17632
Total Number of BPName Type asp. with Single Language: 15338
Total Number of BPName Type asp. with Multiple Languages: 2280(same:2270)(not same:0)
Total Number of BPName Type asp. with Neutral Language: 2340
Languages used in system:
lcid: 0
lcid: 1031 =German
lcid: 1033 =English (United States)
lcid: 1044 =Norwegian
lcid: 1053 =Swedish
Expected: Zero BPName Type aspects with NOT SAME name.

Some localization kits introduce translations for common objects, e.g. the “Plant Explorer”
may be translated into “Fabriksutforskare” (in Swedish) if the Swedish Localization kit for
System 800xA is installed. Such translations are deliberate and considered “safe”.
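When the reports above are saved to text files, the “multiple languages” counters can be extracted automatically. The following is a minimal Python sketch, assuming the report lines keep exactly the layout shown in the two examples above. The function name is hypothetical and the sketch does not call the afwnlscorrection tool itself.

```python
import re

# Matches e.g. "... with Multiple Languages: 2280(same:2270)(not same:0)"
_MULTI_LANG = re.compile(
    r"with Multiple Languages:\s*(\d+)\s*\(same:\s*(\d+)\)\s*\(not same:\s*(\d+)\)"
)


def multi_language_counts(report_text):
    """Return (total, same, not_same) from a saved report, or None if absent."""
    match = _MULTI_LANG.search(report_text)
    if match is None:
        return None
    total, same, not_same = (int(group) for group in match.groups())
    return total, same, not_same


# Line copied from the Plant Explorer Name aspect report above.
report = (
    "Total Number of BPName Type asp. with Multiple Languages: "
    "2280(same:2270)(not same:0)"
)
total, same, not_same = multi_language_counts(report)
print("OK" if not_same == 0 else f"{not_same} names need review")
```

For the Control Builder Name report the first value (total) is the one expected to be zero. For the Plant Explorer Name report it is the last value (not same).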
8 Other tests
8.1 Affinity
The affinity configuration should be checked in all systems with redundant servers.
It’s recommended that no pair of adjacent operator clients should share the same server
(it’s better if they use separate servers in case one server should fail, operate slowly,
deliver corrupt data, etc.).
Redundant application servers (e.g. IM) should be configured to collect data from different
servers within redundant service groups for History, Data Access and Event Storage.
Load balancing is automatic even without affinity, but affinity should still be considered since
large clients may inadvertently gather on the same server, causing an uneven load
situation.

Use appropriate tools (e.g. the AfwAppLogViewer.exe → AdvDsOpcConnector → Statistics
operation, AC 800M OPC Server Panel, AC 400 RTA Board ANPER, etc.) to judge if
client load is skewed and ought to be adjusted to become well balanced across redundant
peers.
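The recommendation that adjacent operator clients should not share a server can be verified from a simple list. The following is a minimal Python sketch, assuming the clients are listed in their physical seating order together with the server each one currently uses. The function name and the node names are hypothetical.

```python
def adjacent_clients_sharing_server(clients):
    """Return (client_a, client_b, server) for neighbours on the same server.

    clients: list of (client_name, server_name) pairs in physical seating
             order (hypothetical input format).
    """
    return [
        (a_name, b_name, a_server)
        for (a_name, a_server), (b_name, b_server) in zip(clients, clients[1:])
        if a_server == b_server
    ]


# Hypothetical control room with four operator clients and two servers.
control_room = [("OP1", "CS1"), ("OP2", "CS2"), ("OP3", "CS2"), ("OP4", "CS1")]
print(adjacent_clients_sharing_server(control_room))
```

In this sample OP2 and OP3 sit next to each other and both use CS2, so a failure of CS2 would affect two neighbouring workplaces at once.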
8.2 Aspect Directory service health
8.2.1 Aspect Directory synchronization

Expected: Aspect Directory is synchronized.
To verify that the aspect servers are properly synchronized it is possible to use the
afwsysinfo.exe tool.
Check synchronization with this command:
C:\> afwsysinfo.exe -csd
No differences found
If any differences are listed, re-run the command a few times. The checksum is calculated
online/sequenced and not offline/parallel. If an application (e.g. CBM, ImportExport,
Property Transfer to General Properties) is frequently writing to the aspect servers, the
checksum might differ for natural reasons.
If a permanent difference is established, use the Plant Explorer Find Tool to search for the
object GUID listed as not synchronized. Compare the list of aspects on the object
between the aspect servers; check the modification date to possibly identify the not
synchronized data.
8.2.2 Master vs slave

Use Plant Explorer and navigate to [Service Structure]Services/Aspect Directory and
select the Service Group object.
Select the Service Group Definition aspect and the Configuration tab.
The first listed service provider is “master”. The order may be influenced by Restore
System, Add/Remove Node, etc. It is recommended to have the lowest numbered node
(etc.) as master. Reorder the service providers if necessary.
In 1oo2 aspect server operation, the first listed service provider will overwrite the second
provider in case they individually become updated during a communication break. Any
work that has been made in clients of the secondary will then be lost.

8.2.3 Frequency of transactions

Use Plant Explorer and navigate to [Service Structure]Services/Aspect Directory and
iterate all Service Provider objects.
Select the Service Provider Status aspect, select the Property View tab and enable
Subscribe for live data and verify the following items:
Item                      Description and expected value
ClientConnectionCount     Number of connected client processes. An uneven distribution
                          can be tweaked using affinity.
ClientNodeCount           Number of connected client nodes. Redundant providers should
                          not deviate too much, or else affinity might need adjustments.
TransactionRateCurrent    Number of aspect directory transactions (writes) per second.
                          Should be zero (0), or very close to zero, during normal
                          operation. Bad practice, e.g. frequent writing to General
                          Properties, drives this number up.
8.3 Alarm Manager service health

Check the contents of the following aspect:
[Service Structure]Services/Alarm Manager/Basic:Service Group Definition aspect
The Alarm Manager service default setting is to not create new entries for repeating
alarms and to let all alarm categories share 10,000 queue slots in a First-In-First-Out (FIFO)
manner, regardless of the alarm category (origin).
When the FIFO storage is depleted, the following system message is emitted and can be
viewed in the [Workplace Structure]Web System Workplace:System Event List:
To eliminate the risk that less important (and often frequently reoccurring) system alarms
flood the Alarm Manager and push the (often) more important process alarms out of the
FIFO storage, it is possible (and recommended) to:

1. Keep the Make new alarm entry each time a condition gets active at its default
“Disabled” setting. This prevents repeating alarms from occupying the alarm
storage.
2. Define a dedicated storage queue for system alarm and/or process alarm
categories.
At the bottom of the Special Configuration tab, click the Edit button.
Default settings (all categories in Auto) Example of tuned setting

All system alarm categories set to 500
In the right-hand example above, the system alarm categories’ queue size settings have been
altered from Auto to 500 (one complete alarm list page).
As a result, process alarms share the 10,000-slot FIFO queue and the system alarm
categories get 500 each. In total, 11,500 alarms are stored in the Alarm Manager queues.
Note: The number and names of the alarm categories vary heavily depending on which
system extensions have been loaded.
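The total in the example can be reproduced with a short calculation. The following Python sketch assumes three system alarm categories, a count that is not stated explicitly but follows from the figures given (10000 + 3 × 500 = 11500). The function name is illustrative only.

```python
def total_alarm_capacity(shared_queue_size, dedicated_queue_sizes):
    """Sum the shared FIFO queue and all dedicated category queues."""
    return shared_queue_size + sum(dedicated_queue_sizes)


# Shared queue of 10000 slots plus three system alarm categories
# (assumed count) that have been given 500 slots each.
print(total_alarm_capacity(10000, [500, 500, 500]))  # 11500
```

In a real system the list of dedicated queues has to be taken from the Special Configuration tab, since the number of categories depends on the loaded system extensions.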
To review the current alarm storage situation, use the AfwAppLogViewer.exe tool to
execute the ListAlarmSize operation on an arbitrary Alarm Manager service provider.
8.4 OPC DA Connector service health
8.4.1 OPC DA analysis

Use the AfwAppLogViewer.exe tool to execute the AdvDsOPCConnector →
AdvDsOPCConnector Statistics operation for each connected OPC Server.

Save the operation results, wait one hour and repeat the operation and save the results
again using a new file name.
For each OPC Server, record the following in the test protocol:
1. Total number of clients (if found appropriate, also record client details)
2. Total number of subscribed items
3. Total number of accumulated read and write operations
Using Excel and simple math, the above data can easily be converted into KPIs and used
to baseline a system for later comparisons, etc.
• Number of item changes per second
• Number of read operations per second
• Number of write operations per second
Expected: The application load related figures shall not be skewed too much within a
redundant pair of connectivity servers. Unbalanced figures may indicate a
need of adjustment (or even deployment) of Affinity. Unusually low (or
zero!) figures may indicate configuration errors or lack of communication.
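The conversion of two saved statistics results into per-second KPIs is a plain difference divided by the elapsed time. The following is a minimal Python sketch as an alternative to Excel, assuming the accumulated counters have been copied by hand from the two saved results. The class, field and function names are hypothetical and do not correspond to the layout of the statistics output.

```python
from dataclasses import dataclass


@dataclass
class OpcSnapshot:
    """Accumulated counters noted from one saved statistics result."""
    item_changes: int
    reads: int
    writes: int


def rates_per_second(first, second, elapsed_seconds):
    """Convert two accumulated snapshots into per-second KPIs."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        "item_changes_per_s": (second.item_changes - first.item_changes) / elapsed_seconds,
        "reads_per_s": (second.reads - first.reads) / elapsed_seconds,
        "writes_per_s": (second.writes - first.writes) / elapsed_seconds,
    }


# Hypothetical figures from two results saved one hour (3600 s) apart.
before = OpcSnapshot(item_changes=1_000_000, reads=50_000, writes=2_000)
after = OpcSnapshot(item_changes=1_360_000, reads=53_600, writes=2_360)
print(rates_per_second(before, after, elapsed_seconds=3600))
```

Running the same calculation for both servers of a redundant pair makes skewed figures easy to spot.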

8.4.2 Recovery Items
Note: not applicable for PLC Connect, SoftPoint or Advant/MOD300

Use the AfwAppLogViewer.exe tool to execute the AdvDsOPCConnector →
AdvDsOPCAdapter → Statistics operation for each connected OPC Server.
The OPC Adapter statistics operation lists the number of times a recovery item has been
added or removed and ends with a sum of OPC items presently in recovery state.
Expected: Zero (0) items in recovery state (at present, but preferably also in the past)
A Recovery Item is proof of an attempt to subscribe for a non-existent OPC item.
Typically, they originate from erroneous or obsolete configurations, e.g. older (obsoleted?)
process graphics or Log Configurations.
Recovery items add extra burden to the OPC server since they are perpetually retried
every 30 seconds.
Use the DumpRecoveryItems operation to produce a list of all items in recovery state.

8.5 Basic History service health

Use Plant Explorer and navigate to [Service Structure]Services/Basic History and iterate
all Service Status Objects located below each service provider.

Item                      Description and expected value
ActiveAdviseRequests      Number of trend subscriptions. Two redundant providers should
                          not deviate too much, or else affinity might need adjustments.
                          Must not reach a very high or constantly increasing value (e.g.
                          indicating a client with too many subscriptions or a “leak”).
LogMgrQueueLength         Number of pending log file reads. Should be zero (0) most of the
                          time. If not, it may indicate an overloaded service, e.g. due to too
                          many clients, ActiveAdviseRequests or hardware issue (slow
                          harddisk, poor I/O performance, synchronization problem, etc.).
DirectLogs and            Number of primary logs. Should be equal, or else some log is
EnabledDirectLogs         inactive. Inactive logs prevent synchronizing to 100%.
Logs and                  Number of secondary logs. Should be equal, or else some log is
EnabledLogs               inactive. Inactive logs prevent synchronizing to 100%.
PercentSynchronized       Only valid for redundant providers. Should reach 100% some
                          time after service start. If not, verify that all logs are active (see
                          previous two items). Consistency errors may prevent
                          synchronization of a log. The Log Summary aspect will list all
                          logs on descendant objects and permit activation of all logs in its
                          scope.
                          AfwAppLogViewer has an operation to list logs with problems:
                          AdvHtHistorySrv → AdvHtHistorySrv (collection apartment
                          operations) → ListCollectorMap. Search for “Pending logs”.

The LogMgrQueueLength may be influenced by disk fragmentation and by e.g. an
application server performing its data collection, e.g. an Information Manager (IM).
The IM is equipped with a tool (called the “Stagger”, see chapter 14.3.5 Collection
performance check and tuning) that can split up and spread the data collection to improve
performance. Below is an example of the outcome.
Before defragmenting Basic History data storage and applying IM’s “stagger”
Periodic LogMgrQueueLength peaks seen when 2500 one-second IM logs collect at once
During the peaks (which coincide with IM collection), trend callup takes several seconds.
After defragmenting the Basic History data disk and applying stagger to the IM
The queue peaks are removed and overall Basic History performance is improved
Trend callup time is significantly improved (typically less than a second) while IM retrieves
the same amount of raw data (but in smaller chunks well spread over time).

8.6 Event Collector service health
8.6.1 Discarded alarm/events

Use Plant Explorer and navigate to [Service Structure]Services/Event Collector and iterate
all Service Status Objects located below each service provider. Focus on the service
groups for controllers and 3rd party OPC AE connections.
DiscardedAlarms           Number of alarms or events that have been discarded
                          (suppressed). Should be zero. Example reasons to discard:
                          • Unknown category
                          • SourceName cannot be found (no matching object in system)
                          • Bad timestamp
                          AfwAppLogViewer has an operation that can be used to list the
                          last 50 discarded events: AdvAeEventCollector → Event Collector
                          Server → Dump Discarded Events.
8.6.2 Source Object Handling

OPC AE servers in need of the “Lost and Found” feature (which automatically creates
temporary objects for unknown SourceNames) e.g. 800xA for Advant Master should make
use of the more modern Tracking Source Object Interceptor over the older Default Source
Object Interceptor.
The Tracking variant automatically purges old alarm references once a proper source
object has been created (e.g. uploaded) and the temporary object removed from Lost & Found.
The Default variant also requires a restart of the Event Collector Service group. Old alarm
references may confuse operators when alarms do not associate where expected.
Other OPC AE servers dependent on the automatic creation of objects for missing
SourceNames (=Lost & Found) should have the same setting.

8.7 General process health

Use necessary means (SPDC, scripts, PowerShell, etc.) to retrieve a list of all running
processes in all machines. Below is an example of how to list the top 5 processes by virtual
memory usage on a remote computer.
C:\> powershell
PS C:\> Get-Process -ComputerName cl71 | Sort-Object VirtualMemorySize -Descending | Select-Object Name, VirtualMemorySize -First 5
Name VirtualMemorySize
---- -----------------
svchost 1422163968
AfwWorkplaceApplication 820727808
ABB.xA.SystemInstaller.AgentTrayApp 700325888
svchost 695345152
svchost 617144320
Most of the System 800xA binaries are still compiled for 32-bit architecture or in some
cases WOW64 architecture, limiting their maximum virtual memory usage to 2.0 GB
and 4.0 GB respectively.
Expected: 32-bit processes below 1.5 GB (=500 MB left over)
64-bit (WOW64) processes below 3.5 GB (=500 MB left over)
Some processes, e.g. oracle.exe, may max out their usage (e.g. very near 2.0 GB) without
causing any issue. Pay attention to growing processes, or processes with large deviations
between nodes of the same type, e.g. afwworkplaceapplication.exe among client nodes.
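Large deviations between nodes of the same type can be found by comparing each node against the median of its group. The following is a minimal Python sketch, assuming the virtual memory size of one process (e.g. afwworkplaceapplication.exe) has been collected per node. The 50% tolerance, the function name and the sample figures for cl72 and cl73 are assumptions for illustration, not ABB recommendations.

```python
from statistics import median


def deviating_nodes(sizes_by_node, tolerance=0.5):
    """Return nodes whose size differs from the group median by more than
    the given fraction (0.5 = 50 %, an assumed tolerance)."""
    mid = median(sizes_by_node.values())
    return sorted(
        node
        for node, size in sizes_by_node.items()
        if abs(size - mid) > tolerance * mid
    )


# Virtual memory size in bytes for the same process on three client nodes.
# cl71 is taken from the listing above; cl72 and cl73 are hypothetical.
workplace_sizes = {"cl71": 820727808, "cl72": 805000000, "cl73": 1650000000}
print(deviating_nodes(workplace_sizes))
```

Here only cl73 is reported, which would make it the first candidate for a closer look at what its workplace is displaying.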
8.8 Windows Firewall

An incorrectly configured firewall may cause connection problems (too few exceptions) or
undesired exposure to malware (too many exceptions).
Version      Firewall support   Configuration
1.0 … 5.0    No                 N/A
5.1          Yes                “Semi-Automated” via wizard on 800xA media
6.0          Yes                Fully governed by SCC Configure System Task
System 800xA firewall handling tools add exceptions valid for “all” network profiles:
domain, private and public. An “unidentified” network connection may block valid traffic.
Expected: For proper firewall operation, the Network Interfaces must belong to the
correct Network Category: Domain, Private or Public.
StartRun… ncpa.cpl
One method to identify problems with the firewall is to enable Firewall Logging – blocked
telegrams will then be logged to a logfile (separate log setting for each profile).

8.9 File fragmenting

The OperateITData and OperateITTemp folders on all nodes should be defragmented on
a regular basis.
Scheduled defragmentation is not recommended. Instead, defragmentation should be
started manually by a system administrator when required and when the system situation is
“calm”.
The defragmentation can be run in analyze-only mode; the analysis report can then be used
to decide if defragmentation is required (e.g. if important files in the OperateITData folders
are heavily fragmented).
The system performance may degrade during the defragmentation operation. For best and
most secure results, stop all processes associated with Process Portal A (from the
Configuration Wizard → Maintenance dialog), IM, etc. before defragging.
Drive imaging software (Ghost, Acronis, etc.) can also be used to backup and restore the
same disk. This usually results in defragmentation of fragmented files.
How much slowdown a fragmented disk causes depends on many factors such as
caching made by the operating system, disk seek time, maximum transfer rate, etc.
For performance reasons the OperateITData and OperateITTemp directories could be
relocated to a dedicated disk partition other than C:\. For optimal performance, use high-end
disks: Solid State disks (like NVMe or SSD) or 15,000 RPM “SAS” or “UltraSCSI320” disks.
RAID controllers shall have their read & write caches enabled (which often requires a
separate power backup like a battery or capacitor).

Disk                                  Average transfer rate   Average seek time
Seagate 160 GB, 15kRPM SAS            108 MB / second         5.7 ms
Samsung 160GB 7.2kRPM SATA-300/NCQ    49 MB / second          17.2 ms
Intel 160 GB SATA2 SSD                186 MB / second         0.1 ms
Samsung 512 GB NVMe                   1086 MB / second        0.1 ms
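To get a feeling for how seek time and transfer rate interact with fragmentation, the figures above can be put into a rough estimate. The following Python sketch uses a deliberately simplified model: every fragment costs one average seek, the data is transferred at the average rate, and operating system caching is ignored. The function name, the file size and the fragment count are hypothetical.

```python
def estimated_read_seconds(file_mb, fragments, seek_ms, rate_mb_per_s):
    """Rough read time: one average seek per fragment plus transfer time.

    Simplified model (an assumption): caching and queuing are ignored.
    """
    return fragments * seek_ms / 1000.0 + file_mb / rate_mb_per_s


# (average seek time in ms, average transfer rate in MB/s) from the figures above.
disks = {
    "15kRPM SAS": (5.7, 108),
    "7.2kRPM SATA": (17.2, 49),
    "SATA2 SSD": (0.1, 186),
    "NVMe": (0.1, 1086),
}

# Hypothetical 100 MB file, read contiguously and split into 1000 fragments.
for name, (seek_ms, rate) in disks.items():
    contiguous = estimated_read_seconds(100, 1, seek_ms, rate)
    fragmented = estimated_read_seconds(100, 1000, seek_ms, rate)
    print(f"{name}: {contiguous:.2f} s contiguous, {fragmented:.2f} s fragmented")
```

Under this model the seek term dominates on the rotating disks, while it is nearly negligible on the solid-state disks, which is consistent with the large differences in seek time listed above.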
8.10 Time Synchronization

It is recommended to have a controller, e.g. an AC 800M as clock master (with backups
running in other controllers). The AC 800M can synchronize to an external SNTP source
(GPS clock, etc.).
The AC 800M time is broadcast to all controllers and the AC 800 Connectivity Servers.
To keep the domain controllers synchronized, they should be configured as SNTP slaves,
either to the Connectivity Servers (requires registry changes for the W32Time Service) or
straight from the AC 800M (requires TCP/IP forwarding to and from the control network).
If the CS option is selected, enable SNTP Time Server and disable SNTP Time Client in
the Connectivity Servers (keep W32Time running in the DCs and the Connectivity
Servers that shall be NTP servers). Disable W32Time in all clients and other servers
(the third connectivity server).
More information and configuration examples are to be found in the Automation System
Network, Design and Configuration User’s Guide (3BSE034463Rxxxx)
In most cases, undesired AfwTime Service Providers should be removed from Aspect,
Connectivity and Application Servers. Normally, only one pair of AfwTime Service
Providers should be running in a system, namely on the “primary” connectivity servers that are
supposed to control time.

Unless specially required, it is also recommended to disable the Clients allowed to set
time setting on the AfwTime Service Special Configuration tab.
8.11 PNSM Basic Computer Monitoring

Unless other monitoring is deployed it is recommended to utilize the Basic Computer
Monitoring function of PC, Network and Software Monitoring.
An example of objects created by the PNSM - Basic Computer Monitoring Wizard

Note: The alarms issued by the Basic Computer Monitoring should be included for
presentation at the operator or system engineer workplaces/consoles.
8.12 Anti-virus software

If anti-virus software is installed, it shall comply with the recommendations given in the
Technical Description at ABB Library:
Using McAfee VirusScan Enterprise with System 800xA (3BSE048631)
Note: The anti-virus definitions must constantly be kept up to date, or else the level of
protection will gradually decay and eventually be unable to protect the system
against malware.

9 Backup strategy
9.1 Drive Images

Keeping drive image backups is recommended as it often makes it possible to quickly
return to a previous known good state.
However, there are a few things that need to be considered:
• Do not perform drive imaging of any computer with System 800xA running.
There are ample support cases whose root cause was an online backup,
e.g. the backup software consumed too many system resources, prevented
disk access or used up network bandwidth to the extent of system failure.
• Always perform a Configuration Wizard → Maintenance → Maintenance Stop before
any image backup is attempted.
• Some features like Oracle, SQL Server, etc. are not shut down by a Maintenance
Stop. A disk image backup with Oracle onboard and running is likely to cause the
database to fail once the image(s) are restored. Shut down Oracle before imaging.
• Find a reasonable schedule for the drive images. There is often no need to pull
them on a timed schedule. Backup before and after software changes to be able to
fall back to a known good state. In between these changes, the need for backup
should be marginal or even ignorable.
• Most nodes in System 800xA do not store any system data. Such nodes are
less important to back up.
• The need for frequent backups lessens with redundant servers. Nodes storing
system data (typically aspect-, connectivity- and application servers) can often be
made redundant. Failure of such a node can be recovered from by reloading a rather
old drive image (with outdated system data but with a compatible software state)
followed by a cold start which synchronizes the system data with the redundant
peer(s).
• The System Version 6 System Installer brings even more arguments for making
drive images since:
o Nodes can only be added/deployed once in their lifetime
o Replace Node becomes a very easy way out if a node can be reverted to a
clean state from a drive image taken early (before installing any
ABB software)
9.2 Microsoft Windows Domain

The Microsoft Windows Domain should be backed up to make it possible to recover from
a catastrophic domain controller problem (service account deleted, group policy disaster
rendering all computers unusable, unique hardware crashed beyond repair, theft, etc.).
Steps necessary
a) Install the Windows Server Backup Feature (Programs and Features -> Add Feature)
b) Create a System state backup (Backup Schedule Wizard)

9.3 System 800xA
9.3.1 Aspect Directory

Run the System Backup as often as necessary (which depends on the state the system is
in: engineering, production, etc.).
Check the log for errors and warnings in past backups. Green = backup is healthy.
Backup objects indicating successful backups
9.3.2 External Services

The following services can be backed up using the built-in backup feature
• Basic History
• Central Backup (part of 800xA for Advant Master)
• Remote Access Client (part of Multisystem Integration)
9.3.3 Manual Exports

To add extra robustness, it is recommended to maintain exports of important objects and
configurations, e.g. Control Builder M applications, etc.
A manual export often allows easy import of a lost configuration, whereas a system backup
often requires the entire backup to be loaded into the system, which can only take place
while the system is shut down for maintenance.
An alternative can be to restore to a temporary single engineering node from where
exports can be made to rebuild the production system without a complete shutdown.
9.4 Application Servers

The IM, Batch, Asset, etc. application server types may have additional backup
procedures that should be executed to increase available options for recovery in case of a
system problem.

10 Installation, environment, etc.

Cabling, mounting, vibration, grounding and shielding
Bus termination, cable length, signal attenuation
Humidity, temperature, dust, corrosive gases, etc.
Power supply (UPS?)
EMC

11 AC 800 Connect
11.1 AC800 OPC Server
11.1.1 Setup Wizard

Start  All Programs  ABB Industrial IT 800xA  Control and IO  OPC Server for AC
800M 5.0  Setup Wizard
Memory - Heap setting (In systems prior to version 5.1)
Heap Size must be set to an appropriate value. A redundant pair should have an offset of
about 10% between the two servers to avoid both running out of memory (and shutting
down) at the same time.
Service Account
The AC 800 OPC Server service shall run under the 800xA Service Account.
11.1.2 Settings in OPC Server panel

Connected controllers
Verify that all listed controllers are properly connected. Configuring future (to be added)
controllers in advance is not recommended. The OPC server will then waste MMS
resources on trying (in vain) to connect to them, with potential slowdowns as a result.
Autoload Configuration
Check that Autoload configuration is enabled and that a configuration is selected.
Update Rate
Select the controller/IP address in tab Data Access and examine the update rates for each
controller. Do the Actual and Requested Rates differ? Fluctuation in Actual Rate may
indicate communication overflow and/or resource shortage and should be investigated
further. Make hardcopies of the update rate values for all controllers.
11.1.3 Tools in OPC panel

Display Variable Communication Statistics
The Variable Communication Statistics (VCS) tool can also be used to break down the MMS
transactions into applications, number of variables and used update rates. An offending
OPC client will leave traces in the VCS, e.g. excessive amounts of reads or writes.
Display OPC Statistics
Collect OPC Statistics for Data Access and Alarm and Event. Compare redundant servers
to verify the load sharing. The number of subscribed items should not deviate too much
between two redundant OPC server instances. Affinity can be used to move clients back
and forth in case the balance is found too skewed.
Save cold retain values
The AC 800 OPC Server should be configured to automatically save Cold Retain values
on a cyclic basis. Recommended interval is 720 minutes. When using redundant servers,
define an offset of 50% of the interval (360 minutes) in one of the OPC server nodes to
alternate the saving over time.
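
The offset arithmetic above can be illustrated with a short Python sketch. It assumes that both servers count from a common start time and that the first save takes place one interval (plus the offset) after that start; the function name and the 48-hour horizon are illustrative only, since the real scheduling is handled by the OPC Server itself.

```python
def save_minutes(interval_min, offset_min, horizon_min):
    """Minutes after a common start at which one server saves Cold Retain values.

    Assumption: the first save happens at offset + interval.
    """
    return list(range(offset_min + interval_min, horizon_min + 1, interval_min))


INTERVAL = 720           # recommended interval from the text, in minutes
OFFSET = INTERVAL // 2   # 50% offset, configured in one of the two servers
HORIZON = 2880           # illustrative 48-hour window

server_a = save_minutes(INTERVAL, 0, HORIZON)
server_b = save_minutes(INTERVAL, OFFSET, HORIZON)
print("Server A saves at minutes:", server_a)
print("Server B saves at minutes:", server_b)
```

With these values the two servers never save at the same minute, and a fresh set of Cold Retain values is written somewhere in the pair every 360 minutes.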
11.1.4 Log files

C:\ABB Industrial IT Data\Control IT Data\OPC server for AC 800M\LogFiles\
• Session.log and Session.log_bakX

Rows start with I (Information), W (Warning) or E (Error), followed by the date and
time of the event. Check for rows starting with E and investigate whether the
error requires action.

• OPC Server Date and Time Session.LOG (and .dmp)

These files are generated if the OPC Server crashes. If such files are in the
folder, check their date and time. If they are recent and someone
knows the circumstances around the crash, the files and a description of the problem
can be reported to Supportline for the region.
Note! Always report a crash when it happens and the circumstances are known.
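
Scanning a Session.log for rows starting with E can be sketched in Python as follows. The row layout (severity letter, date, time, free text) is assumed from the controller log examples later in this document, and the sample rows are made up for illustration; continuation lines without a severity letter are skipped.

```python
import re

# Assumed row layout: severity letter, date, time, then free text.
ROW = re.compile(
    r"^([IWE])\s+(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+(.*)$"
)


def error_rows(lines):
    """Return (date, time, message) for every row whose severity is E."""
    found = []
    for line in lines:
        match = ROW.match(line.strip())
        if match and match.group(1) == "E":
            found.append((match.group(2), match.group(3), match.group(4)))
    return found


# Hypothetical sample rows, not taken from a real log file.
sample = [
    "I 2012-01-26 19:53:57.047 Example information row",
    "W 2012-01-26 19:53:57.031 Example warning row",
    "E 2012-02-16 17:53:32.623 Example error row",
]
for date, time, message in error_rows(sample):
    print(date, time, message)
```

For a real file, the lines can be read with open(path, errors="replace") and passed to error_rows in the same way.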
11.2 Control Builder M
11.2.1 Project settings (Right click on Project icon  Settings)

• Difference Report
It is recommended to have the Difference Report enabled.
Note! After enabling the Difference Report, all applications and the HW configuration
must be downloaded once with changes before the differences can be displayed
at the next download.
• Compiler switches
It is recommended to have the following compiler switches set to at least warning:
“Loops in Control Modules”, “Multiple calls to the same Function Block” and “None
or multiple calls to ExecuteControlModules”.
Note!
Code sorting loops may cause undesired effects in the control applications. When
loops exist, a simple change in the application may cause code blocks to be
reordered during compilation and then execute in another order than previously.
Loops shall never exist in a controller in production state.
• Compilation warnings
It is recommended to show warnings for “Changes application” and present
warnings for “Compilation”, “Change analysis”, “Task Collisions” and “Others”.
11.2.2 Tools
• Task analysis tool (Version 5.1 )
It is recommended to have “EnableTaskAnalysisTool” set to true. (Tools  Setup
 Station Application Download). This tool makes the task tuning easier.
• Clock Synchronization SNTP/CNCP
Verify clock synchronization (Tools  Maintenance  Clock Synchronization
Status.)

A Time Quality value of 5 (TQ5 = deviation less than 10 ms) or higher is
recommended.
11.2.3 Controller hardware object (Hardware AC 800M)

• The parameter “Copy unconnected channels” shall be set to “None”.
(Right click on the Hardware AC 800M level -> Editor.) This parameter is only
intended to be used temporarily during the commissioning test phase.
11.2.4 Hardware configuration editor (PM Type)

• The default value of parameter ”AE System event to controller log” is ”Medium Severity”.
If another value is set, note it in the record for the customer.
11.2.5 Setup Wizard - Heap setting (In systems prior to version 5.1)
Use Help  About… window to examine the Memory Free value. A too low value may
cause the Control Builder to crash. Ensure that the Heap Size is large enough to fit the
application. It’s recommended to keep 30-50 % of the memory free after some downloads
since application download allocates additional memory. A too large Heap Size (800-
900MB) occupies unnecessary RAM and closes in on the 2 GB per process maximum
limit of Windows. Change heap size at Start  All Programs  ABB Industrial IT 800xA
 Engineering  Utilities  Setup Wizard.
Ask the customer/application engineer whether they have problems with the Control Builder
and, if so, suggest adjusting the settings. Note that this setting is per Control Builder; if
a terminal server is used, it only needs to be set once.
11.2.6 Status for the controller, CEX-modules and IO-modules.

Go online with the Control Builder and check the status for the HW. The same information is
available in Plant Explorer Workplace  Control Structure  Root  Control Network 
Project Name  Controllers  Aspect “System Status Viewer”. Report if not good status.
Check that units (for example PM8XX, SM81X, CI8XX and IO modules) that are physically
redundant are also configured as redundant in Control Builder, and vice versa.
11.2.7 Log files

C:\ABB Industrial IT Data\Engineer IT Data\Control Builder M
Professional\LogFiles\
• Startlog.txt
The file contains all logging from Offline to Online mode (Test mode or Download
Project Going Online). Check for errors in the latest downloads. Are there any errors
that the application engineer needs to take care of? Document them in the record.
• Session.log and Session.log_bakX
Check for errors.
• ControlBuilderPro Date and Time Session.LOG (and .dmp)
These files are generated when the Control Builder crashes. If those files are in
the folder, check the date and time for the file. If they are recent and someone
knows the circumstances around the crash, the files and a description of the
problem can be reported to the regional Support Center.

Note! Always report a crash when it happens and the circumstances are known.
11.3 AC 800M Controller
11.3.1 Remote System dialog

In Control Builder  Tools  Maintenance  Remote System  IP Address  Update
In the above dialog a number of controller diagnostics are available. Most of them are
described in the sections below.
11.3.2 Firmware Information

Press the “Show firmware” button; the firmware information is then saved in the
Control Builder Session log file. In the editor, check whether the firmware in use is the
same as the suggested new firmware. If not, investigate whether that is correct: the
controller may run in coexistence (newer Control Builder version than controller version),
or a Temporary Correction (TC) may be in use in the controller. The supported versions for
coexistence of controller versions are listed in the System 800xA Release Notes for
each version.
For AC 800M HI the firmware and hardware version must comply with TÜV certificate
report 3BSE054957 (SV4.0, SV4.1 and SV5.0 SP1) and 3BSE054960 (SV 5.0 SP2 and
SV5.1). The 3BSE054960 document also lists the allowed coexistence combinations.
Note that NON-CERT firmware must not be used in High Integrity (HI)
controllers (PM865, PM867, etc.) running in production!
Check the firmware in all CEX modules; note that a TC can be used here as well.
The firmware version is also found in the CB session log file, see Log files under
Control Builder M.

11.3.3 Controller log files

DCT has a wizard for analyzing controller logs (NotePad++ can also be used).
Fetch newly created log files by pressing the “Show Controller log” button. The log files are
saved in C:\ABB Industrial IT Data\Engineer IT Data\Control Builder M
Professional\LogFiles. As from version 5.0, three files are created. They are named
Controller_IP Address.log, BackupCPU_IP Address.log and CI_IP address.log (CEX
Interface module). Note that the CI log file is not protected by any battery backup. Loss of
power will erase the CI log.
The Controller log file contains two parts:
The first part of the Controller log file always covers the initial startup, and sometimes
events that happened just before it. It is not changed until the next initial startup or
Online Upgrade, which explains why the time stamps can be years old.
When redundant controllers and redundant Ethernet are used, verify that the Backup
Controller IP Addresses are enabled and correctly set.
Example of a startup for a redundant controller:
Example for a startup after an Online Upgrade has been performed:

The firmware information is also in the controller log file; to verify it, see the
Firmware Information section of this document.
The second part is "alive" and starts with “===Log fetched at date and time ===”.
New information is added at the bottom (which pushes older information out, but not the
information in the startup part).
Most of the information in the second part of the controller log file can also be found in the
[Workplace Structure]Web System Workplace:System Event List. If the period covered by the
controller log file is shorter than a month, check the System Event List as well.
Rows start with the letter I (Information), W (Warning) or E (Error), followed by the date
and time of the event. Check for rows starting with E and investigate whether the error
requires action. Don’t forget to check all of the log files (primary, backup, CI log).
Consider reporting errors that the customer cannot explain (for example, by a power failure).
Examples of warnings that should be reported in the health check report even if they
are old:
W 2012-09-17 16:49:30.261 Unit= _SWFirmware ContrName 1011 AlmDefErr ObjectName
The alarm condition was not created successfully, so the alarm will not work. Solution:
Search for the alarm object; a parameter may be wrongly configured in
the 1131 application code (typically, the alarm’s SourceName + Condition name
have not been uniquely defined in the application).
W 2012-05-09 18:04:06.033 Unit= _SWFirmware ContrName 1012 Undeclared external event 0.11.206.15 true
The IO channel is configured for SOE, but the signal is not connected to an alarm
object. Solution: Create an alarm block or configure the channel as DI (not
DI+SOE)
W 2012-07-06 16:18:06.345 Unit= _SWFirmware ContrName 1030 AE setting NamValItem LogStrings to low
Solution: Modify the CPU settings AE Max no of Name Value Items and AE Max no of
log strings, which set the number of NameValueItems and
NameValueItemStrings to be allocated.
W 2012-01-26 19:44:25.115 The Idle thread has executed less than 1%.
If this is seen during download it can be ok, but if seen without download the
controller has too high load. Solution: Recommend task tuning.
W 2012-03-14 14:45:52.504 Unit= _SW1131Task TaskName 2001 Interval time in ordinary tasks inc 15.
W 2012-03-14 14:46:02.562 Unit= _SW1131Task TaskName 2002 Interval time in ordinary tasks dec 13.
This controller has Load balancing enabled and the controller will adjust the task
interval time to keep the cyclic load to 70%. Solution: Recommend to adjust the
task interval time so that the cyclic load will be lower than 70%.
More system events and alarms are found in 3BSE035980*; AC 800M Configuration;
“Appendix B System Alarms and Events”.
Controller crash:
A controller crash is recognized as in the example below:
Starting with System Version 5.0, controller crash logs are automatically transferred to the
MMS Server working folder on all computers running an MMS Server. These files are not
translated, but translation can be done by Supportline. Network switches must be configured
so that multicasting is enabled; otherwise the log distribution to the MMS Server will not
work. See Product Bulletin System 800xA - RNRP Network Configuration Requirements
3BSE066739.
The files are saved in folder C:\ABB Industrial IT Data\Control IT Data\MMS Server for
AC 800M. It is preferred to collect the files from the connectivity server.
If crash logs are found, check with the customer whether the crash should be investigated.
If so, submit a support case to the regional Support Center and include the
circumstances around the time of the crash. Note: inform the customer to always
report a controller crash as soon as it happens, while the circumstances are known.
11.3.4 MMS Connections

There is a strong relationship between the number of MMS telegrams per second and the
controller total system load. The cyclic load is generated by the 1131 application code.
This means that if the cyclic load is low, there is more capacity for communication such as
MMS. It is therefore not possible to recommend a maximum number of MMS telegrams per
second, but the total system load should not exceed 95% for a PA controller, and 90% is the
maximum allowed value for a HI controller. Furthermore, a HI controller can have problems
during download if there are more than 85-90 MMS/IAC telegrams per second.
The total number of MMS and IAC connections should be kept below 25 for any controller.
11.3.5 Controller Analysis

Note: this function is available in CBM’s Remote System dialog in 800xA 5.0 SP2 and
later versions.
Note: it is recommended to fetch all controller logs before starting the Controller
Analysis functions, since they write their results to the controller’s RAM-based log, which
is very limited in size (otherwise, other important information in the log may be lost).
Some of the analysis will generate dedicated files that are saved in C:\ABB Industrial IT
Data\Engineer IT Data\Control Builder M Professional\LogFiles.
11.3.5.1 Module Bus Fail Counters

Reset the Module Bus Fail Counters in the beginning of the health check period. Before
completing the health check on site, return and check the counters for each controller
making use of the Module Bus.

If any counters have increased, those modules should be put under observation.
The Modulebus I/O Revision can be read from the Controller Analysis dialog. The
preferred version can be found in the System 800xA Release Notes for each version and
for HI IO in TÜV certificate report 3BSE054957 (SV4.0, SV4.1 and SV5.0 SP1) and
3BSE054960 (SV 5.0 SP2 and SV5.1).
11.3.6 Diagnostic for Communication Variables (IAC)
For implementations with Inter Application Communication (IAC) from the Control
Builder’s Tools Maintenance Remote System dialog, press the button Show
Diagnostic for Communication Variables.
Start by performing a reset, then wait a few minutes and check the result.
There should be no unresolved variables. If there are, use Show Unresolved
Variables to find the variable names and document them in the record.

The following counters shall be 0:

• Internal type errors - Type mismatch between applications within the selected
controller.
• External type errors - Type mismatch with an application in another controller.
• Uncertains/Warnings - Retransmissions have occurred for the IAC variable.
• Timeouts - Variables that are not updated within the requested timeout interval.
Number of Transactions/s shall have the same value as Expected Transactions/s.
If “Number of Transactions/s” is less than “Expected Transactions/s” the controller is
overloaded and cannot communicate within the configured IAC Interval time.
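
The checks above can be summarized in a small Python sketch, useful when the values are copied by hand from the dialog into the health check record. The function name and the dictionary layout are illustrative assumptions; the sketch does not read anything from the controller.

```python
def iac_findings(counters, transactions_per_s, expected_per_s):
    """List deviations from the expected IAC diagnostics values."""
    findings = []
    # Every listed counter shall be 0.
    for name, value in counters.items():
        if value != 0:
            findings.append(f"{name} = {value} (expected 0)")
    # Number of Transactions/s shall equal Expected Transactions/s.
    if transactions_per_s != expected_per_s:
        note = f"Transactions/s {transactions_per_s} differs from expected {expected_per_s}"
        if transactions_per_s < expected_per_s:
            note += " (controller may be overloaded)"
        findings.append(note)
    return findings


# Hypothetical values typed in from the dialog.
observed = {
    "Internal type errors": 0,
    "External type errors": 0,
    "Uncertains/Warnings": 2,
    "Timeouts": 0,
}
for finding in iac_findings(observed, 40, 50):
    print(finding)
```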
11.3.7 Tasks
It is recommended to leave a time gap between task executions so that lower-priority
functions in the system, such as communication (for example MMS), are not
disturbed or starved.
The task overview (with Control Builder in online mode, right click on tasks  Editor) is
good to use to analyze if the controller is task tuned.

Task tuning involves many parameters, but a few checks can be made:
Priority
Different priorities should be used. A task with higher priority can interrupt a task with
lower priority. In the example above there can be latency for one of tasks 2-4 during
download, because they share the same priority and have a long First Scan Execution Time.
Task Interval Time
Compare Interval time with Actual Interval Time; they should be nearly the same.
Hint! To ease performance tuning, it is recommended to configure each interval time as a
multiple of the shortest task interval, e.g.
• 50, 100, 200, 400, 800, 1600, … ms.
• 125, 250, 500, 1000, … ms
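
The hint can be checked quickly for a list of configured intervals, as in the following Python sketch. The function name is an illustrative assumption, and the intervals are expected in whole milliseconds.

```python
def not_multiples(intervals_ms):
    """Return the intervals that are not a whole multiple of the shortest one."""
    shortest = min(intervals_ms)
    return [t for t in intervals_ms if t % shortest != 0]


print(not_multiples([50, 100, 200, 400]))  # every interval fits
print(not_multiples([50, 125, 250]))       # 125 ms does not fit a 50 ms base
```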
Actual Execution time
The recommended maximum Actual Execution time is 200 ms for a PA controller and 100 ms
for a HI controller. If the execution time is longer than recommended, the solution is to
spread the application over more tasks or split it into two applications.
Offset
Shall be > 0. Compare Offset with Actual offset. For example, the MMS task above should not
have its offset set to 100 ms; 120 ms would have been a better value to avoid latency when
VMT is executed. If no offset is configured, a message during compilation reports
Colliding Start Times for the tasks. Use an Offset value of at least 5 ms, or better 10 ms
or even more.
Accepted latency (HI controllers)
The default latency limit is 10%. If this is used, ask whether the process really requires
such a low latency. A low latency limit can result in a shutdown during download, especially
if the controller is not task tuned. More details are found in AC 800M HI Controller Firmware
4.1 and 5.0, Configuration considerations AC 800M High Integrity controller - 3BSE047421D0025.
First Scan Execution Time
If this time is high, it can cause other tasks to be late during download. See Accepted
latency.
More information about task tuning can be found in AC 800M Configuration -
3BSE035980* and AC 800M Planning 3BSE043732*.
As from SV5.1, the Task Analysis tool is available to make task tuning easier. The tool
provides a graphical representation.
It gives a warning if the time gap is <= 5 ms or < 20% of the previous execution time, and
an error if the time gap is < 10% of the previous execution time.
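
The warning and error thresholds can be expressed as a short Python sketch. It only mirrors the rule as stated in this document and assumes that the error condition takes precedence when both conditions apply; it is not the implementation used by the Task Analysis tool.

```python
def classify_gap(gap_ms, previous_exec_ms):
    """Classify the time gap that follows a task execution.

    Assumption: "error" wins over "warning" when both rules match.
    """
    # Error: gap shorter than 10% of the previous execution time.
    if gap_ms * 10 < previous_exec_ms:
        return "error"
    # Warning: gap of at most 5 ms, or shorter than 20% of the previous execution time.
    if gap_ms <= 5 or gap_ms * 5 < previous_exec_ms:
        return "warning"
    return "ok"


print(classify_gap(3, 50))   # 3 ms is below 10% of 50 ms
print(classify_gap(8, 50))   # 8 ms is below 20% of 50 ms
print(classify_gap(20, 50))  # comfortable gap
```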
11.3.8 SystemDiagnostics Function Block

Call the Control Builder M  SystemDiagnostics Interaction window for each controller
(from SV 4 and forward the same information is available as faceplates in Plant Explorer).

In all newly created controller projects, the SystemDiagnostics is included in the
Program3.Slow application template application. It is recommended to have at least one
instance of the SystemDiagnostics function block in every controller.
Verify the following items:
• CPU Load, see section 11.3.9
• Memory, see section 11.3.10
• Alarm and event information
Printer queues: There shall be no queues!
Subscribing systems: Shall be on the primary network (for example 172.16.80.152)
• Ethernet statistics:
– number of data packages sent
– number of sent data packages that were lost
– number of data packages received
– number of received data packages that were lost
Calculate how many packages are lost in %.
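
The percentage can be calculated as in the following Python sketch. It assumes that the lost packages are already included in the corresponding sent or received counter; if the counters exclude them, the lost count should be added to the total first. The function name and the figures are illustrative.

```python
def lost_percent(total_packages, lost_packages):
    """Share of lost packages in percent (0.0 when nothing was transferred)."""
    if total_packages == 0:
        return 0.0
    return 100.0 * lost_packages / total_packages


# Hypothetical figures read from the Ethernet statistics.
sent_loss = lost_percent(200000, 50)
received_loss = lost_percent(180000, 0)
print(f"Sent: {sent_loss:.3f}% lost, received: {received_loss:.3f}% lost")
```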
11.3.9 CPU Load

The CPU load can be found in one of the task properties in Control Builder in online mode
or in the SystemDiagnostics function block interaction window.
• Verify load figures for PA controller
Cyclic load < 70%
Total load < 95%
• Verify load figures for HI controller
Cyclic load < 50%
Total load < 90%
If the load is high, discuss how to reduce it, for example by increasing the module bus
scan time and/or the task interval time.
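
The limits listed above can be collected in a small lookup, as in this Python sketch. The names are illustrative assumptions, and the load values are expected to be entered manually from the task properties or the interaction window.

```python
# Limits in percent, as listed in this section.
LIMITS = {
    "PA": {"cyclic": 70, "total": 95},
    "HI": {"cyclic": 50, "total": 90},
}


def load_findings(controller_type, cyclic_pct, total_pct):
    """Return a finding for each load figure that is not below its limit."""
    limits = LIMITS[controller_type]
    findings = []
    if cyclic_pct >= limits["cyclic"]:
        findings.append(f"Cyclic load {cyclic_pct}% is not below {limits['cyclic']}%")
    if total_pct >= limits["total"]:
        findings.append(f"Total load {total_pct}% is not below {limits['total']}%")
    return findings


print(load_findings("PA", 65, 96))
print(load_findings("HI", 50, 80))
```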
11.3.10 Memory Consumption

Verify the memory consumption. Available free memory in the controller must be at least
the size of the largest application. The value Maximum used memory at stop serves as a
good indicator – if this value is less than 100% warm download should be possible.
Max used memory at stop

Control Builder M  Tools Compiler Statistics can be used to get an approximate size
of the actual application.
See also System Guide Technical Data and Configuration, 3BSE041434 - Spare Memory
Needed for online Changes.

11.3.11 Modulebus scan time

In Control Builder M  Project  Controllers  Controller Name  PMType 
ModuleBus  Right click Editor.
The module bus scan time shall not be shorter than necessary, because it
influences the system load: a low value causes higher load than a high value. The
recommended module bus scan time is half the interval time of the fastest
1131 task that uses the module bus IO.
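
As a worked example of this rule, the following Python sketch derives the recommended value from the interval times of the tasks that use module bus IO. The function name is an illustrative assumption, and the result may still need to be adjusted to a value that the Control Builder editor accepts.

```python
def recommended_scan_time_ms(io_task_intervals_ms):
    """Half the interval time of the fastest task that uses module bus IO."""
    return min(io_task_intervals_ms) / 2


# Hypothetical task intervals in milliseconds.
print(recommended_scan_time_ms([100, 250, 500]))
```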
11.4 CEX modules

Common for CEX modules:
A CEX module can be reset by the IOC framework three times, and it restarts
automatically each time (except SM81X when using SIL 3 marked applications). The
fourth time, the CEX module must be removed and inserted manually. If this is found in
the log file (see example below), inform the customer that the CEX module must be reset
manually next time.
E 2011-02-13 09:52:26.786 Backup CEM at pos 7 is reset by IOC Framework[ExtStat: 0x50000204] (3 of 4)
The next time it is reset, the log will look like:

E 2011-02-13 11:18:33.319 Backup CEM at pos 7 is reset by IOC Framework for the last time.
Physical HotSwap will reset ResetCount.
[ExtStat: 0x50000204] (4 of 4)
The reason for the resets by the IOC framework needs to be investigated.
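
Finding these reset messages in a fetched log can be sketched in Python as follows. The patterns are derived only from the two example messages above, so other firmware versions may word the rows differently. The sketch attributes an "(n of m)" counter to the most recently seen CEX position, which is an assumption needed because the counter can appear on a continuation line.

```python
import re

RESET = re.compile(r"at pos (\d+) is reset by IOC Framework")
COUNT = re.compile(r"\((\d+) of (\d+)\)")


def reset_counts(lines):
    """Map CEX position to the highest (n, m) reset counter seen in the log."""
    counts = {}
    position = None
    for line in lines:
        reset = RESET.search(line)
        if reset:
            position = int(reset.group(1))
        count = COUNT.search(line)
        if count and position is not None:
            seen = (int(count.group(1)), int(count.group(2)))
            counts[position] = max(counts.get(position, seen), seen)
    return counts


log_lines = [
    "E 2011-02-13 09:52:26.786 Backup CEM at pos 7 is reset by IOC Framework[ExtStat: 0x50000204] (3 of 4)",
    "E 2011-02-13 11:18:33.319 Backup CEM at pos 7 is reset by IOC Framework for the last time.",
    "Physical HotSwap will reset ResetCount.",
    "[ExtStat: 0x50000204] (4 of 4)",
]
for pos, (n, m) in reset_counts(log_lines).items():
    print(f"CEX position {pos}: reset {n} of {m}")
```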
11.4.1 CI854 Profibus

Use Control Builder M in Online mode to check the unit status on CI854 and configured
slaves. No unexpected error messages should be shown.
One typical error message is:
The above is normally caused by poor Profibus communication quality, e.g. due to a
bad HW installation.
Webserver  Service file
For how to use the CI854 web server, see the manual AC 800M PROFIBUS DP
Configuration 3BDS009030*.
• Enter the IP address of the controller in the address bar of the browser. The default
user is “service” and the default password is “ac800m”.
• To create the service file, select “Create CI854 service file” in the upper left corner of
the browser window. A dialog opens for selecting the CI854 to collect
data for. To have all information in place, always select all modules.
• Cross-check the Profibus Slave configuration in CBM hardware configuration with
the live list in the service file.
With the CI854 web server it is possible to detect low-level Profibus communication
retransmissions at an earlier stage than they are reported in Unit status in AC 800M.

Here is a typical example of what it can look like when low-level communication
retransmissions occur at Profibus Master pos. 2. The slave address is marked in bold.
Information extracted from the CI854 web server service file:
-- CEX Slot 2 [CI854] --
--DMJ Buffer--
Current entry: 56 of 62
Fcode Code Info1 Info2 Date Time
4451 0002 00010007 47444353 .... SCDG Tue Oct 25 12:20:53 2011
4451 0002 0001000D 47444353 .... SCDG Tue Oct 25 12:20:53 2011
4451 0002 0001000E 47444353 .... SCDG Tue Oct 25 12:20:53 2011
5608 002F 0000000E 43454E53 .... SNEC Tue Oct 25 12:20:53 2011
5608 002E 00000002 47454E53 .... SNEG Tue Oct 25 12:20:53 2011
5608 002E 00000007 47454E53 .... SNEG Tue Oct 25 12:20:53 2011
If the DMJ Buffer is continuously reporting new events in a normal running Profibus line,
it is an indication of communication problems. It is suggested to document in the test
record that you suspect Profibus communication/installation problems.
11.4.2 CI867 TCP IP

Check the controller log and the CI log files for errors and warnings. Some examples are
described below.
Controller log:
Startup of CI867:
It is normal that the CI867 module is not ready at startup; it is shown
as unknown and is configured after some seconds.
Therefore the following messages during startup are normal:
W 2012-01-26 19:53:57.029 PhResponsibility:: Create and Open Driver failed for device at CEX position 1
W 2012-01-26 19:53:57.031 PhResponsibility:: Primary (Single) CI867 did not become ready in within time. Or
there is a problem with the driver
I 2012-01-26 19:53:57.047 PhMODBUSTCP: DeleteSlavethread::Slave Thread Not Running
Out of range for Dint:

E 2011-03-01 10:33:57.732 PhMODBUSTCP: WriteVarReq: Writing data more than 32767 and less than -32768
is not supported
ErrorCode = -7005
I 2011-03-01 10:33:59.131 PhMODBUSTCP: SendErrorCode != NOERROR i.e. ERROR
If the Dint variable contains a value outside the range -32768 to 32767, this error code is
displayed and the write operation is not carried out. Document the problem in the
health check test record as an application error.
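
The range rule can be illustrated with a minimal Python sketch that lists values falling outside the range given above. The variable names and values are hypothetical; in practice the values would come from the application engineer or from online inspection.

```python
RANGE_MIN, RANGE_MAX = -32768, 32767  # range stated in the text above


def out_of_range(values):
    """Return the name/value pairs that fall outside the supported range."""
    return {
        name: value
        for name, value in values.items()
        if not RANGE_MIN <= value <= RANGE_MAX
    }


# Hypothetical Dint variables written over Modbus TCP.
candidates = {"Setpoint": 1200, "Counter": 40000, "Offset": -32768}
print(out_of_range(candidates))
```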
CI log:
Buffer full:
E 2012-02-16 17:53:32.623 PHMODBUSTCPTelegram.cpp, Line:377, ProcessResponseMsg: Pending queue
full response recieved from CI867. ErrorCode = 100
The above indicates that the buffer is full.

Solution: Reduce the number of telegrams. Check the number of slaves. The maximum is 70
slaves, which limits the number of telegrams to 16 per slave. If one slave is used, the
maximum buffer size is 60 telegrams. The maximum total number of telegrams for one CI867 is
1120.
Data not available:
E 2012-04-16 05:14:26.055 (tMbmTcpRcv)[SYS] CModbus::ReadNewData()-3- Socket 4 Still blocks after
retrying to recv Data. Error = 70
This error shows when trying to read data, but the data is not available.

Network loop:
I 2012-04-10 12:29:38.182 (tNspTask)[SYS] NSP: tNetTask running without delay
I 2012-04-10 12:29:38.827 (tNetTask)[SYS] NSP: High Ethernet load generated from addr 0:7:32:19:64:2b
I 2012-04-10 12:29:39.183 (tNspTask)[SYS] NSP: Network Storm Protection is delaying tNetTask at limit
1000 packets/s
These messages show that there is a network loop.

Solution: Check switches and modbus network configuration.
Disconnect Master:
I 2012-01-02 10:00:54.645 (tMbsTcpRcv)[SYS] CCnctdMaster::ConFail()- Error,connection to master
Failed, Connected Master is Tagged for removal and will be Removed from List
This occurs if a Master connected to the CI867, which acts as slave, has
closed the connection before the CI867 has finished the transaction.
Duplicate IP address:
0x1f10940 (tNetTask): duplicate IP address 64646401 sent from ethernet address 00:00:54:14:f3:6f
There is more than one slave with the same IP address connected to the Modbus
TCP network.
11.4.3 CI868 IEC 61850

Please check the ValuesReceivedPerSeconds (max 150/s), ValuesSentPerSecond (max
60/s) and the CPULoad.
Note: there must be variables connected to the channels; otherwise there will be no
values.
If you suspect any problem you can fetch the controller log files and check the information
for CI868 in the CI log.

12 800xA for Advant Master (AC 400 Connect)

To save time, it is recommended that all of the following tests are performed in two
steps (in parallel per node) on the MB300 network.
In the first step, activate all of the ANPER analyses and check the clock synchronization at
the first node, then continue with the next node until all nodes have been visited.
When enough time (30-60 minutes) has passed (for ANPER to collect enough
statistics), proceed with the second step, where the ANPER analyses are terminated and
the results are collected in Notepad, Excel, etc.
The SYSCHA channel will show non-zero SEND EVENTS if any system messages were
generated during the analysis. If these messages were not caught by any OnLine
Builder, consider connecting to that node again and waiting for a possible repeat of
the lost system message.
12.1 System messages at RTA boards

Use OnLine Builder / RTA Board Maintenance to probe all RTA boards for System
Messages.
Expected: No unexplainable messages
12.2 System messages in Advant/Master controllers

Use OnLine Builder / RTA Board Maintenance to probe all controllers for System
Messages.
Expected: No unexplainable messages
12.3 System and channel load at RTA boards

Run ANPER System Load, Channel Load and Task Load analysis.
Expected: Average System load is less than 50%
No long (>1-2 s) peaks at 100% System Load
No full events.
LCUSED does not list any 100% filled channels (except CXCCHx).
Pay special attention if third-party OPC clients are using the RTA board to read data from
the controllers. It is recommended to use 1, 3 and 9 second cyclic subscriptions only. More
information is available in the following document:
800xA for Advant Master Performance Guideline (3BSE042621Rxxxx).
Subscriptions using a longer cyclic update rate than 9 seconds will show up as activity on
the DCSCN2 channel, whereas regular subscriptions using 9 seconds or faster cyclic
update rates are handled by the DCSCN3 channel.
12.4 System and channel load in Advant/Master controllers

Run ANPER System Load, Channel Load and Task Load analysis.
Expected: AC450 average System load is less than 80%
MP200/1 average System load is less than 70%
No full events.
LCUSED does not list any 100% filled channels (except CXCCHx).

Starting with System Version 5, the RTA CPU Load can be monitored and logged within
the 800xA System itself by adding and configuring* an RTA Load object below each RTA
board/PU410 unit in the Control Structure.
*) The Control Connection aspect’s MB300 tab must be updated with the RTA’s net and
node numbers (default values are 0).
12.5 RTA Board communication statistics

Use the RTA Board Maintenance tool to list communication statistics by calling the TSTM
command. Then select task 15) List statistics, and then function 1) Summary.
* TSTM
* SELECT TASK
...
15) List statistics
...
Select function ( 1 - 17 ) ? 15
...
SELECT FUNCTION
...
1) Summary
...
Select function ( 1 - 17 ) ? 1
...
==================================================
Summary of signal statistics
==================================================
Signals sent to controllers:

----------------------------
Signals lost = 1607 0.1%
Signals sent with success = 1313732 99.9%
Total no signals sent = 1315339
Signals sent to Windows:

-------------------------
Signals lost = 962 0.0%

Signals sent with success = 7735680 100.0%
Total no signals sent = 7736642
Verify that both Signals sent with success values are at or close to 100%.
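The percentages in the summary are rounded to one decimal, so a small loss rate can be hidden. As a minimal sketch (assuming the summary text has been copied from the tool into a string, and that the counter labels appear exactly as in the example above), the success rate can be recomputed from the raw counters. The function name is illustrative only.

```python
import re

def success_rates(summary):
    """Return the success percentage for each 'Signals sent' section, in order."""
    lost = [int(n) for n in re.findall(r"Signals lost\s*=\s*(\d+)", summary)]
    total = [int(n) for n in re.findall(r"Total no signals sent\s*=\s*(\d+)", summary)]
    # Success = (total - lost) / total; sections with no traffic are skipped.
    return [100.0 * (t - l) / t for l, t in zip(lost, total) if t > 0]

# Counter values copied from the example summary above.
SAMPLE = """\
Signals lost = 1607 0.1%
Signals sent with success = 1313732 99.9%
Total no signals sent = 1315339
Signals lost = 962 0.0%
Signals sent with success = 7735680 100.0%
Total no signals sent = 7736642
"""

for rate in success_rates(SAMPLE):
    print(f"{rate:.3f}%")
```

The first value corresponds to signals sent to controllers and the second to signals sent to Windows.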
Note:
The following RTA Board (PU410) system messages relate to the above counters:
DCXA1280 24 5 H’xxxxxxxx H’xxxxNENO
= Unable to send sub. request to controller NENO (receive channel is full)
NE = network (in hexadecimal). NO = node (in hexadecimal)
DCXA138X 24 15 H’xxxxxxxx 6008

= Unable to send to Windows (dual port memory in RTA device driver is full)
X=2 (>9sec. responses). =3 (1,3 or 9sec. responses).
12.6 MB300 OPC Server (MasterAdapter) health

Expand the OPC DA Connector health check (chapter 8.4.1) by also running the Statistics
operation on the AdvDsMasterAdapter component. Review the results.
Items of special interest (note: in general, a zero or low value is better than a high value):
…
Accumulated number of missing subscriptions (1) = 13
…
Execution timers (2) = 0
…
Read transaction timers (3) = 0
Write transaction timers (3) = 0
…
Process objects with missing subscriptions (4) = 0
…
Process objects with dummy subscriptions (5) = 2
…
1. The accumulated number of times the OPC server has been forced to attempt to
restart a subscription. The value is the integrated value of item 4. below.
Subscription restarts are triggered by loss of expected input, which may be due to:
• Too many subscriptions
• Controller overload
• RTA Board (PU410) overload
• CPU overload in 800xA for Advant Master Connectivity Server (Windows)
2. The number of OPC items for which the OPC server must emulate a cyclic
subscription by sending repeated and perpetual “read once” requests.
A rule of thumb is to keep this value below 100 as this emulation creates significant
overhead along the whole communication chain and should thus be avoided.
Items capable of cyclic subscription are listed in Appendix E of the 800xA for Advant
Master Configuration User’s Guide, 3BSE030340.

Example: The MB300 AI faceplate, extended view, AI. Limits tab creates one (1)
Execution timer by subscribing for the ALARM_DELAY_COUNTER property.
In fact: all such properties (lacking cyclic support) have an Update Rate of 20.000
milliseconds in the Control Connection aspect.
3. Read / write transaction timers indicate non-subscribed read (SyncRead,
AsyncRead, Refresh, etc.) and write (SyncWrite, AsyncWrite, etc.) OPC operations.
Due to the MB300 network’s design, such calls should be used very restrictively since
they create significant overhead in the MB300 signalling/traffic.
The only(?) exception is SyncRead from cache (= not a device read) when there is also a
proper 1, 3 or 9 second cyclic subscription (i.e. an active OPC group and active OPC
items). The OPC server will then respond with the most recent cached value instead of
making a full round trip to the controller.
4. The current number of subscribed items where responses are lacking.
An integrated value can be read from item 1. above.
5. Dummy subscriptions are created when a process object has subscriptions only on
items that are not part of the list of cyclically subscribed properties.
E.g. when only subscribing to the ALARM_DELAY_COUNTER property of an MB300
AI object.
12.7 Clock synchronization

Use the following commands to check the clock synchronization task in the local node.
# SLLEV SYST (if not already at SYST level, = “*” prompt)
* LOCPSET CMDS:TP02.CT
* LCLKP
All messages except “Dormant” are acceptable.
If AC 800M with CI855 communication interfaces are used to keep the MB300 time, all
other nodes (primarily RTA boards) should have their CLOCK_SYNC.CLOCK_SEND = 0 to
prevent the CI855 from being disturbed.
To support local time and daylight savings time, all computers must be configured with the
same, correct time zone and with automatic daylight savings time adjustment enabled in
Microsoft Windows.

13 PLC Connect
PLC Connect does not have any hard-coded limitations; the performance depends on the
computer performance.
13.1 Collect statistics with AppLog

A number of key figures are to be collected with the ABB Application Log Viewer
(afwapplogviewer.exe).
Start AppLog using Start > Run… afwapplogviewer + <Enter>
Press the OK button in the following dialog boxes.
In the list of nodes, select the PLC Connect Server node(s), then select the AdsScadaSrv
application. If a desired node cannot be selected, try restarting the ABB
Application Log Service on that node and try again. Click the Operations button.
Perform the operations described in the following subsections and record the results in
the Test Record.
Select each operation and press the Invoke button to execute it.
13.1.1 Communication Server – GetUpdateStatistics

This operation displays key values from ongoing OPC DA subscriptions towards PLC
Connect. Note: with no active subscriptions, zero values will be presented.
Example of results:
PLC update frequency (items/second) -
Last 10 seconds = 26.0386
Last minute = 26.7358
Last 10 minutes = 26.5886
Last 30 minutes = 26.5988
Expected: The update frequency should not exceed 3000 items per second. Update
frequencies above 3000 may result in unpredictable system behaviour.
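A minimal sketch of checking the result against the 3000 items/second limit, assuming the GetUpdateStatistics output has been copied into a string in the layout shown above. The function names are illustrative and are not part of any ABB tool.

```python
import re

LIMIT_ITEMS_PER_SECOND = 3000.0  # limit quoted in the Expected text above

def update_frequencies(text):
    """Map each averaging window (e.g. 'Last minute') to its items/second value."""
    pairs = re.findall(r"^(Last [^=]+?)\s*=\s*([\d.]+)\s*$", text, flags=re.MULTILINE)
    return {name: float(value) for name, value in pairs}

def windows_over_limit(frequencies, limit=LIMIT_ITEMS_PER_SECOND):
    """Return the names of the averaging windows whose value exceeds the limit."""
    return [name for name, value in frequencies.items() if value > limit]

# Values copied from the example of results above.
SAMPLE = """\
PLC update frequency (items/second) -
Last 10 seconds = 26.0386
Last minute = 26.7358
Last 10 minutes = 26.5886
Last 30 minutes = 26.5988
"""

frequencies = update_frequencies(SAMPLE)
print(windows_over_limit(frequencies) or "All windows are within the limit")
```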

13.1.2 Communication Server – ItemInfo

Total update count = 736104
Total external write count = 0
Total internal write count = 714531
CONTROLLER Internal ReadyLevel:0 Failover:false Running:false

Active:false Comm Status:OK Protocol:PlcInternalDriver.dll hProtocol:0
OBJECT AIC1 ObjInfo:
Type Name Timestamp Quality Update Write Value
-------------------------------------------------------------------------
FLOAT Value 12-11-08 14:44:31.767 c0 712702 712702 -84
...
(a list with all OPC items, last change time, quality, value, etc)
The list can easily be imported to e.g. Excel to be able to filter for/find items with problems
(bad quality, not updating, etc.)
13.1.3 Communication Server – DriverInfo

This operation displays key values from serial and Modbus TCP/IP communication.
Note: without serial or Modbus TCP/IP communication no information will be presented.
Example of results and comments:
All Drivers
PlcModbusTCPDriver.dll Modbus TCP/IP Driver Ver 5.0.1-1
PlcModbusDriver.dll Modbus Driver Ver 5.0.1-0
PlcOpc.dll PlcOpcClient version 5.0.1-0
PlcInternalDriver.dll Internal Driver Ver 5.0.1-0
All Initialization Strings

PlcModbusTCPDriver.dll Modbus TCP/IP Driver Ver 5.0.1-1
InitString =
172.16.4.54$1$ModbusTCP$502$56$2000$2000$125$125$HILO$30000
CommStatistics = Modbus is running and substation is active.
Messages= 7329 Retransmissions = 5
Comment: Communication is working as expected
PlcModbusDriver.dll Modbus Driver Ver 5.0.1-0
InitString =
COM1:$3$Modbus$600$8$1$None$30000$None$56$200$200$125$125$
LOHI$0$5$0$30000$0$0$0$$
CommStatistics = Modbus is NOT running and substation is active.
Messages= 93262 Retransmissions = 93261 .
Reason for interrupt: Message=03 03 00 69 00 02 15 F5
Comment: PLCC tries to establish communication, but fails due to either hardware
problem or configuration errors

PlcInternalDriver.dll Internal Driver Ver 5.0.1-0
InitString = Internal
CommStatistics =
Comment: Internal driver works as expected
PlcOpc.dll PlcOpcClient version 5.0.1-0
InitString = Matrikon.OPC.Simulation.1$localhost$$$
Matrikon OPC Server Simulator
CommStatistics = Server Status: OPC_STATUS_RUNNING ,
ItemCount on OPC-SERVER :76
Comment: Driver to external Matrikon OPC server, status as expected , lists also the
number of available items.
Expected: The frequency of retransmissions should not be too high.
Expect less than 0.1% when using a short serial cable. When using radio
modem retransmissions up to 20% may be normal.
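To compare against the percentages above, the retransmission ratio can be derived from the Messages and Retransmissions counters. This is a sketch only, assuming the DriverInfo output is available as text with the counters laid out as in the example. The function name is illustrative.

```python
import re

def retransmission_percentages(text):
    """Return retransmissions as a percentage of messages, one value per driver."""
    pattern = r"Messages\s*=\s*(\d+)\s+Retransmissions\s*=\s*(\d+)"
    result = []
    for messages, retransmissions in re.findall(pattern, text):
        if int(messages) > 0:
            result.append(100.0 * int(retransmissions) / int(messages))
    return result

# Counter lines copied from the two Modbus drivers in the example above.
SAMPLE = """\
Messages= 7329 Retransmissions = 5
Messages= 93262 Retransmissions = 93261 .
"""

for pct in retransmission_percentages(SAMPLE):
    print(f"{pct:.3f}% retransmissions")
```

In the example, the Modbus TCP/IP driver stays below 0.1%, while the serial driver that is not running retransmits practically every message.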
13.1.4 Communication Server – RunningMode

This operation displays key values from the PLC Connect Real Time DataBase (RTDB)
Example of results:
State:
Controllers: 9
Types: 0
Objects: 114
ObjectItems: 569
Running: Yes
Running mode: Master
Prefered Master: Yes
Slave nodes: 1
Server State: Running
Op. State: Running
Expected: The sum of Controllers, Types, Objects and ObjectItems should not
exceed 25000 (which is a license limit), or 10000 in a combined
Aspect/Connectivity server.
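The sum can be taken directly from the RunningMode output. The following sketch assumes the output has been copied into a string with one "Name: value" pair per line, as in the example. The helper name is illustrative.

```python
import re

COUNTERS = ("Controllers", "Types", "Objects", "ObjectItems")
LIMIT_DEDICATED = 25000  # limit quoted above (license limit)
LIMIT_COMBINED = 10000   # limit quoted above for a combined Aspect/Connectivity server

def rtdb_total(text):
    """Sum the four RTDB counters found in the RunningMode output."""
    values = dict(re.findall(r"^(\w+):[ \t]*(\d+)[ \t]*$", text, flags=re.MULTILINE))
    return sum(int(values[name]) for name in COUNTERS)

# Counter values copied from the example of results above.
SAMPLE = """\
State:
Controllers: 9
Types: 0
Objects: 114
ObjectItems: 569
Running: Yes
"""

total = rtdb_total(SAMPLE)
print(total, "within limit" if total <= LIMIT_COMBINED else "check limits")
```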
13.1.5 Select Event Server – Alarmlist

This operation displays key values from the PLC Connect OPC AE Server.
Example of results:
09-07-16 16:29:26.0 Error{14D6C144-974A-4CAF-9FA8-E32CB232A6D5}:Error

State=5 EventType=1 Severity=995 TextGroup=0 Class=1 No of transitions=1
09-07-16 16:29:26.0 Alarm1{1D5DEAB3-1D85-4BBA-A239-C6C11099A43D}:Alarm1
09-07-28 07:37:52.0 Error{3AB63B99-47D5-45F2-878D-6C6AB91E512F}:Error
09-07-28 07:31:01.0 IntSignal{5394A221-C942-4AB2-809E-
591F538E9C24}:IntSignal:LL State=4 EventType=1 Severity=1000 TextGroup=0
Class=1 No of transitions=1
Estimate the volume and frequency of alarms emitted by the OPC AE Server.
The practical limit is about 50-60 alarms/second.
Expected: In continuous operation the emission of alarms should be less than 5
alarms/second.

13.2 Check logfiles

Browse to C:\OperateITData\AdsServer\Service Group Data {GUID}\Logfiles
Examine the contents of the following log files (e.g. using notepad.exe). The actual files
available depend on the setup (type of protocols, etc.):
[ComputerName] AdsPlcOpcDriver.LOG
[ComputerName] ClientSupport.log
[ComputerName] CommServer.log
[ComputerName] DeployManager.log
[ComputerName] EventServer.log
[ComputerName] OpcClientOPC.SimaticNET.1.txt
[ComputerName] PlcOpcClientMessages.log
[ComputerName] PlcSattbusDriver.log
[ComputerName] Rtdb.log
[ComputerName] SattBusMessages.log
[ComputerName] ScadaServer.log
Expected: No errors without a plausible explanation.

Contact an ABB Support Center if assistance with identifying log messages is required.
13.3 Measure time needed for “Full Deploy”

Perform a Full Deploy on the PLC Connect server. Press & hold <Shift> to enable deploy
even if no changes are pending.
Open [ComputerName] DeployManager.log and calculate the time the Full Deploy took.
Expected: less than 30 minutes.
13.4 CPU load and memory used by PLC Connect processes

Use Windows Task Manager to display CPU Load and Memory Usage for the following
processes in each PLC Connect Server.
CS1 CPU Load Memory Usage
AdsAbsDeployMgr.exe
AdsAeSrv.exe
AdsClientSupportSrv.exe
AdsCsCommSrv.exe
AdsScadaSrv.exe
CS2 CPU Load Memory Usage

AdsAbsDeployMgr.exe
AdsAeSrv.exe
AdsClientSupportSrv.exe
AdsCsCommSrv.exe
AdsScadaSrv.exe
Expected: No “odd” values (e.g. excessive CPU or memory usage). After an upgrade,
adding a new controller, etc. the values should only change in a reasonable
way.
The AdsCsCommSrv.exe process on the slave typically has
higher CPU usage than on the master.
Hint: Keep a record of the values to be able to draw better conclusions.

14 Information Manager, IM
Information Manager is mostly used to host secondary and hierarchical logs. It can also
archive log data to secondary media and act as an information gateway to office
applications and report packages.
14.1 System Messages from IM

The IM reports system messages to the 800xA framework. Check the System Event List
or create a dedicated Alarm and Event List Configuration (filter object) and Event List
aspect to view IM History events. IM diagnostic events are emitted on the IM History
event class.
14.2 Oracle database instance health check

Use the IM Oracle Database Instance Wizard to check that no tablespace has run full
(100%) or is near running full.
A common problem is that Event Logs continue to grow and require additional space in
the HS_INDEXES and INFORM_HS_RUNTIME tablespaces until the event log has
wrapped a few times. Depending on the configuration this may take several weeks to
happen.
Tablespaces with the Auto Extend feature enabled may grow automatically on disk (and
it’s possible to put a limit on this function to prevent Oracle from completely filling the hard
drive). To be able to reach the theoretical maximum storage capacity of 12 million OPC
events the Oracle database files must be able to grow to 32 GB.
Check that the Oracle Alert file does not show any alarms. Normally it only reports
computer startup/shutdown events and successful log sequence checkpoints:
“… Thread X advanced to log sequence Y …”
IM v5.0 c:\oracle\admin\adva\bdump\alert_adva.log
IM v5.1 c:\oracle\diag\rdbms\adva\adva\trace\alert_adva.log
14.3 History configuration
14.3.1 System 800xA to IM synchronization test

Note: This feature is only available from SV 5.0 SP1 and forward.
Browse to the Inform IT History Control aspect on the Inform IT History Object in the
Node Administration Structure or launch the InformIT History Manager tool from the Task
Bar.
Select the Maintenance > Synchronization tab.

The IM log configuration synchronization tool

Execute the Check Names… and Check Synchronization… functions to verify that the
IM log configuration is synchronized with the Log Configuration aspects in System 800xA.
Expected: No errors found – “Log scan completed successfully” in both tests.
Hint: each time a trend presentation attribute (min, max, unit, fraction) is changed in the
process, or is statically redefined (overridden) in the 800xA system the IM will require a
manual synchronization to adopt the new values. This is considered “normal operation”.
14.3.2 IM log database consistency test #1

In normal cases the IM should “slave” to the Log Configuration aspects made in the 800xA
framework. The synchronization is maintained by a synchronizing service (IM History in
Service Structure). The synchronization can be verified with a special tool available at an
elevated (Run as administrator) Command Prompt: hsdbmaint
C:\> hsdbmaint -checkDB
hsDeleteForNonExistentAspectConfigs:
0 logs with no property logs in AIP need to be deleted.
...
The expected result is: Message logs should be reported with valid constraints and
indexes and a list with “0 logs … need to be deleted”.
If erroneous logs are found, the configuration should be verified and possibly cleaned
(hsdbmaint -clean with IM services stopped from the PAS tool). Indexes can be
manipulated using the hsdbmaint tool. Refer to the IM User’s Guide for details.

14.3.3 IM log database consistency test #2

If the previous test passes but there are logs that cannot collect data, or be viewed from
trend displays, etc., this second check may reveal (and correct) additional problems with
the configuration (remember to elevate the Command Prompt by “Run as administrator”).
C:\> set HS_SUPPORT=y
C:\> hsdbmaint -CheckItemIDs -l
Connecting to oracle...
Checking log itemIDs...
Done checking item IDs:
--------------------------------------
Total logs: 19
Logs with mismatched itemIDs: 0
If mismatched logs are found, they should be fixed with the “-f” option.
14.3.4 Entry Tables report

In some cases, it is desirable to get a listing of everything that is logged: how many points,
quality of collected data, oldest sample, etc.
To create such a listing, run an Entry Tables Report from an elevated
Command Prompt:
C:\> hsdbmaint -report > report.txt
…
Note: The output file (report.txt) may take several minutes to produce (or more if the
amount of stored data is significant).
The report will contain one row for each log (including hierarchical logs) including
information about log state (active/inactive), time of first sample in log, time of last sample
in log, number of rows, number of bad rows, number of no data rows, etc.
S Start Time End Time Rows GoodDt BadDt NoDt log ID Log Name
A 14 May 08 04:55:45 18 Jun 08 11:29:45 178699 178687 12 0 20 $HS51-APPLOAD,VALUE-1-o
A 28 Aug 07 16:54:00 18 Jun 08 10:53:00 424440 424345 95 0 53 $HSTilluftkylning,Out.Value-1-o
A 28 Aug 07 16:54:00 18 Jun 08 10:58:00 424445 424383 62 0 57 $HSLabmedeltemp,Out.Value-1-o
…
|-------------------------------------------------------|
| 41 Active Logs 6 Inactive Logs |
| ----------------- --------------------|
| Good Values 37841283 41502 |
|badData Values 2574 18 |
|noData Values 0 0 |
|Percent Good 99.9932 99.9566 |
|-------------------------------------------------------|
|Logs With Errors: 0 |
|-------------------------------------------------------|
Expected: Total:
Percent Good near 100% (or else, find root causes for bad/missing data)
Logs With Errors = 0 (or else, perform database/synchronization check)
For each log:
End time = close to present time (or else log is not collecting data)
Ratio of BadDt (Bad data) / GoodDt (Good data) close to 0 (=100% good)
Ratio of NoDt (No data) / GoodDt (Good data) close to 0 (=100% good)
To ease the analysis, the report can be imported to Excel and the ratios be calculated with
formulas and the result be sorted with % bad/missing data in descending order.
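As an alternative to a spreadsheet, the per-log rows can be ranked with a short script. This is a sketch under stated assumptions: the rows are laid out exactly as in the example above (single-character state, two timestamps of the form "14 May 08 04:55:45", four counters, log ID, log name). The names ROW and rank_logs are illustrative and not part of hsdbmaint.

```python
import re

# One report row: state, start time, end time, four counters, log ID, log name.
ROW = re.compile(
    r"^(?P<state>\S)\s+"
    r"(?P<start>\d{1,2} \w{3} \d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<end>\d{1,2} \w{3} \d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<rows>\d+)\s+(?P<good>\d+)\s+(?P<bad>\d+)\s+(?P<nodt>\d+)\s+"
    r"(?P<log_id>\d+)\s+(?P<name>.+)$"
)

def rank_logs(report_text):
    """Return parsed rows sorted by percentage of bad or missing samples, worst first."""
    logs = []
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator and summary lines are skipped
        row = match.groupdict()
        for key in ("rows", "good", "bad", "nodt"):
            row[key] = int(row[key])
        problem = row["bad"] + row["nodt"]
        row["pct_problem"] = 100.0 * problem / row["rows"] if row["rows"] else 0.0
        logs.append(row)
    return sorted(logs, key=lambda r: r["pct_problem"], reverse=True)

# Rows copied from the example report above.
SAMPLE = """\
S Start Time End Time Rows GoodDt BadDt NoDt log ID Log Name
A 14 May 08 04:55:45 18 Jun 08 11:29:45 178699 178687 12 0 20 $HS51-APPLOAD,VALUE-1-o
A 28 Aug 07 16:54:00 18 Jun 08 10:53:00 424440 424345 95 0 53 $HSTilluftkylning,Out.Value-1-o
A 28 Aug 07 16:54:00 18 Jun 08 10:58:00 424445 424383 62 0 57 $HSLabmedeltemp,Out.Value-1-o
"""

for log in rank_logs(SAMPLE):
    print(f"{log['pct_problem']:.4f}%  end={log['end']}  {log['name']}")
```

The end time is printed alongside the ratio so that logs that have stopped collecting are easy to spot.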
14.3.5 Collection performance check and tuning

By default, an IM collects data from Basic History on a timed schedule called Sample
Blocking Rate (SBR). SBR can be set at log creation time, or left at its default value. Unless
the SBR is manually tuned, the collection of all logs having the same sample rate (and
thus also SBR) will take place at the same time. This is known to cause temporarily high
load in Basic History, especially in larger configurations. To make better use of system
resources, it is recommended to stagger the collection of data, i.e. to split large blocks into
several smaller pieces and spread them over time.
To view the current collection queue and block sizes, issue this command at a Command
Prompt:
C:\Users\800xaadmin> rtstest -p 3
rtsMain.c @ 1074 | sending 17 to 3
Next, open the output file (c*.stats) in the %HS_TMP% folder with e.g. Notepad:
C:\Users\800xaadmin> notepad %HS_TMP%\c1.stats
Skip down to the “SECONDARY LOG QUEUE STATS” section and examine the table of
next upcoming data collections.
====== ===== === === ===== ========== ========================
#Entrs #Logs Ack Msd Fails B-Rate Next Collection Time
------ ----- --- --- ----- ---------- ------------------------
The table starts here…
If this table is long and has only small figures in the leftmost column (#Entrs), the
configuration is well spread over time and no further adjustments are necessary. The
remaining part of this check item can be skipped.
The effect can sometimes be seen on the LogMgrQueueLength property (see chapter 8.5
Basic History service health).
On the other hand, if the table is only a few rows long and has large numbers in the
leftmost column, the configuration is NOT staggered and should be addressed. Run the
hsdbmaint command and select item 8, as in the example below:
C:\> hsdbmaint
Then select the item Stagger Collection of data to improve performance

Or invoke the stagger function immediately from the prompt (needs elevation):
C:\> hsdbmaint -stagger
-----------------------------------------------------------------------------
Collection Info Per Controller: ( Rates are in units per minute )
-----------------------------------------------------------------------------
Ctrl: Dev Sub # Logs Sample Rate Request Rate
----------- --- --- ------ ----------- ------------
0(OPCHDA) 0 0 46 728.0 2.1
-----------------------------------------------------------------------------
Stagger Summary Information: ( Sample/Storage/Blocking units are seconds )
-----------------------------------------------------------------------------
Total(Type) Time Sample Storage Blocking Range AvgRate
------------------ ------------------ ------ ------- -------------- ---------
100(OPC HDA) 01 Jan 90 00:00:00 1 1 60/60 100.00
1(OPC HDA) 01 Jan 90 00:00:00 2 2 300/300 0.20
21(OPC HDA) 01 Jan 90 00:00:00 15 15 1800/1800 0.70
14(OPC HDA) 01 Jan 90 00:00:00 60 60 3600/3600 0.23
------------------ ------------------ ------ ------- -------------- ---------
Average Requests Per Minute from TTD/PHL: 0.00
Average Requests Per Minute from OPC HDA: 101.13
Average Requests Per Minute to hsStorage: 101.13
-----------------------------------------------------------------------------
Do you wish to continue? [yn] y

>>>>>>>> SKIPPING because elements are fewer than 5 <<<<<<
Default Summary:
100(OPC HDA) logs with blocking 4:00( 240 points ), stagger = 8
Average requests per minute from TTD/PHL: 0.00
Average requests per minute from OPC HDA: 13.43
Average requests per minute to hsStorage: 13.43

Do you wish to use the defaults? [yn] y
Do you wish to stagger NOW? [yn] y
As shown in the above example, the average requests per minute from OPC HDA were
reduced from about 100 to about 13 per minute. This is because the SBR for the 1 s logs was
adjusted from 60 to 240 seconds. The “stagger = x” number indicates how many
smaller pieces the larger blocks of logs were broken down into.
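The AvgRate column in the Stagger Summary of the example appears consistent with one request per log per blocking interval, i.e. logs × 60 / blocking rate (in seconds). This is an observation about the sample figures, not documented hsdbmaint behaviour. The sketch below reproduces the pre-stagger total of 101.13 requests per minute under that assumption. The figures proposed by the tool after staggering are not reproduced here.

```python
def requests_per_minute(log_count, blocking_seconds):
    """Assumed model: each log is requested once per blocking interval."""
    return log_count * 60.0 / blocking_seconds

# (number of logs, blocking rate in seconds) from the Stagger Summary above
before_stagger = [(100, 60), (1, 300), (21, 1800), (14, 3600)]

total = sum(requests_per_minute(n, b) for n, b in before_stagger)
for n, b in before_stagger:
    print(f"{n:4d} logs, blocking {b:5d} s -> {requests_per_minute(n, b):7.2f} requests/min")
print(f"Total: {total:.2f} requests/min")
```

Under this model the 100 logs sampled every second dominate the load, which is why the tool targets their blocking rate first.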
In some cases, it may become necessary to enter other settings for SBR than what the
tool suggests, e.g. if collection cannot be delayed too much (which is exactly what the
SBR does).
Inserting alternate values into the stagger tool requires some experience of and
training in using the hsDBMaint tool.
Note: To activate the new settings, the IM must be rebooted or the IM History services
be restarted with the PAS tool (Start > Run… pasgui) > Stop All > Start All.
14.3.6 History Backup

Verify that proper IM backups are maintained and that the most recent backup did not
return any error code. The exact location may vary as no default is assumed.
Unless overwritten (for some reason…) the last run of the IM backup can be viewed at
%HS_LOG%\historyBackup.log. Take notice of the timestamp of the log file itself:
…
Database backed up by exp utility
…
Export terminated successfully without warnings.
…
Files backup up by hsZip utility
…
Successfully zipped all files on drive C
…
Look for any messages other than the above, which indicate a successful backup. Note that
any unresolved item(s) from the IM log database consistency tests will also compromise the
backup.
The IM Backup tool can be invoked from the IM System Tray icon.
The History Backup program (hsBAR.exe) can also be scheduled using the Scheduler
Service. Refer to Information Manager Configuration User’s Guide, 3BUF001092.
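The success messages quoted above can also be checked with a script. This is a minimal sketch, assuming the contents of historyBackup.log have been read into a string and that the two success messages appear verbatim as in the excerpt. The marker list and function name are illustrative.

```python
# Text fragments taken from the log excerpt above that indicate success.
SUCCESS_MARKERS = (
    "Export terminated successfully without warnings.",
    "Successfully zipped all files",
)

def missing_success_markers(log_text):
    """Return the success markers that do NOT occur in the backup log."""
    return [marker for marker in SUCCESS_MARKERS if marker not in log_text]

# Lines copied from the log excerpt above.
SAMPLE = """\
Database backed up by exp utility
Export terminated successfully without warnings.
Files backup up by hsZip utility
Successfully zipped all files on drive C
"""

missing = missing_success_markers(SAMPLE)
print("Backup looks complete" if not missing else f"Missing: {missing}")
```

A missing marker only indicates that the log needs a manual review. The timestamp of the log file still has to be checked separately.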

15 VMware ESX - Virtual Environment

To perform the following health check steps, (super user) root access to all the Hosts
(ESX/ESXi servers) in the environment using the vSphere Client is required. If vCenter
server is present in the environment, you can log in directly to the vCenter using an
administrative account and perform all the following steps from there.
15.1 Software version

Consult the ABB and VMware web sites (or other sources of information)
for the currently supported versions of ESXi.
System 800xA: Third Party Software Versions, 3BUA000500
VMware: https://kb.vmware.com/s/article/2143832
Expected:
1) All hosts should run the same and appropriate version of ESXi.
2) Essential patches (builds) should be installed.
(download the most recent ESXi release notes and verify)
3) In some cases, hardware specific patches must be applied.
(e.g. ESXi v6.0 has problems with early LSI disk drivers)
15.2 VMware Tools

Expected:
All Virtual Machines have VMware tools installed and running in the latest
version. The columns indicating the status of the VMware Tools service
may have to be manually added.

15.3 CPU count

Expected: The number of virtual CPU cores allocated by running virtual machines
must not exceed the total number of Logical Processors (i.e. counting
HyperThreaded cores). Some reserve capacity should be present.

15.4 RAM size

Expected: The sum of all virtual machines’ memory settings must not exceed the
capacity of the ESXi host. Some reserve capacity should be present.
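The CPU and RAM rules above can be combined into one headroom calculation. The sketch below uses hypothetical VM sizes and host capacity. In practice the figures would be read from the vSphere Client, and the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class VirtualMachine:
    name: str
    vcpus: int
    memory_gb: float
    running: bool = True

def host_headroom(vms, logical_processors, host_memory_gb):
    """Return (spare vCPUs, spare memory in GB); negative values mean over-allocation."""
    # CPU rule: vCPUs allocated by running virtual machines vs. logical processors.
    vcpus_in_use = sum(vm.vcpus for vm in vms if vm.running)
    # RAM rule: memory setting of all virtual machines vs. host capacity.
    memory_configured = sum(vm.memory_gb for vm in vms)
    return logical_processors - vcpus_in_use, host_memory_gb - memory_configured

# Hypothetical example: three servers on a host with 16 logical processors and 64 GB RAM.
vms = [
    VirtualMachine("DC1", vcpus=2, memory_gb=4),
    VirtualMachine("AS1", vcpus=4, memory_gb=16),
    VirtualMachine("CS1", vcpus=4, memory_gb=8),
]
spare_cpus, spare_memory = host_headroom(vms, logical_processors=16, host_memory_gb=64)
print(f"Spare vCPUs: {spare_cpus}, spare memory: {spare_memory} GB")
```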


15.5 Network
It is recommended to restrict each virtual switch to one single port group only, and assign
it with a dedicated physical adapter (1:1).
If the number of virtual networks is large and the number of available physical adapters
low, the total number of required physical adapters can be reduced by using VLAN
tagging and trunking the traffic over the same physical connection.
Make sure to assign dedicated physical adapters for RNRP areas with redundant
connection. The primary (path0) and secondary (path1) connections must not share the same
physical adapter.

15.6 Virtual Network Adapter Types

Over time and hence also versions of ESXi, the set of available virtual network adapter
types has varied.
The adapter types suitable for a given version of System 800xA are listed in the designated
System 800xA Virtualization with VMware vSphere ESXi, 3BSE056141-xxx document.
Some combinations of adapter type and OS version are known to cause issues.
ESXi version | System version | OS version | Recommended adapter type and comments
<= 5.1 | <= 5.1 | Server 2003, Server 2008, Windows XP, Windows 7 | E1000 is required to prevent issues with RNRP. However, it is suspected that later ESXi builds (patches) have resolved the problem.
5.1, 5.5 | 6.0, 6.0.1, 6.0.2, 6.0.3 | Server 2012, Windows 8 | Any of E1000, E1000E* or VMXNET 3 can be used. There is a potential risk of Purple Screen of Death (PSOD) with Windows 2012 Server and E1000 unless ESXi 5.5 Update 2 or later is installed.
5.1, 5.5, 6.0 | 6.0.3-1 | Windows 10 | Either E1000E* or VMXNET 3 (=avoid E1000)
5.1, 5.5, 6.0 | 6.0.3-2 | Server 2016, Windows 10 | Either E1000E* or VMXNET 3 (=avoid E1000)
*) For systems with 800xA for DCI, E1000E must be avoided.
Expected:
All network adapters are of the required type.
15.7 Time Synchronization

The ESX/ESXi hosts should track the same reference clock that the computers in the
800xA system synchronize with.
To properly verify accuracy (the GUI displays whole minutes only), log in with SSH
and issue the command: watch ntpq -p
# watch ntpq -p
Every 2s: ntpq -p 2018-08-09 11:40:44
remote refid st t when poll reach delay offset jitter
*172.16.4.254 10.51.24.39 5 u 630 1024 377 0.236 2.566 2.828
The offset and jitter indicate the deviation (in milliseconds). The reference currently in
use is marked with an asterisk (*). The value of reach indicates the result of the last eight (8)
consecutive polls. 377 (octal) = 255 (decimal) = 11111111 (binary) = 100% successes.
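The octal reach value can be decoded into individual poll results as sketched below. The function name is illustrative. In the bit string the rightmost bit is the most recent poll.

```python
def decode_reach(reach):
    """Decode the octal 'reach' column into (successful polls, 8-bit history)."""
    value = int(reach, 8) & 0xFF      # reach is an 8-bit shift register shown in octal
    history = format(value, "08b")    # leftmost bit = oldest poll, rightmost = newest
    return history.count("1"), history

for reach in ("377", "376", "17"):
    successes, history = decode_reach(reach)
    print(f"reach {reach:>3}: {history} -> {successes}/8 polls answered")
```

A value such as 376 (11111110) shows that only the latest poll failed, whereas 17 (00001111) shows a reference that has only recently become reachable.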

15.8 Automatic shutdown / startup of guests

To avoid abrupt stops (Power Off) and virtual machines that are not started when a host is
rebooted, it is recommended to configure the Virtual Machine Startup and Shutdown
behavior.
Expected: All servers have “Shutdown Action = Guest Shutdown”
(not “Power Off Guest”)
Optional: Essential servers are put in an appropriate & automated startup sequence.
ESXi > Configuration > Virtual Machine Startup/Shutdown
The individual startup order and delay between each virtual machine may need to be
tested and evaluated a few times until a working sequence is found.
• A domain controller might need a few extra minutes before it starts accepting logins
in case it is the primary domain controller (e.g. measure the time from boot until
the “Press Ctrl+Alt+Del to sign in.” message is presented on the console).
• Once the first aspect server has started and the System 800xA services have
reached “Service” state the remaining servers can be started in rapid succession.

15.9 Snapshots
Snapshots are a powerful feature of VMware, but combined with Thin Provisioning and
limited disk space, snapshots can cause severe problems when disk space runs out.
Expected: Snapshots and thin provisioning should be avoided in production
environments.
List of snapshots for a virtual machine
Thin Provisioning in disk settings for a virtual machine
Free disk space in ESXi datastore
In large configurations, snapshots can be listed via CLI (SSH Service must run in ESXi)
1. List all virtual machines
# vim-cmd vmsvc/getallvms
Vmid Name File Guest OS Version
19 DC1 [datastore2] Servers/DC1.vmx windows8Server64Guest vmx-08
22 AS1 [datastore2] Servers/AS1.vmx windows8Server64Guest vmx-08
24 CS1 [datastore2] Servers/CS1.vmx windows8Server64Guest vmx-08
2. List snapshots (per virtual machine ID, vmid)

# vim-cmd vmsvc/snapshot.get 24
|-ROOT
--Snapshot Name : CleanMachine1
--Snapshot Id : 1
--Snapshot Desciption : After network configuration, prior to Smart Client
--Snapshot Created On : 9/2/2018 13:10:25
--Snapshot State : powered on
--|-CHILD
----Snapshot Name : Win2016Svr
----Snapshot Id : 2
----Snapshot Desciption : Joined domain
----Snapshot Created On : 9/3/2018 8:10:56
----Snapshot State : powered on
3. A one-liner to list all snapshots for an entire host (enter all text on a single row)
# for vm in $(vim-cmd vmsvc/getallvms | cut -d ' ' -f 1 | grep -v Vmid); do
echo -n $vm: ; vim-cmd vmsvc/snapshot.get $vm; done
It’s even possible to remove snapshots via CLI, just substitute snapshot.get with
snapshot.removeall in the above examples. Be careful when deleting!

Batch Management
To be defined later.
16 Asset Optimization
17 800xA for Harmony

18 800xA for Melody

19 800xA for MOD 300

20 800xA for IEC61850

21 800xA for DCI

22 800xA for Freelance


23 Script reference
The following section describes a number of tests that can be performed system wide
using scripts. In some situations it may not be possible to run scripts, e.g. due to security
policies enforced by the customer. In such cases manual checking may be required to
complete the Health Check Test Record.
Obtain a copy of the scripts, login to Windows using an administrator account
(membership of the DomainAdmins group is recommended) and unpack the scripts in a
temporary folder somewhere on the local computer’s hard disk.
All the scripts will attempt to automatically populate the \Results\_Nodes.txt file with
all computers found in the workgroup or domain. The scripts use NetBIOS
(net view); if NetBIOS is not available or not working properly, some or all nodes may fail to
be added to _Nodes.txt. In such cases, attempt to check/repair the NetBIOS settings,
possibly reboot the computer and try again. As a last resort, configure the _Nodes.txt
file manually: insert one row per computer listing its name (or IP address).
Be sure that the _Nodes.txt file is correct (does not list any non-existing nodes, etc.), as
errors in it may cause unexpected timeouts or errors during script execution.
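Before running the scripts, a manually edited _Nodes.txt can be sanity-checked as sketched below. This is not part of the ABB script package. It assumes one node name or IP address per row and uses ordinary host name resolution, which may behave differently from the NetBIOS lookup used by the scripts.

```python
import socket

def check_nodes(lines, resolve=socket.gethostbyname):
    """Return (row number, problem) tuples for empty, duplicate or unresolvable rows."""
    seen = set()
    problems = []
    for number, raw in enumerate(lines, start=1):
        name = raw.strip()
        if not name:
            problems.append((number, "empty row"))
            continue
        if name.lower() in seen:
            problems.append((number, f"duplicate entry: {name}"))
            continue
        seen.add(name.lower())
        try:
            resolve(name)  # raises OSError (socket.gaierror) if the name is unknown
        except OSError:
            problems.append((number, f"cannot resolve: {name}"))
    return problems

def check_nodes_file(path):
    """Convenience wrapper that reads the node list from a file."""
    with open(path, encoding="utf-8") as handle:
        return check_nodes(handle.readlines())
```

A call such as check_nodes_file(r"Results\_Nodes.txt") (path relative to the folder where the scripts were unpacked) returns an empty list when no problems are found.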
Note: Before starting any script, ensure that Excel’s Macro Security Level is set
to Medium or Low.
Macro security set to Medium.

The scripts will write their test results to the \Results folder.
More details about warning thresholds, etc. are stored as Cell Comments in the result
spreadsheets.
Hover with the mouse pointer above the cell to display the comment.
Example of a cell comment

23.1 Run all scripts

Utility_RunAllScripts
This is a “master” script that sequentially calls all the other scripts.
It is possible to edit this script to exclude a check if it is decided to be inappropriate,
e.g. test #4 (CHKDSK.EXE) is known to report errors that can’t be repaired.
To exclude a check, insert a “REM” keyword at the beginning of a row:
REM call Show_Checkdisk
23.2 AC800M network throughput test

Show_AC800MSpeed
This script uses the RNRP Utility tool to perform a Network to node throughput test.
Example of output from a test

Controller nodes with low throughput should possibly be investigated further.
• Check controller’s Cyclic and Total System load
• Check MMS transactions per second
• Check if network switch port is reporting high volume of CRC, errors, etc.
23.3 Aspect Directory synchronization and structure consistency test

Show_AdConsistency
This script performs two checks using System 800xA standard tools and delivers output to
a text file: AdConsistency_xx.txt. Review the text file.
23.3.1 Aspect Directory checksum calculation using “afwsysinfo.exe –csd”

Verify that “No differences found” is output.
The tool also outputs some additional statistical information, e.g. the number of
objects and aspects in the system and the amount of disk space occupied by the
Aspect Directory. Record the figures in the Health Check protocol for future
reference. Note: the Tag Count is not correct when using AC 400 Connect.
Aspect Directory Database Replication Check with "AfwSysinfo.exe":
-----------------------------------------------------------
No differences found
Server: KVVASCS2
Time: 01/25/11 10:44:15
System info: Test System

---------------------------------------------------------
Number of objects: 23065

Number of aspects: 160900
Number of versions: 160900
Number of categories: 7697
External count: 3667 (unused: 2466)
Internal count: 4030 (unused: 2272)
Blob bytes in AD: 215.89 Mbyte
File data size: 444.14 MByte
Tag count = 4733

DB Free portion = 43%
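The figures to be recorded in the Health Check protocol can be pulled out of the text file with a short script. This is a minimal sketch that assumes the layout shown in the example output above (one `name: value` or `name = value` pair per row); the function name `parse_ad_report` is hypothetical. The sketch only looks for the "No differences found" text, so any report without it should be reviewed manually.

```python
import re

# A row such as "Number of objects: 23065" or "DB Free portion = 43%".
# The lookahead rejects dates and times such as "01/25/11 10:44:15".
FIGURE_RE = re.compile(
    r"\s*([A-Za-z ]+?)\s*[:=]\s*([0-9]+(?:\.[0-9]+)?)(?![0-9/:])"
)


def parse_ad_report(text):
    """Extract the verdict and the numeric figures from the checksum section."""
    figures = {}
    for line in text.splitlines():
        match = FIGURE_RE.match(line)
        if match:
            figures[match.group(1)] = float(match.group(2))
    return {"consistent": "No differences found" in text, "figures": figures}
```

Only the first number on each row is kept, so for a row such as "External count: 3667 (unused: 2466)" the unused count is not captured.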
23.3.2 Structure Consistency check using “afwsct.exe”

Verify that no structure is reported as having structural damage.
Aspect Directory Database Consistency Check with "Afwsct.exe":
-----------------------------------------------------------
Checking structure 'Workplace Structure'
Checking structure 'User Structure'
Checking structure 'Obsolete Structure'
Checking structure 'Aspect System Structure'
Checking structure 'Graphics Structure'
Checking structure 'Object Type Structure'
Checking structure 'System Structure'
Checking structure 'Service Structure'
Checking structure 'Product Type Structure'
Checking structure 'Node Administration Structure'
Checking structure 'Product Structure'
Checking structure 'Library Structure'
Checking structure 'Documentation Structure'
Checking structure 'Maintenance Structure'
Checking structure 'Scheduling Structure'
Checking structure 'Functional Structure'
Checking structure 'Location Structure'
Checking structure 'Control Structure'
Checking structure 'Admin Structure'
SCT succeeded
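The structure list can be checked in the same way. The sketch below assumes the output format shown above; the function name `parse_sct_report` is hypothetical. The wording the tool uses for a damaged structure is not shown in this document, so the sketch only looks for the final "SCT succeeded" row and leaves any other result to manual review.

```python
import re


def parse_sct_report(text):
    """List the structures that were checked and look for the success row."""
    structures = re.findall(r"Checking structure '([^']+)'", text)
    return {"structures": structures, "succeeded": "SCT succeeded" in text}
```

Comparing the returned structure list between two health checks shows whether a structure has been added or removed in the meantime.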
23.4 File system integrity test (CHKDSK.EXE)

Show_Checkdisk
This script performs non-intrusive error checking on all disk partitions on all computers
using CHKDSK.EXE.
Errors are reported in red.

23.5 File system test

Show_Disks
This script examines size, usage and file fragmentation on all disk partitions on all
computers.
23.6 Device driver version test

Show_Drivers
This script lists all device drivers and versions for all computers.
The output can be useful when, for example, investigating display driver
related issues.
23.7 Melody Connect log file test

Show_MelodySysErrLog
This script collects the Melody Connect syslog_err.log and syslog_err.old.log
files from Melody Connect servers.
23.8 Computer memory test

Show_Memory
This script collects summary memory usage statistics for each computer.

Follow up on errors and warnings by looking at individual processes (test #16)
and/or use tools in Microsoft Windows, e.g. the Task Manager, Performance
Monitor or Resource Monitor (available in Windows 7/2008 Server only).
23.9 Microsoft network setting integrity test (NETDIAG.EXE)

Show_NetDiag
This script executes the Microsoft NETDIAG.EXE test on each computer.
23.10 Network settings test

Show_NetSettings
This script collects miscellaneous network settings (bind order, DNS client, NetBIOS, etc.)
from all computers.
23.11 Network bandwidth test

Show_NetSpeed
This script uses a file transfer to measure the network speed between the computer
on which the scripts are executed and all other computers.

Performance may be influenced by the disk Write Cache being turned off
(the recommended setting is ON).
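The principle of a file-transfer measurement is simple: divide the number of bits transferred by the elapsed time. The sketch below illustrates that arithmetic only and is not the implementation used by Show_NetSpeed; the function names are hypothetical, and the destination would have to be a path on a share of the remote computer for the result to say anything about the network.

```python
import os
import shutil
import time


def throughput_mbit_per_s(size_bytes, seconds):
    """Convert a transfer of size_bytes in 'seconds' to megabits per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes * 8 / seconds / 1_000_000


def timed_copy(source, destination):
    """Copy a file and return the observed throughput in Mbit/s."""
    start = time.perf_counter()
    shutil.copyfile(source, destination)
    elapsed = time.perf_counter() - start
    return throughput_mbit_per_s(os.path.getsize(destination), elapsed)
```

A transfer of 125,000,000 bytes in 10 seconds corresponds to 100 Mbit/s. Because the file is also written to disk at the destination, a disabled Write Cache lowers the measured figure even when the network itself is healthy.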
23.12 System locale setting test

Show_NlsSystemInfo
This script tests the Operating System locale (Regional Setting) in all computers.
23.13 User locale setting test

Show_NlsUserInfo
This script tests the user locale (Regional Setting) of all users in all computers.

23.14 DNS check using NSLOOKUP.EXE

Show_NsLookup
This script uses NSLOOKUP.EXE to verify that all computers can resolve all other peers
through DNS.
23.15 Time Synchronization configuration test

Show_Ntp
This script retrieves time synchronization configuration details from all computers.
23.16 Running processes test

Show_Processes
This script collects memory statistics from all running processes in all nodes.
Enable the Auto-Filter feature in MS Excel and sort the contents in descending order.

Only a few processes should exceed 1 GB, e.g. sqlservr.exe and oracle.exe,
both of which can be very close to 2 GB without this being a problem.
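The same filtering can be done outside Excel. The sketch below is illustrative: the row layout (node, process name, memory in bytes) is an assumption and does not reflect the actual column layout of the result spreadsheet, the function name `large_processes` is hypothetical, and 1 GB is taken as 1024³ bytes.

```python
GIB = 1024 ** 3  # 1 GB taken as 1024^3 bytes (assumption)


def large_processes(rows, limit_bytes=GIB):
    """Return the rows whose memory use exceeds limit_bytes, largest first.

    Each row is a (node, process_name, memory_bytes) tuple.
    """
    big = [row for row in rows if row[2] > limit_bytes]
    return sorted(big, key=lambda row: row[2], reverse=True)
```

A long result list, or an unexpected process name in it, is the cue to look at that process more closely.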
23.17 Registry size test

Show_RegistrySizes

This script collects the Windows registry size from all computers.
23.18 Running services test

Show_Services
This script collects a list of all services and their current state from all computers.
23.19 System Identifier (SID) test

Show_Sid
This script verifies that all System Identifiers (SIDs) are unique. Domain controllers
should have the same SID. Identical SIDs on other machines indicate cloning and a
possible license violation with Microsoft (System 800xA itself does not have any issues
with duplicated SIDs).
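The uniqueness rule can be expressed in a few lines. This is a sketch under assumptions: the SIDs are supplied as a mapping from computer name to SID string, the list of domain controllers is supplied by the caller, and the function name `duplicate_sids` is hypothetical.

```python
from collections import defaultdict


def duplicate_sids(sid_by_computer, domain_controllers=()):
    """Return {sid: [computers]} for every SID shared by several computers.

    Domain controllers are expected to share a SID and are therefore
    left out of the comparison.
    """
    skip = {name.lower() for name in domain_controllers}
    groups = defaultdict(list)
    for computer, sid in sid_by_computer.items():
        if computer.lower() not in skip:
            groups[sid].append(computer)
    return {sid: sorted(names) for sid, names in groups.items() if len(names) > 1}
```

An empty result means that no duplicated SID was found among the computers that are not domain controllers.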
23.20 Automatically started programs test

Show_Startup

This script collects a list of automatically started applications from all nodes. Some
applications are undesirable on Process Control equipment.
23.21 Mandatory System 800xA Third Party Software test

Show_ThirdPartySW
This script verifies that all mandatory System 800xA 3rd party software is in place in all
computers (System 800xA 5.1, 5.0, 4.x, 3.1 Third Party Software, 3BUA000500).
23.22 Time synchronization test

Show_TimeRead
This script reads the current date and time from all computers.
23.23 Computer uptime test

Show_UpTime
This script collects the system uptime from all computers.

23.24 Windows Event Log test

Show_WinEvents
It is recommended to run this script manually, since the default settings currently exclude
Information level events and only collect events since the last reboot. Fetch all Information,
Warning and Error level events from a period of time, e.g. 3 months, 6 months, etc.,
depending on how the system has been running in the past.
Merge all events into one Excel spreadsheet to get an overview. The Auto-Sort
feature in Excel, especially the enhanced one in later versions (Office 2007 and
2010), makes it very easy to work through the events and “hide” complete sections
of similar events as they are identified/handled.
Using the Auto-Sort feature in Excel to assist in investigating numerous events
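The selection described above (events of given levels from a chosen period, grouped so that similar events sit together) can be sketched as follows. The event representation is an assumption: each event is a dictionary with the hypothetical keys `time`, `level` and `source`, which is not the format produced by Show_WinEvents.

```python
from datetime import datetime, timedelta

LEVELS = frozenset({"Information", "Warning", "Error"})


def select_events(events, now, days=90, levels=LEVELS):
    """Keep events of the given levels from the last 'days' days.

    The result is sorted by source and then by time, so that similar
    events end up next to each other.
    """
    start = now - timedelta(days=days)
    chosen = [
        event for event in events
        if event["level"] in levels and start <= event["time"] <= now
    ]
    return sorted(chosen, key=lambda event: (event["source"], event["time"]))
```

Pick the number of days to match the period of interest, e.g. roughly 90 for 3 months or 180 for 6 months.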
23.25 Conversion tool .CSV → .XLS

Utility_Convert-Csv-Xls
This script can be used to convert .CSV files into .XLS files if the machine on which
the scripts were executed lacked Microsoft Excel.

The .CSV .XLS conversion tool
23.26 AC800 MMS statistics test

Utility_OPC_MMS
This script gathers the AC800 OPC Server’s Variable Communication Statistics
information into one Excel spreadsheet.

REVISION
Rev. ind. | Page (P) / Chapt. (C) | Description | Date | Dept./Init.
A draft | All | Adapted to TTT template | Jan 2008 | S Strömqvist
B draft | | Updated with feedback from P Petersen, S Hissbach and P Larsson during the System Check meeting in March in Västerås | May 2008 | S Strömqvist
C draft | | Updated after 1st and 2nd E144 – System 800xA Health Check pilot workshops | Jun 2008 | S Strömqvist
D draft | All | Minor adjustments. Updated to track E144 chapter layout. Updated with PLC Connect chapter | Aug 2009 | S Strömqvist
E draft | All | Redesigned to better fit a health check assisted by scripts. Updated with 800xA System Version 5.1 | May 2011 | S Strömqvist
F draft | All | Minor adjustments | Aug 2013 |
 | C7, C8 | PG2 (New Graphics) check. Alarm Manager storage settings. | | S Strömqvist
 | C11 | Updated | | G Ascard
 | C15 | VMWare ESX introduced as Appendix 15 | | S Balakrishnan
G draft 3 | All | Minor adjustments (v6 items added) | Aug 2018 |
 | C15 | VMware ESXi reworked | | S Strömqvist
G draft 4 | C12 | Minor adjustments on TSTM and MasterAdapter statistics | Mar 2019 | S Strömqvist
