Uppgjord (även faktaansvarig om annan) - Prepared (also subject responsible if other) Nr - No.
4/002 01-CAL 119 0401 Uen A
DESIGN SPECIFICATION
2000-11-15 B
FUNCTION CHANGE
Abstract
Application
Contents
1 GENERAL
1.1 SYSTEM ENVIRONMENT
1.2 REQUIREMENT REFERENCES
1.3 DESIGN PRINCIPLES
1.4 HARDWARE AND SYSTEM SOFTWARE
2 EXTERNAL INTERFACES
2.1 PROVIDED EXTERNAL INTERFACES
2.2 USED EXTERNAL INTERFACES
3 USE CASES
3.1 INITIATING A FCH SESSION
3.2 ENDING A SESSION
4 STRUCTURE
4.1 RESPONSIBILITIES
4.2 INTERFACES
5 SOFTWARE UNITS
5.1 FCHEXESRC
5.2 FCHLIBSRC
6 PROCESSES
7 PERSISTENT STORAGE
8 ERROR HANDLING
9 FUNCTION CHANGE
10 START, STOP AND RESTART
11 CONFIGURATION
12 CAPACITY
12.1 DATA FOR CAPACITY ESTIMATION
12.2 CAPACITY ESTIMATION
13 SPECIAL FEATURES
14 REFERENCES
15 ANNEXES
15.1 ANNEX REVISION HISTORY
1 GENERAL
This document describes how Function Change (FCH) is provided, i.e. online upgrade of AP software with minimum system downtime and maintained high availability.
The internal design of FCH is object oriented, whereas the command line
interfaces used to execute a Function Change session are procedural.
Everything is written in C++.
2 EXTERNAL INTERFACES
2.1 PROVIDED EXTERNAL INTERFACES
2.1.1 fchstart
2.1.2 fchfb
2.1.3 fchcommit
2.1.4 fchend
2.1.5 fchrst
2.2 USED EXTERNAL INTERFACES
FCH uses the prcconf CLI to reconfigure the MSCS cluster database and
the prcgen CLI to generate the PRC Service Start Schedule file.
FCH uses the phacreate CLI to syntax-check parameter files and update the PHA parameter database. FCH also uses the phatrans CLI to transfer parameters between different versions of CXCs.
At present, FCH does not use the BUR CLI directly to restore a single node. In case of a restore, FCH prepares the system for the restore, and the operator must then invoke the required BUR command, i.e. fcc_restore or Burrestore, manually.
FCH uses the MSCS API to control cluster resources and to perform
failovers.
3 USE CASES
Note that the different fchstart options described below can be combined in
any way.
3.1.1.1 Description
Installation and upgrade of CXC packages are initiated with the fchstart command using the -d option and a directory as argument. The directory must contain the CXC packages to be installed, in the form of self-extracting WinZip files. The command unpacks the packages to a specific location and checks whether each CXC package is already installed; if it is, the revision of the installed package is compared with the revision of the package to be installed. All unique CXC packages, i.e. those whose revisions are not already installed, are displayed in a list from which the operator may select the CXCs to install.
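The revision comparison described above can be sketched as follows. This is a minimal C++17 illustration under stated assumptions, not the FCH implementation: the `CxcPackage` struct, the `selectUnique` function and the representation of installed packages as a map from CXC product number to revision are hypothetical. A package counts as unique if its product is not installed at all, or is installed with a different revision.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of one unpacked CXC package.
struct CxcPackage {
    std::string product;   // CXC product number
    std::string revision;  // revision of the package
};

// Returns the packages whose exact revision is not already installed.
// `installed` maps product number -> currently installed revision.
std::vector<CxcPackage> selectUnique(const std::vector<CxcPackage>& unpacked,
                                     const std::map<std::string, std::string>& installed) {
    std::vector<CxcPackage> unique;
    for (const auto& pkg : unpacked) {
        const auto it = installed.find(pkg.product);
        // Either a new product, or the same product at a different revision.
        if (it == installed.end() || it->second != pkg.revision) {
            unique.push_back(pkg);
        }
    }
    return unique;
}
```

The resulting list is what the operator would choose from; packages already installed at the same revision never reach the selection step.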
All existing rin files, together with their corresponding instances, are removed from the system before the CXC packages are updated. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: with the -i option, or through CXC package installation.
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state, with the FCH state set to noFCH.
If the operator selects y (yes), the FCH state is set to Reboot and the node is rebooted, after which the FCH server component performs the switchover to the new system, starting in state Failover1 and continuing through the subsequent states.
[Sequence diagram: check resources (c), initiate session (d), prompt (n), answer (o), set state Reboot and initiate reboot (p)]
3.1.2.1 Description
Deletion of CXC packages is initiated with the fchstart command using the -r option. All installed CXCs are displayed in a list from which the operator may select the CXCs to delete from the system.
All existing rin files, together with their corresponding instances, are removed from the system before the CXC packages are updated. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: with the -i option, or through CXC package installation.
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state.
If the operator selects y (yes), the node is rebooted, after which the FCH server component performs the switchover to the new system.
Note that this is a powerful option: deleting the wrong CXC packages can make the system unusable.
[Sequence diagram: check resources (c), initiate session (d), answer (n), initiate reboot (o)]
3.1.3.1 Description
Editing of CXC parameters is initiated with the fchstart command using the -p option. All installed CXCs that have parameters are displayed in a list from which the operator may select the CXCs whose parameters are to be edited. The parameter file of each selected CXC is opened in a text editor, where the operator may edit parameter values. The edited parameter files are then checked for syntax errors.
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state.
If the operator selects y (yes), the node is rebooted, after which the FCH server component performs the switchover to the new system.
[Sequence diagram: check resources (c), initiate session (d)]
3.1.4.1 Description
Update of CXC parameters is initiated with the fchstart command using the -P option and a file as argument. The file must contain a list of full paths to the files that replace the existing CXC13NNNN.par files. The original files are backed up and then replaced with the new files.
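A minimal sketch of the backup-then-replace step for -P follows. It assumes, hypothetically, that the list file holds one full path per line, that each new file has the same file name as the CXC13NNNN.par file it replaces, and that the parameter and backup directories are supplied by the caller. The names `readParList` and `backupAndReplace` are illustrative only, and the syntax check and PHA database update are left out.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Reads the (assumed) -P list file: one full path per line, blank lines skipped.
std::vector<fs::path> readParList(const fs::path& listFile) {
    std::vector<fs::path> paths;
    std::ifstream in(listFile);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (!line.empty()) paths.emplace_back(line);
    }
    return paths;
}

// For each new file, backs up the existing file with the same name in parDir
// to backupDir, then overwrites it. Returns the number of files replaced.
std::size_t backupAndReplace(const std::vector<fs::path>& newFiles,
                             const fs::path& parDir, const fs::path& backupDir) {
    fs::create_directories(backupDir);
    std::size_t replaced = 0;
    for (const auto& src : newFiles) {
        const fs::path target = parDir / src.filename();
        if (!fs::exists(target)) continue;  // assumption: only existing .par files are replaced
        // Back up first, so that a fallback can restore the original file.
        fs::copy_file(target, backupDir / target.filename(),
                      fs::copy_options::overwrite_existing);
        fs::copy_file(src, target, fs::copy_options::overwrite_existing);
        ++replaced;
    }
    return replaced;
}
```

Taking the backup before overwriting is what makes a later fallback possible.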
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state.
If the operator selects y (yes), the FCH state is set to Reboot and the node is rebooted, after which the FCH server component performs the switchover to the new system.
[Sequence diagram: check resources (c), initiate session (d), answer (n), initiate reboot (o)]
3.1.5.1 Description
Replacement of LBB files, i.e. arbitrary files in the system, is initiated with the fchstart command using the -l option and a file as argument. The file must contain a list of file pairs, each consisting of a file to replace and the file that replaces it, separated by a semicolon. If the file to replace does not exist, it is treated as a new file. The original files are backed up and then replaced with the new files. If an original file did not exist, an empty file is created as its backup. This empty file is removed later if a fallback has taken place.
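The list format can be illustrated with a small parser. This sketch assumes one pair per line written as `target;replacement` (an interpretation of the description above) and uses hypothetical names (`LbbPair`, `parseLbbLine`, `parseLbbList`). A pair whose target does not yet exist is flagged `isNew`, which is the case where an empty backup file would be created. Malformed lines are simply skipped here, whereas a real tool would more likely report them.

```cpp
#include <filesystem>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One entry of the -l list file, assumed to be written as "target;replacement".
struct LbbPair {
    fs::path target;       // file to be replaced
    fs::path replacement;  // file that replaces it
    bool isNew = false;    // target does not exist yet, so its backup is an empty file
};

// Parses one line; returns std::nullopt if it is not a valid "a;b" pair.
std::optional<LbbPair> parseLbbLine(const std::string& line) {
    const auto sep = line.find(';');
    if (sep == std::string::npos || sep == 0 || sep + 1 == line.size()) {
        return std::nullopt;
    }
    LbbPair p;
    p.target = line.substr(0, sep);
    p.replacement = line.substr(sep + 1);
    p.isNew = !fs::exists(p.target);
    return p;
}

// Parses a whole list, skipping malformed lines and stripping CR from CRLF input.
std::vector<LbbPair> parseLbbList(std::istream& in) {
    std::vector<LbbPair> pairs;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (auto p = parseLbbLine(line)) pairs.push_back(*p);
    }
    return pairs;
}
```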
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state.
If the operator selects y (yes), the node is rebooted, after which the FCH server component performs the switchover to the new system.
[Sequence diagram: check resources (c), initiate session (d)]
amount of free disk space and that valid BUR backup files exist. It also makes backup copies of the FCH and ACS binaries.
c FCH verifies all necessary system resources. It checks the connection to the other node and to the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes is paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH initiates package handling and starts the FCH package transaction log.
f FCH makes backup copies of the files to be replaced, and then
replaces them.
g FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
h FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
i The operator confirms or denies. If the operator denies, the
FCH session is aborted.
j FCH sends an event specifying that this is a controlled FCH
reboot, initiates a reboot of the system using the prcboot
command and exits.
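Steps c and g amount to a list of preconditions that must all hold before FCH proceeds. The sketch below models step c with a hypothetical `SystemStatus` snapshot and a hypothetical `verifyResources` function; how each flag is actually obtained (MSCS API, disk checks and so on) is outside its scope. Collecting every failed check rather than stopping at the first one is a design assumption, chosen so that the operator sees all problems at once.

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot of the conditions FCH verifies in step c.
struct SystemStatus {
    bool otherNodeReachable;
    bool dataDiskReachable;
    bool sfcInProgress;
    bool quorumAvailable;
    bool currentNodeIsPassive;
    bool anyNodePaused;
    bool allResourcesOnline;
    bool specialDirsEmpty;
};

// Returns a description of every failed check; an empty result means FCH may proceed.
std::vector<std::string> verifyResources(const SystemStatus& s) {
    std::vector<std::string> failed;
    auto require = [&failed](bool ok, const char* what) {
        if (!ok) failed.emplace_back(what);
    };
    require(s.otherNodeReachable, "no connection to the other node");
    require(s.dataDiskReachable, "no connection to the data disk");
    require(!s.sfcInProgress, "Soft Function Change (SFC) in progress");
    require(s.quorumAvailable, "quorum resource not available");
    require(s.currentNodeIsPassive, "current node is not the passive node");
    require(!s.anyNodePaused, "a node is paused");
    require(s.allResourcesOnline, "not all cluster resources are online");
    require(s.specialDirsEmpty, "FCH special directories are not empty");
    return failed;
}
```

The same pattern would apply to step g with a different set of conditions (current node paused, other node not paused).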
3.1.6.1 Description
If the file to replace does not exist, it is assumed to be a new file. The original files are backed up and then replaced with the new files. If an original file did not exist, an empty file is created as its backup. This empty file is removed later if a fallback has taken place.
All existing rin files, together with their corresponding instances, are removed from the system before the CXC packages are updated. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: with the -i option, or through CXC package installation.
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node is restored to its previous state.
If the operator selects y (yes), the node is rebooted, after which the FCH server component performs the switchover to the new system.
[Sequence diagram: check resources (c), initiate session (d), replace LBB files (f), exec prcgen & prcconf, verify before switch (g), prompt (h), answer (i), initiate reboot (j)]
that none of the nodes are paused, that all cluster resources are
online and that the FCH special directories are empty. It also
flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH clears all resource instances from the service control
database together with the corresponding resource instance
files. FCH makes backup copies of the files to be replaced,
and then replaces them.
f prcgen is executed to create the PRC_Config file with the
updated resource instances. prcconf is run to create the new
cluster database.
g FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
h FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
i The operator confirms or denies. If the operator denies, the
FCH session is aborted.
j FCH sends an event specifying that this is a controlled FCH
reboot, initiates a reboot of the system using the prcboot
command and exits.
3.1.7
3.1.8.1 Description
Upgrade of LBB and 3pp software is initiated with the fchstart command using the -L option. After the initial checks, an LBBShell> command prompt is displayed, where the operator may enter commands and execute upgrade packages. Reboots are also permitted during this phase.
After all updates have been performed, the operator is prompted to switch to the new system. If the operator selects n (no), the FCH session is aborted and the node must be restored using the fchrst component.
If the operator selects y (yes), the node is rebooted, after which the FCH server component performs the switchover to the new system.
[Sequence diagram: check resources (c), initiate session (d)]
3.1.9.1 Description
Switching over to the new system, i.e. making the upgraded node the active node in order to test the new configuration, is handled by the FCH server component after fchstart has performed a reboot. When the server component starts up after a reboot, it checks whether this was a reboot after fchstart (Reboot state), and if so sends events to both nodes reporting the successful FCH reboot. FCH then sends events about the switchover attempt and sleeps for 5 minutes (default).
FCH checks whether a cluster configuration change is necessary and sets the Failover1 state. All cluster resources are stopped, both nodes are resumed and the Move1 state is set. Common resource groups are moved to the modified node. If a configuration change is necessary, the Config1 state is set and the current configuration is deleted; the state is then set to Config1B and the new configuration is added. This node is resumed and the other node is paused.
The Supervision state is set and FCH starts all the resources on the upgraded node. The current node is then paused and the other node resumed, and the other node's resources are started. Finally, the current node is resumed and the other node paused, and FCH sends an event saying that the switchover was successful and that supervision has now begun.
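The state order followed during the switchover can be summarized in code. This sketch uses the state names from Table 14.1, where Failover2 follows the optional configuration states; the `FchState` enum and the `switchoverSequence` function are illustrative assumptions, and the cluster operations performed in each state are omitted.

```cpp
#include <vector>

// State names as listed in Table 14.1; the enum itself is illustrative.
enum class FchState { Reboot, Failover1, Move1, Config1, Config1B, Failover2, Supervision };

// Sequence of states the FCH server component passes through after the fchstart reboot.
std::vector<FchState> switchoverSequence(bool configChanged) {
    std::vector<FchState> seq{FchState::Reboot, FchState::Failover1, FchState::Move1};
    if (configChanged) {
        // Old cluster configuration deleted (Config1), new one added (Config1B).
        seq.push_back(FchState::Config1);
        seq.push_back(FchState::Config1B);
    }
    seq.push_back(FchState::Failover2);
    seq.push_back(FchState::Supervision);
    return seq;
}
```

Recording each state before performing its actions is what lets FCH determine how far a failed switchover got and therefore what must be undone.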
Non-stopping errors are logged to the FCH log file. Stopping errors are logged and an event is sent reporting which error occurred. Examples of stopping errors are resources failing to come online during switchover and failed cluster database updates.
3.2.1.1 Description
data disk is OK, that the quorum resource is available, that the
current node is the active node, that the current node is not
paused and that the other node is paused.
d FCH makes backup copies of the FCH and ACS binaries to
the other node. State is set to Committing.
3.2.2.1 Description
Fallback to the system that existed before the FCH session was started is initiated with the fchfb command. This raises the FCH failed alarm, stops all resources, restores the cluster configuration, moves the resource groups back to the unmodified node and starts the resources again, in essence performing a reverse switchover.
After this, if no LBB/3pp software was upgraded, the command undoes all changes made by fchstart: it deletes new CXCs, reinstalls the old CXCs and restores modified files. Finally, the node is rebooted.
If LBB/3pp software was upgraded, the command only restores the cluster configuration and performs the reverse switchover. After this, the operator is prompted to execute fchrst, and the command exits.
After the fallback has been performed (or, in the case of an LBB/3pp upgrade, the restore), the session must be ended using the fchend component.
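The branching in fchfb can be sketched as a plan of high-level actions. The `fallbackPlan` function and the action strings are hypothetical and only mirror the description above; they do not correspond to actual FCH function names.

```cpp
#include <string>
#include <vector>

// Illustrative list of the high-level actions fchfb performs.
std::vector<std::string> fallbackPlan(bool lbbUpgraded) {
    // Reverse switchover: common to both cases.
    std::vector<std::string> plan{
        "raise FCH failed alarm",
        "stop all resources",
        "restore cluster configuration",
        "move resource groups to unmodified node",
        "start resources",
    };
    if (lbbUpgraded) {
        // LBB/3pp software cannot be reverted by fchfb; a BUR restore via fchrst is needed.
        plan.push_back("prompt operator to run fchrst");
    } else {
        // Undo everything fchstart changed, then reboot.
        plan.push_back("delete new CXCs");
        plan.push_back("reinstall old CXCs");
        plan.push_back("restore modified files");
        plan.push_back("reboot node");
    }
    return plan;
}
```

In both cases the operator still ends the session afterwards with fchend, which is why it is not part of the plan.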
[Sequence diagram: add node cluster config (r), finish fallback & reboot (s)]
If the fchfb command fails, the fchrst command must be used to restore the node. In extreme cases, where even fchrst does not work, a full BUR emergency restore of both nodes may be necessary.
3.2.3.1 Description
After this, the command reboots the node and the supervisor component
ends the session after the reboot.
e FCH removes the current instances based on the rin files and removes the rin files themselves. FCH replaces all LBB files that were changed during fchstart on the other node.
f FCH installs, upgrades and removes the CXC packages that were changed during fchstart, making the software configuration of the current node identical to that of the other node. Setupservices is run to set up random users for services.
g FCH updates the CXC parameter files and the PHA database to the same status as on the other node. All instances based on the current rin files are added.
h FCH rebuilds the PRC service start schedule file and checks whether the file has changed. In this use case it has changed. FCH sets the FCH state to Config2, sends an event reporting that the cluster database is about to be edited, deletes the old cluster configuration, sends another edit event, sets the FCH state to Config2B, adds the new cluster configuration and sets the FCH state to EndInstallDone.
i FCH sends an event reporting that the switch attempt was successful, sets the FCH state to EndReboot, sends an event reporting that a reboot is about to take place and reboots the node.
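Step h depends on detecting whether the rebuilt PRC service start schedule file differs from the old one. A byte-for-byte comparison such as the following would serve. The function name `fileChanged` is hypothetical, and treating a missing or unreadable file as changed is an assumption of this sketch.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// Returns true if the two files differ. A missing or unreadable file is
// treated as "changed" (an assumption of this sketch).
bool fileChanged(const fs::path& oldFile, const fs::path& newFile) {
    std::error_code ec1, ec2;
    const auto size1 = fs::file_size(oldFile, ec1);
    const auto size2 = fs::file_size(newFile, ec2);
    if (ec1 || ec2) return true;
    if (size1 != size2) return true;  // cheap check before reading contents
    std::ifstream a(oldFile, std::ios::binary);
    std::ifstream b(newFile, std::ios::binary);
    if (!a || !b) return true;
    // Sizes are equal, so comparing up to the end of `a` covers both files.
    return !std::equal(std::istreambuf_iterator<char>(a), std::istreambuf_iterator<char>(),
                       std::istreambuf_iterator<char>(b));
}
```

Only when this returns true would the Config2/Config2B cluster edit be needed; otherwise the existing cluster configuration could be kept.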
If the fchend command fails, the fchrst command may be used to restore the
node. The fchcommit command may then be executed again, and then the
fchend command can be executed once more to upgrade the unmodified
node.
3.2.4.1 Description
After LBB upgrades have been finished, any other changes made during the
FCH session, i.e. CXC install, delete or parameter changes, will also be
performed. After this, the command reboots the node and the supervisor
component ends the session after the reboot.
If the fchend command fails or the operator aborts the session, the fchrst
command may be used to restore the node. The fchcommit command may
then be executed again, and then the fchend command can be executed once
more to upgrade the unmodified node.
3.2.5.1 Description
If the session was ended with fchfb, the command performs a final clean-up and ends the session, thereby enabling a new session to be initiated with the fchstart command.
Since this use case involves few and quite simple operations, errors rarely occur. Should the command fail or be interrupted, it can be executed again.
3.2.6.1 Description
FCH restore is initiated with the command fchrst. The backup image must reside in the partitions for both nodes. The command first restores the cluster configuration to its previous state and then makes the unmodified node active.
Before the state CommitDone, the newly upgraded node is restored, if necessary. From the state CommitDone onwards, the old node is restored, if necessary.
After this, the command prints a message on the screen explaining how to proceed with the restore using BUR, and then exits. The operator uses BUR to restore the node from the backup image and then reboots the node.
After the boot the fchend command must be used to end the session.
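The node-selection rule above (before CommitDone the upgraded node, from CommitDone onwards the old node) can be expressed as a comparison on an ordered state enumeration. In this sketch only the relative position of CommitDone is taken from the text; the `FchState` enum, the order of the later fchend states and the `nodeToRestore` function are assumptions.

```cpp
// Illustrative ordering of FCH states. The states before CommitDone follow
// Table 14.1 and the commit step; the order of the later fchend states is assumed.
enum class FchState {
    Reboot, Failover1, Move1, Config1, Config1B, Failover2, Supervision, Committing,
    CommitDone, EndInstalling, Config2, Config2B, EndInstallDone, EndReboot
};

enum class RestoreTarget { UpgradedNode, OldNode };

// Which node fchrst would restore (if a restore is needed) in a given state.
RestoreTarget nodeToRestore(FchState state) {
    return state < FchState::CommitDone ? RestoreTarget::UpgradedNode
                                        : RestoreTarget::OldNode;
}
```

Scoped enumerations support the built-in relational operators, so the rule stays a single comparison as long as the enumerators are declared in session order.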
In this particular use case, a fchstart with LBB software upgrade has been performed successfully, but the operator wants to fall back anyway. The FCH state is Supervision and the operator initiates the command from the passive (unmodified) node.
Note that fchrst must be started from a specific node because it is adapted to the old BUR. In the future it should be possible to start fchrst from any node.
Should the fchrst command fail, a BUR emergency restore of both nodes
will be necessary.
3.2.7.1 Description
If the session was ended with a FCH restore, the fchend command makes
sure the cluster configuration is correct, does a final clean up and ends the
session, thereby enabling a new session to be started.
resets the LBB upgrade keys in the registry and sets the FCH
state to noFCH.
g FCH deletes the old PRC service start schedule files on both
nodes and cleans up the FCH temporary directories.
This is similar to ending a session after a fallback, but there is a critical point if the PRC service start schedule file has changed: if the cluster configuration edit fails, the system is exposed to a single point of failure and a BUR restore may be unavoidable.
3.2.8.1 Description
When the node has been restored, the fchcommit command can be executed again, followed as usual by the fchend command, to re-commit the new system.
In this use case the fchend command was interrupted in the middle of
installing CXC packages, the FCH state is EndInstalling. The operator
executes fchrst from the active node with the backup file for the passive
node as argument.
Note that fchrst is adapted to the old BUR; in the future it should be possible to run it from either node, not just one.
Should the fchrst command fail, a BUR emergency restore of both nodes
will be necessary.
4 STRUCTURE
The block FCH consists of three software units, two CAA and one CXC, as
shown below.
[Figure: FCH block structure, CNZ 222 59]
4.1 RESPONSIBILITIES
4.1.1 FCHEXESRC
This software unit implements the FCH user interface commands, internal
commands and supervisor component.
4.1.2 FCHLIBSRC
This software unit implements the FCH core functionality as a Dynamic Link Library (DLL) used by the FCHEXESRC components.
4.2 INTERFACES
5 SOFTWARE UNITS
5.1 FCHEXESRC
5.1.1 Components
5.1.1.1 ACS_FCH_Server
5.1.1.2 fchcommit
This component ends the supervision period and copies all necessary data, such as CXC software packages and parameter files, to the other node to prepare it for upgrade. LBB and 3pp software upgrades are not copied, however; they must be transferred manually by the operator.
5.1.1.3 fchend
This component has two functions: to upgrade the old node after a commit, and to clean up after a FCH session that ended with fallback or restore.
5.1.1.4 fchevent
This component is used to send events, raise alarms and cease alarms. All event and alarm handling in FCH has been implemented in this component to minimize dependencies between FCH and other ACS components such as AEH.
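The decoupling idea behind fchevent can be illustrated in-process: callers use one small interface, and only that interface knows how events and alarms are delivered. In FCH, fchevent is a separate component; the `EventSink`, `FchEvent` and `RecordingSink` classes below are hypothetical and say nothing about how AEH is actually called.

```cpp
#include <string>
#include <vector>

// Hypothetical delivery backend; only this layer would know about AEH.
class EventSink {
public:
    virtual ~EventSink() = default;
    virtual void send(const std::string& kind, const std::string& text) = 0;
};

// Single entry point for events and alarms used by the rest of FCH.
class FchEvent {
public:
    explicit FchEvent(EventSink& sink) : sink_(sink) {}
    void sendEvent(const std::string& text) { sink_.send("event", text); }
    void raiseAlarm(const std::string& text) { sink_.send("alarm-raise", text); }
    void ceaseAlarm(const std::string& text) { sink_.send("alarm-cease", text); }

private:
    EventSink& sink_;
};

// Test double that records what would have been sent.
class RecordingSink : public EventSink {
public:
    std::vector<std::string> log;
    void send(const std::string& kind, const std::string& text) override {
        log.push_back(kind + ": " + text);
    }
};
```

Swapping the backend then touches only the sink, which is the same benefit the separate fchevent component gives FCH.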
5.1.1.5 fchfb
This component allows the FCH session to be aborted and reverts the system to the configuration that existed before the FCH session was initiated. In case of a normal FCH of CXC software or parameters, it performs a complete return to the old system. If LBB and 3pp software was upgraded, it performs a switchover, i.e. it returns control of the system to the unmodified node, but does not restore software or parameters; in this case they must be restored by fchrst.
5.1.1.6 fchrst
This component is used to restore one node using the BUR restore functionality. It is used to achieve fallback when LBB/3pp software has been upgraded, to restore the old node if fchend fails in order to allow re-commit of the new system, and generally to handle severe errors during a FCH session where normal FCH functionality cannot restore the system, for instance if a blue screen occurs during fchstart.
5.1.1.7 fchstart
This component is used to initiate a FCH session and to upgrade the first
node. It allows the operator to select CXC packages for install and delete,
edit CXC parameter files online or offline, add or replace LBB files, add or
replace resource instance files and upgrade LBB and 3pp software.
5.2 FCHLIBSRC
5.2.1 Classes
5.2.1.1 ACS_FCH_ClusterControl
This class implements the methods in FCH to control the MSCS. This
includes starting and stopping of resources, ordered failover, node and
resource status control, reconfiguration of the cluster database via PRC, and
pausing and resuming cluster nodes.
5.2.1.2 ACS_FCH_Common
This class is the base class and implements common functions used by all
the other classes, such as event reporting, activity and error logging and
various I/O and file handling functions.
5.2.1.3 ACS_FCH_Error
5.2.1.4 ACS_FCH_Exception
5.2.1.5 ACS_FCH_LBBFiles
This class implements replacement of LBB files, i.e. arbitrary files in the system. It has methods for backing up and replacing a file, fallback and commit.
5.2.1.6 ACS_FCH_Package
5.2.1.7 ACS_FCH_Parameter
This class implements editing of CXC parameter files. It has methods for backup and edit of parameter files, syntax check, updating the PHA parameter database, fallback and commit.
5.2.1.8 lbbfile
5.2.1.9 ACS_FCH_Exception
5.2.1.10 rinUpdate
5.2.1.11 parfile
5.2.1.12 ACS_FCH_Time
6 PROCESSES
FCH does not implement any supervised processes, but requires that all supervised processes are online. FCH also supervises the PRC Cluster Control process during switchover, to make sure it is properly stopped and started.
7 PERSISTENT STORAGE
m HKEY_LOCAL_MACHINE\Cluster\BOOTTIME_NODEN is used to establish when a boot occurred and how much time has passed since then.
n HKEY_LOCAL_MACHINE\Cluster\OLDBOOTTIME_NODEN is compared with the previous value to verify whether or not a new boot has occurred.
o lopt is used to save the argument to -l (-i) so that it can be used again on the other (old) node to perform the same LBB file update there.
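The boot check described above can be sketched as follows. The sketch replaces the registry with a hypothetical in-memory `BootTimes` struct and assumes that both values are stored as seconds since the epoch; reading `HKEY_LOCAL_MACHINE\Cluster` is not shown. It also assumes that a new boot is detected when the current boot time differs from the previously saved one.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical in-memory stand-in for the two registry values of one node.
struct BootTimes {
    std::int64_t bootTime;     // BOOTTIME_NODEN, seconds since epoch (assumed unit)
    std::int64_t oldBootTime;  // OLDBOOTTIME_NODEN, value saved at the previous check
};

// A new boot has occurred if the recorded boot time differs from the saved one.
bool newBootOccurred(const BootTimes& t) { return t.bootTime != t.oldBootTime; }

// Time elapsed since the last boot, given the current time in the same unit.
std::chrono::seconds sinceBoot(const BootTimes& t, std::int64_t now) {
    return std::chrono::seconds(now - t.bootTime);
}
```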
8 ERROR HANDLING
All errors are logged in the FCH activity log file. Depending on the situation, FCH either retries the action to bypass intermittent errors, ignores the error and continues the FCH session if the error is not serious, or, if the error can be neither handled nor bypassed, prints an error message on the console and exits.
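The three-way policy above (retry, ignore, abort) can be sketched as a wrapper around an action. The `Severity`, `FchError` and `runWithPolicy` names, the default of three attempts and the use of `std::clog` as a stand-in for the activity log are all assumptions. The sketch returns `Outcome::Aborted` instead of exiting, so that the caller decides how to terminate.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical classification of an error, chosen where the error is raised.
enum class Severity { Intermittent, Minor, Fatal };

// Result of running one FCH action under the policy.
enum class Outcome { Succeeded, Ignored, Aborted };

struct FchError : std::runtime_error {
    Severity severity;
    FchError(Severity s, const std::string& msg) : std::runtime_error(msg), severity(s) {}
};

// Runs `action`; retries intermittent errors up to `maxAttempts` times,
// ignores minor errors, and aborts (printing to the console) otherwise.
Outcome runWithPolicy(const std::function<void()>& action, int maxAttempts = 3) {
    for (int attempt = 1;; ++attempt) {
        try {
            action();
            return Outcome::Succeeded;
        } catch (const FchError& e) {
            std::clog << "activity log: " << e.what() << '\n';  // every error is logged
            if (e.severity == Severity::Intermittent && attempt < maxAttempts) continue;
            if (e.severity == Severity::Minor) return Outcome::Ignored;
            std::cerr << "Error: " << e.what() << '\n';
            return Outcome::Aborted;
        }
    }
}
```

An intermittent error that persists past the retry limit is treated like a fatal one.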
9 FUNCTION CHANGE
NA
11 CONFIGURATION
12 CAPACITY
NA
13 SPECIAL FEATURES
NA
will be moved to that node anyway with the groups offline. That is, a group can temporarily belong to a paused node, but it is then offline. If a group can only belong to one node and that node is down, the group has no owner.
The cluster database actually consists of two registry databases, which are identical or are made identical. A change log exists on the data disk.
Now, suppose that a FCH session has upgraded one of the nodes in the cluster, including the cluster database. For example, a cluster resource has been added, and this resource belongs to the current node. One could be tempted to believe that restoring the upgraded node would revert the FCH session to its previous state. This is, however, not the case. The addition of the cluster resource affects the database on both nodes. If one node is restored, one of the “identical” databases will differ from the other. In this case, the database with the latest timestamp will “win”, and the operator will end up with the old, original node running an upgraded database! Consequently, the cluster database needs special handling to be reverted to its original state. When the cluster database has been changed, FCH uses the PRC command prcconf to update or revert the database. It follows that a failover that includes a database upgrade requires the cluster resources on the executing node to be offline.
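The “latest timestamp wins” effect can be illustrated with a toy model of the two database copies. `DbReplica` and `synchronize` are hypothetical and greatly simplify how MSCS actually replicates its database; the point is only that restoring one node from an older backup does not revert the database, because the newer copy overwrites it.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of one node's copy of the cluster database.
struct DbReplica {
    std::string content;     // stands in for the registry database contents
    std::int64_t timestamp;  // time of the last change
};

// Makes the two copies equal again; the copy with the latest timestamp wins
// (on a tie the first argument wins, an arbitrary choice in this sketch).
void synchronize(DbReplica& a, DbReplica& b) {
    if (a.timestamp >= b.timestamp) {
        b = a;
    } else {
        a = b;
    }
}
```

For example, if the upgraded node holds the database with the new resource (timestamp 200) and the other node's copy is restored from a backup (timestamp 100), synchronizing leaves both nodes with the upgraded database.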
FCH uses states to keep track of what has been done, so that it can perform a proper fallback should a failure occur.
The use of states is extremely important when keeping track of cluster data-
base changes.
Among other things, FCH has always been a state machine. In this APG40
NT version, with a 2-node cluster, the states are more of a transaction log
where each state represents a set of actions and direction. In the APG30
After state CommitDone, only a local fallback (INGO1 addition) of the non-upgraded node can take place (during the upgrade attempt). Also, if the newly upgraded node fails, the old non-upgraded node can be activated.
[State diagram: noFCH, Committing, Move2, FbFailover2, Restore]
To initiate the FCH session, the operator executes fchstart, which upgrades the system. fchstart then sets the Reboot state and reboots, and ACS_FCH_Server takes care of the remaining steps up to Supervision.
Assume that node A is being upgraded. fchstart reboots the system, and after the reboot ACS_FCH_Server performs the switchover and makes node A active. The newly upgraded system is now active and is supervised by the operator and ACS_PRC_ClusterControl.
Table 14.1
2B. LBBReboot1: This special state tells fchstart that the LBB is being upgraded (fchstart -L) and that several consecutive reboots can occur. The state is changed to Installing when the operator types “l” (leave) in the LBB window.
Table 14.1
3. Reboot: The state is set by fchstart before the system is rebooted to activate the new software. Events are sent. After the reboot, “reboot success” events are sent and the state is changed to Failover1.
4. Failover1: All cluster groups except “Cluster Group” are brought down. Both cluster nodes are resumed to enable failover (so-called MoveClusterGroup). The cluster is now ready for an offline failover. Online failover is not suitable for FCH, since the cluster database might be changed.
5. Move1: Move (fail over) the cluster groups. Only cluster groups that have more than one possible owner node can be failed over. If the owner of a cluster group was the non-upgraded node A, the new owner will be the upgraded node B. The updated node is now active, with its system upgraded.
6. Config1: If the old and new PRC_Config files differ, that is, if the cluster configuration has changed, delete the cluster database based on the current (old) config file, i.e. the current configuration.
7. Config1B: Create a new cluster database based on the new, updated config file (the new cluster resource configuration).
8. Failover2: This state indicates that the failover and any configuration change are done. Pause the other node and resume the current one.
9. Supervision: The current, upgraded node is started. This node is then paused and the other node is resumed and started. Starting the upgraded node first is somewhat more complex than the other way around; the more complex solution is used because it improves in-service performance (ISP).
Both nodes are now started, and the operator should observe the system for at least 2 hours. The operator can then choose to fall back using fchfb, commit the AP using fchcommit or, in the worst case, restore using the fchrst command.
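The forward path through Table 14.1 can be summarised as a transition function. In this hypothetical sketch the names FchState and nextState are illustrative, LBBReboot1 and Installing are left out, and it is assumed that Config1 and Config1B are skipped when the old and new PRC_Config files are equal; the table itself only states that the database is deleted and recreated if the configuration has changed.

```cpp
#include <cassert>

// Forward (upgrade) states from Table 14.1, Reboot through Supervision.
enum class FchState {
    Reboot, Failover1, Move1, Config1, Config1B, Failover2, Supervision
};

// configChanged: assumed result of comparing old and new PRC_Config files.
FchState nextState(FchState s, bool configChanged) {
    switch (s) {
    case FchState::Reboot:    return FchState::Failover1;
    case FchState::Failover1: return FchState::Move1;
    case FchState::Move1:
        // Assumption: the Config states are only visited if needed.
        return configChanged ? FchState::Config1 : FchState::Failover2;
    case FchState::Config1:   return FchState::Config1B;  // old DB deleted
    case FchState::Config1B:  return FchState::Failover2; // new DB created
    case FchState::Failover2: return FchState::Supervision;
    case FchState::Supervision:
        // Left only through fchfb, fchcommit, fchrst or a failure-triggered
        // fallback, none of which is modelled here.
        return FchState::Supervision;
    }
    assert(false && "unknown state");
    return s;
}
```

Keeping the transitions in one function makes it easy to check that every recorded state has exactly one successor on the forward path.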
In all states up to (but not including) CommitDone, the FCH session can be reverted to the situation that existed prior to Function Change. This is caused either by a failure, for example an unexpected reboot, or by operator intervention, for example using fchfb (Function Change fallback). FCH can be interrupted in any state, and a fallback will then start. An alternative to FCH fallback is a single-node restore, which is implemented by fchrst (Function Change restore). fchrst is always needed when an LBB upgrade is needed.
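The rule that the whole session can be reverted only before CommitDone amounts to a comparison on the state order. A minimal sketch, assuming that the state numbers used in the tables reflect the order in which the states are reached; the enum and function names are hypothetical.

```cpp
#include <cassert>

// Subset of FCH states; values reuse the state numbers from the tables.
enum class FchState : int {
    Reboot = 3,
    Failover1 = 4,
    Move1 = 5,
    Config1 = 6,
    Config1B = 7,
    Failover2 = 8,
    Supervision = 9,
    CommitDone = 22,
    EndInstalling = 24,
    EndRebootDone = 29,
};

// True while the session can still be reverted to the situation before
// Function Change (fchfb or an automatic fallback after a failure).
// From CommitDone on, only a local fallback of the non-upgraded node remains.
bool sessionCanBeReverted(FchState s) {
    return static_cast<int>(s) < static_cast<int>(FchState::CommitDone);
}
```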
19. FbReboot: This state is set when FbFailover3 is ready. It is used when a calling function wants to give a reboot order during fallback. Set the FbReboot2 state.
20. FbReboot2: Set this state before the reboot. Create the reboot file to tell PRC that the reboot should not be counted. Do the actual reboot. After the reboot, send events and clean up.
21. End: Pause the other node, resume the fallen-back node, start up the services, pause this node and resume the other node again. Run fchend, ensure that the fallen-back node is executing again, cease the alarm and clean up.
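The reboot file created in FbReboot2 acts as a one-shot marker between FCH and PRC. The sketch below is an assumption of how such a handshake could look: the marker path, its contents and both function names are hypothetical, and this document does not describe how PRC actually reads the file.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// FCH side (state FbReboot2): create the marker just before the reboot is
// ordered. Returns false if the file could not be written.
bool markIntentionalReboot(const fs::path& marker) {
    assert(!marker.empty() && "marker path must be set");
    std::ofstream out(marker, std::ios::trunc);
    out << "FCH reboot\n";
    return static_cast<bool>(out);
}

// PRC side, after start-up: count the reboot only if no marker exists.
// The marker is removed so that it applies to a single reboot.
bool shouldCountReboot(const fs::path& marker) {
    const bool orderedByFch = fs::remove(marker);  // true if it existed
    return !orderedByFch;
}
```

Consuming the marker on the PRC side means that a later, unexpected reboot is counted again even if FCH is not given the chance to clean up.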
14.4.3 Miscellaneous exceptional flow of events: fallback from Installing or Failover1 state
14.4.4 Normal flow of events until noFCH is set and the FCH session is successful
If fchcommit was successful and state CommitDone was set, the FCH session has to be ended by installing the current passive node so that it equals the active node.
22. CommitDone: This state is set after state Committing when a successful fchcommit has been executed. The newly upgraded node is active and has been approved by the operator.
23. LBBReboot2: This special state is set if the user has upgraded the LBB during the FCH session. The operator can do several consecutive reboots to install drivers etc. It is the operator’s responsibility to follow exactly the same procedures as were used on the original node.
24. EndInstalling: This corresponds to the Installing state, but on the other node. FCH automatically updates the system exactly as was done on the originally updated node, with the exception of LBB upgrades.
25. Config2: If the cluster database was updated, the corresponding changes for this node are made online. In this state, the old configuration is deleted.
26. Config2B: If the cluster database was updated, the corresponding changes for this node are made online. In this state, the new configuration is added.
27. EndInstallDone: Installation of the node is complete. Send events.
28. EndReboot: The system is rebooted to activate the new software. Events are sent.
29. EndRebootDone: The system has successfully rebooted. Events are sent and alarms are ceased. State noFCH is set and cleanup is performed.
If the active, newly upgraded node becomes unavailable, the old node must become the active one. A failover cannot be done right away, since the cluster database has to be reverted first, if necessary. The states are bidirectional.
Table 14.2
22. CommitDone: This state is set after state Committing when a successful fchcommit has been executed. The newly upgraded node is active and has been approved by the operator.
30. InitWrongNode: A failure of the active, upgraded node has occurred, and a check is done to see whether a cluster database change is necessary. Resume the node to be able to move cluster groups that have no current owner. Stop all groups except “Cluster Group”. Start “Cluster Group” if it is offline. Resume both nodes. Fail over to the current non-upgraded node.
31. Config6: If the cluster database has changed, delete the services belonging to the current node, using the new PRC_Config file.
32. Config6B: If the cluster database has changed, add the resources belonging to the current node, using the old PRC_Config file.
33. InitWrongNodeDone: The switch to the old non-upgraded node has been done. Resume this node, pause the other node, ensure that “Cluster Group” is online and start the current node.
34. EndWrongNode: The old node is up and running.
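Table 14.2 can be sketched as a transition function in the same way. The sketch is hypothetical: it assumes that InitWrongNode determines whether the cluster database has changed and that Config6 and Config6B are skipped otherwise. Only the forward direction is covered, not the bidirectional use of the states.

```cpp
#include <cassert>

enum class WrongNodeState {
    CommitDone, InitWrongNode, Config6, Config6B, InitWrongNodeDone, EndWrongNode
};

// Next state after the active, upgraded node has failed following CommitDone.
WrongNodeState nextWrongNodeState(WrongNodeState s, bool clusterDbChanged) {
    switch (s) {
    case WrongNodeState::CommitDone:
        return WrongNodeState::InitWrongNode;
    case WrongNodeState::InitWrongNode:   // fail over to non-upgraded node
        return clusterDbChanged ? WrongNodeState::Config6
                                : WrongNodeState::InitWrongNodeDone;
    case WrongNodeState::Config6:         // delete, using new PRC_Config
        return WrongNodeState::Config6B;
    case WrongNodeState::Config6B:        // add, using old PRC_Config
        return WrongNodeState::InitWrongNodeDone;
    case WrongNodeState::InitWrongNodeDone:
        return WrongNodeState::EndWrongNode;
    case WrongNodeState::EndWrongNode:    // terminal: old node running
        return WrongNodeState::EndWrongNode;
    }
    assert(false && "unknown state");
    return s;
}
```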
[Figure: state diagram; labels include CommitDone and EndWrongNode]
If the non-upgraded node fails during the upgrade, a local fallback of this node is necessary. In the worst case, if the newly upgraded node fails during the upgrade of the inactive node, both a local fallback and a switchover to this node might be necessary.
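The recovery decision in the paragraph above can be written as a small lookup. This is a hypothetical sketch: the enum and struct names are illustrative, and because the text says both actions might be needed in the worst case, the sketch conservatively requests both for that case.

```cpp
#include <cassert>

// Which node failed while the passive node was being installed after commit.
enum class FailedNode { NonUpgraded, Upgraded };

struct Recovery {
    bool localFallback;  // revert the node being upgraded to its old software
    bool switchover;     // make that node the active one
};

Recovery recoveryAfterCommit(FailedNode failed) {
    switch (failed) {
    case FailedNode::NonUpgraded:
        return {true, false};  // local fallback of the failing node only
    case FailedNode::Upgraded:
        return {true, true};   // worst case: both may be needed
    }
    assert(false && "unknown node");
    return {false, false};
}
```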
If a restore of the old node is required, for example due to a failed LBB upgrade,
15 REFERENCES
16 ANNEXES