
DESIGN SPECIFICATION 1(64)

Uppgjord (även faktaansvarig om annan) - Prepared (also subject responsible if other) Nr - No.

QABKULD 102 62-CNZ 222 59 Uen


Dokansv/Godk - Doc respons/Approved Kontr - Checked Datum - Date Rev File

UAB/I/GBC (T Wocalewski) 2000-11-15 B

© Ericsson Development and Research Inc 2000

FUNCTION CHANGE

Abstract

This document describes the implementation of Function Change (FCH): upgrade of ACS, ACS-based application software, Large Building Block (LBB) and third-party product (3pp) software, as well as system parameter editing, on the Windows NT platform. The functionality described in this document is the complete FCH.

Application

This document forms a base for implementation, user documentation, test and maintenance of the product(s) related to FCH and is not intended for the customer or the user of the system.

Contents

1 GENERAL
1.1 SYSTEM ENVIRONMENT
1.2 REQUIREMENT REFERENCES
1.3 DESIGN PRINCIPLES
1.4 HARDWARE AND SYSTEM SOFTWARE
2 EXTERNAL INTERFACES
2.1 PROVIDED EXTERNAL INTERFACES
2.2 USED EXTERNAL INTERFACES
3 USE CASES
3.1 INITIATING A FCH SESSION
3.2 ENDING A SESSION
4 STRUCTURE
4.1 RESPONSIBILITIES
4.2 INTERFACES
4/002 01-CAL 119 0401 Uen A

5 SOFTWARE UNITS
5.1 FCHEXESRC
5.2 FCHLIBSRC
6 PROCESSES
7 PERSISTENT STORAGE
8 ERROR HANDLING
9 FUNCTION CHANGE
10 START, STOP AND RESTART
11 CONFIGURATION
12 CAPACITY
12.1 DATA FOR CAPACITY ESTIMATION
12.2 CAPACITY ESTIMATION
13 SPECIAL FEATURES
14 REFERENCES
15 ANNEXES
15.1 ANNEX REVISION HISTORY

1 GENERAL

This document describes how to provide Function Change, i.e. online software upgrade of AP software with minimal system downtime while maintaining high availability.

The implementation described requires a reboot of the system, and provides transactional behavior, fallback capability and evaluation of an upgraded system before committing.

The implementation is based on the Microsoft Cluster Server (MSCS) and InstallShield third-party products, and also uses, or prepares for the use of, the CXC product Backup and Restore (BUR).

1.1 SYSTEM ENVIRONMENT

Figure 1.1 The system environment: FCH together with MSCS, InstallShield, Win32, the Registry, BUR, AEH, PHA and PRC.

1.2 REQUIREMENT REFERENCES

See ref.[2] for details on requirements.



1.3 DESIGN PRINCIPLES

The internal design of FCH is object oriented, whereas the command line
interfaces used to execute a Function Change session are procedural.
Everything is written in C++.

FCH is designed to be transactional, in the sense that it can always return the system to a known working state. Considerable effort has been put into making it as robust as possible.

1.4 HARDWARE AND SYSTEM SOFTWARE

Force APG40 LBB, Microsoft Windows NT 4.0 Advanced Server.

2 EXTERNAL INTERFACES

2.1 PROVIDED EXTERNAL INTERFACES

FCH provides the following external interfaces:

2.1.1 fchstart

Command line interface to initiate and perform a FCH upgrade of the system.

2.1.2 fchfb

Command line interface to perform a fallback to the system existing prior to the FCH session.

2.1.3 fchcommit

Command line interface to transfer necessary data to the unmodified node and prepare the system for commit of the new configuration.

2.1.4 fchend

Command line interface to install the new configuration on the passive node of the system and end the FCH session. Also used to end a failed FCH session.

2.1.5 fchrst

Command line interface to restore the uncommitted node, using BUR, in case of a severe failure during a FCH session.

2.1.6 FCH pipe command interface

Used by PRC to perform an automatic fallback during FCH. PRC sends the command “fallback” to the pipe.
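The pipe protocol is not specified here beyond the single command “fallback”. As a platform-neutral sketch, assuming each pipe message carries one command string, the FCH side could map incoming messages to commands as below. The names `PipeCommand`, `trimTrailing` and `parsePipeCommand` are hypothetical, and the Win32 named-pipe I/O itself is left out so that the dispatch logic stands alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical set of commands understood on the FCH pipe.
// Only "fallback" is documented; anything else is rejected.
enum class PipeCommand { Fallback, Unknown };

// Strip trailing CR, LF and spaces that a client may append to a message.
std::string trimTrailing(std::string s) {
    while (!s.empty() && (s.back() == '\n' || s.back() == '\r' || s.back() == ' '))
        s.pop_back();
    return s;
}

// Map a raw message read from the pipe to a command.
PipeCommand parsePipeCommand(const std::string& raw) {
    return trimTrailing(raw) == "fallback" ? PipeCommand::Fallback
                                           : PipeCommand::Unknown;
}
```

Treating every unrecognised message as `Unknown`, rather than guessing, keeps an automatic fallback from being triggered by a malformed request.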

2.2 USED EXTERNAL INTERFACES

The FCH application uses the following external interfaces:

1 AP Event Report (AEH) API
2 AP Process Control (PRC) CLI
3 AP Parameter Handling (PHA) CLI
4 AP Backup and Restore (BUR) CLI. At present, FCH only prepares for the use of the BUR CLI.
5 MSCS API
6 InstallShield CLI
7 Microsoft Win32 API

The interfaces are described in the following sections.

2.2.1 AP Event Report API

The AP Event Report API, ACS_AEH_EvReport, is used to send alarms and events from FCH.

2.2.2 AP Process Control CLI

FCH uses the prcconf CLI to reconfigure the MSCS cluster database and
the prcgen CLI to generate the PRC Service Start Schedule file.

2.2.3 AP Parameter Handling CLI

FCH uses the phacreate CLI to syntax check parameter files and update the
PHA parameter database. FCH also uses the phatrans CLI to transfer
parameters between different versions of CXCs.

2.2.4 AP Backup and Restore CLI

At present, FCH does not use BUR CLI directly to restore a single node. In
case of a restore, FCH prepares the system for restore and the operator has
to invoke the needed BUR command manually. I.e. fcc_restore or Burre-
store.

2.2.5 MSCS API

FCH uses the MSCS API to control cluster resources and to perform
failovers.

2.2.6 InstallShield CLI

FCH uses the InstallShield CLIs setup.exe and isuninst.exe to perform installation and removal of CXC software packages.

2.2.7 Microsoft Win32 API

The Win32 API is used extensively to implement the FCH functionality.



3 USE CASES

3.1 INITIATING A FCH SESSION

Note that the different fchstart options described below can be combined in
any way.

3.1.1 Installing CXC packages

3.1.1.1 Description

Installation and upgrade of CXC packages are initiated with the fchstart command using the -d option with a directory as argument. The directory must contain the CXC packages to be installed, in the form of self-extracting WinZip files. The command unpacks the packages to a specific location and checks whether each CXC package is already installed; if so, it compares the revision of the installed package with the revision of the package to be installed. All unique CXC packages, i.e. the revisions that are not already installed, are displayed in a list from which the operator may select the CXCs to install.
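A minimal sketch of the revision comparison, assuming each package is identified by its CXC number and a revision string. `CxcPackage` and `uniquePackages` are illustrative names, not part of FCHLIB. The text only says that revisions are compared, so the sketch treats any revision that differs from the installed one as unique and does not try to order revisions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical package descriptor: CXC product number plus revision
// (e.g. an R-state such as "R1A"; the format is an assumption).
struct CxcPackage {
    std::string number;
    std::string revision;
};

// Return the candidates that are not installed at all, or whose revision
// differs from the installed one. These are the "unique" packages that
// would be offered to the operator. `installed` maps number -> revision.
std::vector<CxcPackage> uniquePackages(const std::vector<CxcPackage>& candidates,
                                       const std::map<std::string, std::string>& installed) {
    std::vector<CxcPackage> result;
    for (const auto& pkg : candidates) {
        auto it = installed.find(pkg.number);
        if (it == installed.end() || it->second != pkg.revision)
            result.push_back(pkg);  // new CXC, or another revision of an installed one
    }
    return result;
}
```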

FCH is implemented as a transactional state machine; when fchstart begins, the FCH state is set to Installing.

All existing rin files, together with their corresponding resource instances, are removed from the system before the CXC packages are updated. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: with the -i option, or as part of a CXC package installation.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state with state set to
noFCH.

If the operator selects y for yes, the state is set to Reboot and the node is rebooted. After the reboot, the FCH server component performs the switchover to the new system, starting in state Failover1 and continuing through the subsequent states.
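The state handling described so far can be sketched as a transition function. Only the states named in this section (noFCH, Installing, Reboot, Failover1) plus Reboot2 from the exceptional flow are modelled; the states that follow Failover1 are not listed here and are omitted. The event names and the `nextState` function are hypothetical.

```cpp
#include <cassert>
#include <optional>

// States named in the text; the later switchover states are omitted.
enum class FchState { NoFCH, Installing, Reboot, Failover1, Reboot2 };

// Hypothetical events driving the transitions.
enum class FchEvent { Start, ConfirmSwitch, DeclineSwitch, UnhandledError, Rebooted };

// Return the next state, or std::nullopt if the event is not valid in `s`.
std::optional<FchState> nextState(FchState s, FchEvent e) {
    switch (s) {
    case FchState::NoFCH:
        if (e == FchEvent::Start) return FchState::Installing;        // fchstart begins
        break;
    case FchState::Installing:
        if (e == FchEvent::ConfirmSwitch) return FchState::Reboot;    // operator answers y
        if (e == FchEvent::DeclineSwitch) return FchState::NoFCH;     // operator answers n
        if (e == FchEvent::UnhandledError) return FchState::Reboot2;  // fallback, reboot to old SW
        break;
    case FchState::Reboot:
        if (e == FchEvent::Rebooted) return FchState::Failover1;      // FCH server takes over
        break;
    default:
        break;
    }
    return std::nullopt;
}
```

Rejecting unexpected events with `std::nullopt` instead of ignoring them makes it easy to detect an inconsistent persisted state, for example after an unplanned restart.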

3.1.1.2 Flow of Events

Sequence diagram between Operator, fchstart and FCHLIB: fchstart -d (a); initial check (b); check resources (c); initiate session (d); initiate pkg handling (e); get new pkg list (f); list packages (g); enter selection (h); install package, looped (i); get new pkg list (j); add all rin files (k); rebuild config (l); verify before switch (m); prompt (n); answer (o); set state Reboot and initiate reboot (p). Calls return ok on success.

a The operator executes fchstart -d <arg>, where <arg> is a directory on the system disk where the CXC packages to install are located.
b FCH verifies that the cluster server is running, that the directory given as argument exists, that no other FCH session is in progress, and that the FCH server is running. It also checks the amount of free disk space, and makes backup copies of the FCH and ACS binaries, used for executing the old binaries.
c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, and checks that neither a Soft Function Change (SFC) nor a BUR operation is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file. State is set to Installing.
e FCH initiates package handling, making sure that the FCH
package directories exist and initiating the FCH package
transaction log. FCH also builds a list of currently installed
packages, stops all resources on the current node and
decompresses the new packages. FCH also clears all resource
instances from the service control database together with the
corresponding resource instance files.
f FCH builds a list of the new packages.
g FCH prints the list of new packages to the operator and prompts the operator to select packages to install.
h Operator enters selection.
i FCH updates the package install transaction log and installs the selected package(s).
j FCH builds a new list of packages, containing the remaining new packages. Steps g to j are then iterated until all packages have been selected or the operator chooses to leave the install menu.
k FCH adds all resource instances in the service control
database together with the corresponding resource instance
files.
l FCH rebuilds the PRC service start schedule file to be used later on in the switchover to the new system and makes a backup copy of the file.


m FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
n FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.

o The operator confirms or denies. If the operator denies, the FCH session is aborted.
p FCH sends an event specifying that this is a controlled FCH
reboot, sets the state to Reboot and initiates a reboot of the
system using the prcboot command and exits.
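The checks in step c can all be evaluated before anything is modified. The sketch below, with a hypothetical `ResourceStatus` snapshot and `failedChecks` function, reports every failing condition rather than stopping at the first, so the operator sees the complete picture in one run. How each flag is read from MSCS, SFC or BUR is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Snapshot of the conditions checked in step c (field names are illustrative).
struct ResourceStatus {
    bool otherNodeReachable;
    bool dataDiskAvailable;
    bool sfcInProgress;
    bool burInProgress;
    bool quorumAvailable;
    bool currentNodeIsPassive;
    bool anyNodePaused;
    bool allResourcesOnline;
    bool specialDirsEmpty;
};

// Return a description of every failed check; empty means the session may start.
std::vector<std::string> failedChecks(const ResourceStatus& s) {
    std::vector<std::string> failed;
    if (!s.otherNodeReachable)   failed.push_back("no connection to other node");
    if (!s.dataDiskAvailable)    failed.push_back("data disk not available");
    if (s.sfcInProgress)         failed.push_back("SFC in progress");
    if (s.burInProgress)         failed.push_back("BUR in progress");
    if (!s.quorumAvailable)      failed.push_back("quorum resource not available");
    if (!s.currentNodeIsPassive) failed.push_back("current node is not passive");
    if (s.anyNodePaused)         failed.push_back("a node is paused");
    if (!s.allResourcesOnline)   failed.push_back("not all cluster resources online");
    if (!s.specialDirsEmpty)     failed.push_back("FCH special directories not empty");
    return failed;
}
```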

3.1.1.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: a fallback is performed, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.
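One way to realise revoke-on-abort is an undo log in which each completed step registers its own inverse, replayed newest first. This is a sketch under that assumption; `TransactionLog` is hypothetical, and a real log would also have to survive the reboot into Reboot2, which this in-memory version does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal undo log: each completed step records how to revoke itself.
class TransactionLog {
public:
    void record(std::string step, std::function<void()> undo) {
        entries_.push_back({std::move(step), std::move(undo)});
    }
    // Revoke all recorded steps, newest first, then clear the log.
    void rollback() {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it)
            it->undo();
        entries_.clear();
    }
    std::size_t size() const { return entries_.size(); }

private:
    struct Entry {
        std::string step;            // description, e.g. for an event report
        std::function<void()> undo;  // action that revokes the step
    };
    std::vector<Entry> entries_;
};
```

Replaying in reverse order matters: a package installed after the node was paused must be removed before the node is resumed.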

3.1.2 Deleting CXC packages

3.1.2.1 Description

Deletion of CXC packages is initiated with the fchstart command using the -r option. All installed CXCs are displayed in a list from which the operator may select CXCs to delete from the system.

FCH is implemented as a transactional state machine; when fchstart begins, the FCH state is set to Installing.

All existing rin files, together with their corresponding resource instances, are removed from the system before the CXC packages are updated. After installation, the rin files that should exist in the new system are added again. There are two ways to add or update rin files: with the -i option, or as part of a CXC package installation.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state.

If the operator selects y for yes, the node will be rebooted, after which the
FCH server component will perform the switch over to the new system.

Note that this option is powerful and can render the system unusable.

3.1.2.2 Flow of Events

Sequence diagram between Operator, fchstart and FCHLIB: fchstart -r (a); initial check (b); check resources (c); initiate session (d); initiate pkg handling (e); get current pkg list (f); list packages (g); enter selection (h); remove package, looped (i); get current pkg list (j); rebuild config (k); verify before switch (l); prompt (m); answer (n); initiate reboot (o). Calls return ok on success.

a The operator executes fchstart -r.
b FCH verifies that the cluster server is running, that no other
FCH session is going on, and that the FCH server is running.
It also checks the amount of free disk space. It also makes
backup copies of FCH and ACS binaries used for secondary
execution.
c FCH verifies all necessary system resources. It checks the
connection to the other node and the data disk, that no Soft
Function Change (SFC) is in progress, that the cluster quorum
resource is available, that the current node is the passive node,
that none of the nodes are paused, that all cluster resources are
online and that the FCH special directories are empty. It also
flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH initiates package handling, initiating the FCH package
transaction log. FCH also builds a list of currently installed
packages and stops all resources on the current node. FCH
also clears all resource instances from the service control
database together with the corresponding resource instance
files.
f FCH retrieves the list of the installed packages.
g FCH prints the list of installed packages to the operator and
prompts him to select packages to remove.
h Operator enters selection.
i FCH updates the package remove transaction log and
removes the selected package(s).
j FCH builds a new list of packages, containing the remaining
installed packages. Steps g to j are then iterated until all
packages have been selected or the operator chooses to leave the
remove menu.
k FCH rebuilds the PRC service start schedule file to be used
later on in the switch over to the new system and makes a
backup copy of the file.
l FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
m FCH prompts the operator to confirm that he wishes to switch to the new system configuration.


n The operator confirms or denies. If the operator denies, the
FCH session is aborted.
o FCH sends an event specifying that this is a controlled FCH
reboot, sets Reboot state and initiates a reboot of the system
using the prcboot command and exits.
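Steps g to j describe a simple menu loop. The sketch below models it with operator input replaced by a list of numeric selections; the numbering (0 to leave, 1..N to pick an entry) and the name `runSelectionMenu` are assumptions for illustration. The same loop shape applies to the install menu in 3.1.1.2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Iterate steps g to j: show remaining packages, read one selection,
// act on it, rebuild the list. `selections` stands in for operator input.
// Returns the packages that were acted on, in selection order.
std::vector<std::string> runSelectionMenu(std::vector<std::string> remaining,
                                          const std::vector<std::size_t>& selections) {
    std::vector<std::string> chosen;
    for (std::size_t sel : selections) {
        if (remaining.empty() || sel == 0) break;  // all handled, or operator leaves
        if (sel > remaining.size()) continue;      // invalid choice: prompt again
        chosen.push_back(remaining[sel - 1]);      // step i: act on this package
        remaining.erase(remaining.begin() +        // step j: rebuild remaining list
                        static_cast<std::ptrdiff_t>(sel - 1));
    }
    return chosen;
}
```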

3.1.2.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: a fallback is performed, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.3 Editing CXC parameters

3.1.3.1 Description

Editing of CXC parameters is initiated with the fchstart command using the -p option. All installed CXCs that have parameters are displayed in a list from which the operator may select the CXCs whose parameters are to be edited. The parameter file of the selected CXC is opened in a text editor, where the operator may edit parameter values. The edited parameter files are checked for syntax errors.

FCH is implemented as a transactional state machine; when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state.

If the operator selects y for yes, the node will be rebooted, after which the
FCH server component will perform the switch over to the new system.

3.1.3.2 Flow of Events

Sequence diagram between Operator, fchstart and FCHLIB: fchstart -p (a); initial check (b); check resources (c); initiate session (d); initiate pkg handling (e); get current pkg list (f); list packages (g); enter selection (h); edit parameter file, looped (i); syntax check par file (j); update pha database (k); verify before switch (l); prompt (m); answer (n); initiate reboot (o). Calls return ok on success.

a The operator executes fchstart -p.
b FCH verifies that the cluster server is running, that no other FCH session is in progress, and that the FCH server is running. It also checks the amount of free disk space and makes backup copies of the FCH and ACS binaries, which are used for secondary execution.
c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, and checks that neither a Soft Function Change (SFC) nor a BUR operation is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH initiates package handling, initiating the FCH package
transaction log.
f FCH builds a list of all installed packages that have PHA
parameters.
g FCH prints the list of installed packages with PHA
parameters to the operator and prompts him to select
packages to edit.
h Operator enters selection.
i The parameter file of the selected package is backed up and
opened in a text editor, the operator makes his changes, saves
the file and exits the editor.
j FCH syntax checks the edited file using the phacreate
command. If the file contains syntax errors the operator is
prompted to re-edit the file or abort the session. Steps g to j
are iterated until the operator selects to leave the parameter
edit menu.
k FCH updates the PHA database with the edited files using the
phacreate command.
l FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
m FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
n The operator confirms or denies. If the operator denies, the FCH session is aborted.
o FCH sends an event specifying that this is a controlled FCH
reboot, sets Reboot state and initiates a reboot of the system
using the prcboot command and exits.
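Step j combines the syntax check with a re-edit or abort prompt. A minimal sketch with the editor, the phacreate syntax check and the operator prompt passed in as callbacks; all three are hypothetical hooks, since the phacreate options are not given here.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Outcome of editing one parameter file.
enum class EditResult { Accepted, Aborted };

// Repeat edit + syntax check until the file passes or the operator aborts.
// edit:     opens the file in a text editor
// syntaxOk: stands in for the phacreate syntax check
// reEdit:   asks the operator to re-edit (true) or abort the session (false)
EditResult editUntilValid(const std::string& parFile,
                          const std::function<void(const std::string&)>& edit,
                          const std::function<bool(const std::string&)>& syntaxOk,
                          const std::function<bool()>& reEdit) {
    for (;;) {
        edit(parFile);
        if (syntaxOk(parFile)) return EditResult::Accepted;
        if (!reEdit()) return EditResult::Aborted;  // abort leads to fallback
    }
}
```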

3.1.3.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: a fallback is performed, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.4 Update of CXC parameters with offline editing

3.1.4.1 Description

Updating of CXC parameters is initiated with the fchstart command using the -P option with a file as argument. The file must contain a list, with full paths, of the files that replace the existing CXC13NNNN.par files. The original files are backed up and then replaced with the new files.

The updated parameter files are checked for syntax errors.

FCH is implemented as a transactional state machine; when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state.

If the operator selects y for yes, state is set to Reboot and the node will be
rebooted, after which the FCH server component will perform the switch
over to the new system.

3.1.4.2 Flow of Events

Sequence diagram between Operator, fchstart and FCHLIB: fchstart -P <file> (a); initial check (b); check resources (c); initiate session (d); initiate pkg handling (e); get current pkg list (f); build new list (g); update parameter files (h); syntax check par files (i); update pha database (j); prepare local fallback (k); verify before switch (l); prompt (m); answer (n); initiate reboot (o). Calls return ok on success.

a The operator executes fchstart -P <file>, where <file> is a list of new CXCNNNNN.par parameter files. FCH checks that <file> exists, that the listed files exist and that the listed files contain no syntax errors.
b FCH verifies that the cluster server is running, that no other FCH session is in progress, and that the FCH server is running. It also checks the amount of free disk space and makes backup copies of the FCH and ACS binaries, which are used for secondary execution.
c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, and checks that neither a Soft Function Change (SFC) nor a BUR operation is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH initiates package handling, initiating the FCH package
transaction log.
f FCH builds a list from all installed packages that have PHA
parameters.
g FCH builds a list from the new parameter files listed in <file>.
h The existing parameter files corresponding to the new list are backed up, and the new files are inserted in their place.
i FCH syntax checks the new files using the phacreate command.
j FCH updates the PHA database with the new files using the phacreate command.
k The parameter files in the FCH system directory are saved on
the other node to be able to do a local fallback on that node.
l FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
m FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
n The operator confirms or denies. If the operator denies, the
FCH session is aborted.
o FCH sends an event specifying that this is a controlled FCH reboot, sets Reboot state and initiates a reboot of the system using the prcboot command and exits.
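Step a checks that <file> exists and that every listed file exists. A sketch using C++17 `std::filesystem`, assuming one full path per line; `readParList` is a hypothetical helper, and the syntax check with phacreate is deliberately left out.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Read the -P list file, one full path per line (an assumption); blank lines
// are skipped. Returns the listed paths and fills `missing` with every entry
// that does not exist or is not a .par file. If the list file itself cannot
// be opened, it is reported in `missing` and no paths are returned.
std::vector<fs::path> readParList(const fs::path& listFile, std::vector<fs::path>& missing) {
    std::vector<fs::path> paths;
    missing.clear();
    std::ifstream in(listFile);
    if (!in) { missing.push_back(listFile); return paths; }
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        fs::path p(line);
        if (!fs::is_regular_file(p) || p.extension() != ".par") missing.push_back(p);
        paths.push_back(p);
    }
    return paths;
}
```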

3.1.4.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: a fallback is performed, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.

3.1.5 Replacing LBB files

3.1.5.1 Description

Replacement of LBB files, i.e. any file in the system, is initiated with the fchstart command using the -l option and a file as argument. The file must contain a list of files to replace and the files to replace them, each file pair separated by a semicolon. If the file to replace does not exist, it is assumed to be a new file. The original files are backed up and then replaced with the new files. If the original file did not exist, an empty file is created as its backup; this empty file is later removed if a fallback occurs.
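A sketch of parsing the -l list file, assuming one `<file to replace>;<replacement file>` pair per line. `FilePair` and `parsePairs` are hypothetical names. Rejecting the whole list on the first malformed line keeps validation ahead of any modification, in line with the transactional design.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>  // for std::istringstream in the usage example
#include <string>
#include <vector>

// One entry of the -l list: the file to replace and its replacement.
struct FilePair {
    std::string target;       // existing (or new) file in the system
    std::string replacement;  // file that will take its place
};

// Parse lines of the form "<target>;<replacement>". Returns std::nullopt if any
// non-blank line lacks the separator or has an empty half.
std::optional<std::vector<FilePair>> parsePairs(std::istream& in) {
    std::vector<FilePair> pairs;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        auto sep = line.find(';');
        if (sep == std::string::npos || sep == 0 || sep + 1 == line.size())
            return std::nullopt;
        pairs.push_back({line.substr(0, sep), line.substr(sep + 1)});
    }
    return pairs;
}
```

With this format, a path that itself contains a semicolon cannot be expressed.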

FCH is implemented as a transactional state machine; when fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state.

If the operator selects y for yes, the node will be rebooted, after which the
FCH server component will perform the switch over to the new system.

3.1.5.2 Flow of Events

Sequence diagram between Operator, fchstart and FCHLIB: fchstart -l <arg> (a); initial check (b); check resources (c); initiate session (d); initiate pkg handling (e); replace LBB files (f); verify before switch (g); prompt (h); answer (i); initiate reboot (j). Calls return ok on success.

a The operator executes fchstart -l <arg>, where <arg> is a file containing the list of files to be replaced and the files to replace them.
b FCH verifies that the cluster server is running, that the file given as argument exists, that no other FCH session is in progress, and that the FCH server is running. It also checks the amount of free disk space and that valid BUR backup files exist. It also makes backup copies of the FCH and ACS binaries.
c FCH verifies all necessary system resources. It checks the connection to the other node and the data disk, that no Soft Function Change (SFC) is in progress, that the cluster quorum resource is available, that the current node is the passive node, that none of the nodes are paused, that all cluster resources are online and that the FCH special directories are empty. It also flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH initiates package handling, initiating the FCH package
transaction log.
f FCH makes backup copies of the files to be replaced, and then
replaces them.
g FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
h FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
i The operator confirms or denies. If the operator denies, the
FCH session is aborted.
j FCH sends an event specifying that this is a controlled FCH
reboot, initiates a reboot of the system using the prcboot
command and exits.
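The verify-before-switch check (step g) reduces to a single predicate over four conditions. The `SwitchStatus` struct and `readyToSwitch` are hypothetical; reading the values through the MSCS API is not shown.

```cpp
#include <cassert>

// Conditions checked immediately before the switch (field names are illustrative).
struct SwitchStatus {
    bool quorumAvailable;
    bool currentNodePassive;
    bool currentNodePaused;
    bool otherNodePaused;
};

// True only when the quorum resource is available, the current node is
// passive and paused, and the other node is not paused.
bool readyToSwitch(const SwitchStatus& s) {
    return s.quorumAvailable && s.currentNodePassive &&
           s.currentNodePaused && !s.otherNodePaused;
}
```

Note that this differs from step c: at this point the current node is expected to be paused, because FCH paused it when the session was initiated.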

3.1.5.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error occurs that FCH cannot handle, the session is aborted and all changes made to the system are revoked. FCH uses the state machine states to revoke the changes: a fallback is performed, the state is set to Reboot2 and the system is rebooted to activate the old software. Examples of errors that cannot be handled are missing or failed system resources, missing or faulty input data, and node-to-node communication failure.
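A sketch of the backup and fallback handling described in 3.1.5.1, using C++17 `std::filesystem`. `backupAndReplace` and `restoreFromBackup` are hypothetical helpers, and the choice of backup location is left to the caller.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Back up `target` to `backup`, then copy `replacement` over `target`.
// If `target` does not exist, an empty backup file is written as a marker.
void backupAndReplace(const fs::path& target, const fs::path& replacement,
                      const fs::path& backup) {
    if (fs::exists(target))
        fs::copy_file(target, backup, fs::copy_options::overwrite_existing);
    else
        std::ofstream(backup).close();  // empty marker: the file is new
    fs::copy_file(replacement, target, fs::copy_options::overwrite_existing);
}

// Undo backupAndReplace during fallback: an empty backup means the file was
// new, so it is removed; otherwise the original content is restored.
void restoreFromBackup(const fs::path& target, const fs::path& backup) {
    if (fs::file_size(backup) == 0)
        fs::remove(target);
    else
        fs::copy_file(backup, target, fs::copy_options::overwrite_existing);
    fs::remove(backup);
}
```

The empty-backup convention has one limitation: an original file that was genuinely empty cannot be told apart from a new file, so fallback would remove it. Recording the "new file" case separately, for example in the transaction log, would avoid this.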

3.1.6 Adding and replacing resource instance files

3.1.6.1 Description

Some products, such as FOS, need to create several instances of a cluster resource service. These instances are called resource instances. They are described in resource instance files, which complement the original service described in the ACS_PRC_Config file.

Adding or replacing of a resource instance file, a so called .rin file can be


seen as a special case of adding or replacing of a LBB file. The same logic
is used with some extra handling. Adding and/or replacing of a .rin file is
initiated with the fchstart command using the -i option and a file as argu-
ment. The file must contain a list of resource instance files to replace and/or
add and the files to replace them, each file pair separated by a semi-colon.

If a file to replace does not exist, it is treated as a new file. The
original files are backed up and then replaced with the new files. If an
original file did not exist, an empty file is created as its backup. This
empty file is removed later if a fallback takes place.
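
The -i argument file and the backup rule can be sketched as follows. The line format is an assumption. The text only says that file pairs are separated by a semicolon, so this sketch reads one `<file to replace>;<replacement file>` pair per line. `read_pairs`, `backup_and_replace` and the backup directory layout are hypothetical.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def read_pairs(list_file: Path) -> list[tuple[Path, Path]]:
    """Parse the assumed format: one '<file to replace>;<replacement>' per line."""
    pairs = []
    for lineno, line in enumerate(list_file.read_text().splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = [p.strip() for p in line.split(";")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {lineno}: expected '<old>;<new>', got {line!r}")
        pairs.append((Path(parts[0]), Path(parts[1])))
    return pairs


def backup_and_replace(target: Path, new: Path, backup_dir: Path) -> Path:
    """Back up target (empty backup if it does not exist yet), then replace it."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / target.name  # assumes unique file names
    if target.exists():
        shutil.copy2(target, backup)
    else:
        # An empty backup marks a newly added file, to be removed on fallback.
        backup.write_bytes(b"")
    shutil.copy2(new, target)
    return backup
```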

All existing rin files, together with their corresponding instances, are
removed from the system before the CXC packages are updated. After
installation, the rin files that should exist in the new system are added
again. There are two ways to add or update rin files: with the -i option,
or through CXC package installation.

FCH is a state machine, implemented in a transactional way, so when
fchstart begins, the FCH state is set to Installing.

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node will be restored to its previous state.

If the operator selects y for yes, the node will be rebooted, after which the
FCH server component will perform the switch over to the new system.

3.1.6.2 Flow of Events

Sequence diagram (participants: Operator, fchstart, FCHLIB): fchstart -i
<arg> (a); initial check (b); check resources (c); initiate session (d);
clear instances, replace rin files (e); exec prcgen & prcconf (f); verify
before switch (g); prompt (h); answer (i); initiate reboot (j).

a The operator executes fchstart -i <arg>, where <arg> is a file
containing the list of files to be replaced and the files to
replace them.
b FCH verifies that the cluster server is running, that the file
given as argument exists, that no other FCH session is in
progress and that the FCH server is running. It also checks the
amount of free disk space and makes backup copies of the
FCH and ACS binaries.
c FCH verifies all necessary system resources. It checks the
connection to the other node and the data disk, that no Soft
Function Change (SFC) is in progress, that the cluster quorum
resource is available, that the current node is the passive node,
that none of the nodes are paused, that all cluster resources are
online and that the FCH special directories are empty. It also
flushes the cache of all disks.
d FCH initiates the new session by raising the alarm AP
FUNCTION CHANGE IN PROGRESS. It also pauses the
current node and makes a backup copy of the PRC service
start schedule file.
e FCH clears all resource instances from the service control
database together with the corresponding resource instance
files. FCH makes backup copies of the files to be replaced,
and then replaces them.
f prcgen is executed to create the PRC_Config file with the
updated resource instances. prcconf is run to create the new
cluster database.
g FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
h FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
i The operator confirms or denies. If the operator denies, the
FCH session is aborted.
j FCH sends an event specifying that this is a controlled FCH
reboot, initiates a reboot of the system using the prcboot
command and exits.
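
The initial check in step b reduces to a list of independent preconditions. The minimal sketch below assumes that the cluster, session and server status are passed in as booleans, since how FCH obtains them is not shown here. It also uses a hypothetical free-space threshold, because the required amount is not specified. The backup of the binaries is not modelled.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Hypothetical default; the document does not state how much space is required.
DEFAULT_MIN_FREE_BYTES = 500 * 1024 * 1024


def initial_check(arg_file: Path, data_dir: Path, *, cluster_running: bool,
                  session_active: bool, fch_server_running: bool,
                  min_free_bytes: int = DEFAULT_MIN_FREE_BYTES) -> list[str]:
    """Return the reasons fchstart -i must not proceed; empty means OK."""
    errors = []
    if not cluster_running:
        errors.append("cluster server is not running")
    if not arg_file.is_file():
        errors.append(f"argument file not found: {arg_file}")
    if session_active:
        errors.append("another FCH session is in progress")
    if not fch_server_running:
        errors.append("FCH server is not running")
    if shutil.disk_usage(data_dir).free < min_free_bytes:
        errors.append("insufficient free disk space")
    return errors
```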

3.1.6.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error
occurs that FCH cannot handle, the session is aborted and all changes made
to the system are revoked. FCH uses the state machine states to revoke the
changes: fallback is performed, the state is set to Reboot2 and the system
is rebooted to activate the old software. Examples of errors that cannot be
handled are missing or failed system resources, missing or faulty input
data, and node-to-node communication failure.

3.1.7

3.1.8 Upgrading LBB and 3pp software


3.1.8.1 Description

Upgrade of LBB and 3pp software is initiated with the fchstart command
using the -L option. After the initial checks, an LBBShell> command prompt
is displayed, and the operator may enter commands and execute upgrade
packages. Reboots are also permitted during this phase.

FCH is a state machine, implemented in a transactional way, so when
fchstart -L begins, the FCH state is set to LBBReboot. This state is kept as
long as the operator chooses to install and reboot. When all reboots are done
and the operator types “l”, the state is set to Installing.
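
The LBBShell> phase can be pictured as a small loop over operator input. In this sketch, the persisted FCH state is modelled as a dict. The dict stands in for wherever FCH keeps the state across reboots, which is an assumption. `run` is a hypothetical callback that executes one DOS command. Typing “l” leaves the shell, as described above.

```python
from __future__ import annotations

from typing import Callable, Iterable


def lbb_shell(commands: Iterable[str], run: Callable[[str], int],
              persisted: dict[str, str]) -> bool:
    """Process operator input at LBBShell>; return True once 'l' is entered."""
    # The state stays LBBReboot across any reboots the operator triggers.
    persisted.setdefault("fch_state", "LBBReboot")
    for cmd in commands:
        cmd = cmd.strip()
        if not cmd:
            continue
        if cmd == "l":
            persisted["fch_state"] = "Installing"  # continue the normal FCH session
            return True
        rc = run(cmd)
        if rc != 0:
            print(f"LBBShell> '{cmd}' exited with code {rc}")
    return False  # input ended (e.g. reboot) without leaving the shell
```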

After all updates have been performed, the operator is prompted to switch
to the new system. If the operator selects n for no, the FCH session will be
aborted and the node must be restored using the fchrst component.

If the operator selects y for yes, the node will be rebooted, after which the
FCH server component will perform the switch over to the new system.

3.1.8.2 Flow of Events

Sequence diagram (participants: Operator, fchstart, FCHLIB): fchstart -L
(a); initial check (b); check resources (c); initiate session (d); initiate
pkg handling (e); LBBShell prompt (f); enter command (g), repeated;
continue session (h); verify before switch (i); prompt (j); answer (k);
initiate reboot (l).

a The operator executes fchstart -L.
b FCH verifies that the cluster server is running, checks whether an
LBB upgrade reboot has been done, and verifies that no other FCH
session is in progress and that the FCH server is running. It also
checks the amount of free disk space and that valid BUR
backup files exist. It also makes backup copies of the FCH and
ACS binaries.
c FCH verifies all necessary system resources. It checks the
connection to the other node and the data disk, that no Soft
Function Change (SFC) is in progress, that the cluster quorum
resource is available, that the current node is the passive node,
that none of the nodes are paused, that all cluster resources are
online and that the FCH special directories are empty. It also
flushes the cache of all disks.
d FCH initiates the new session by setting a registry value
indicating that this is an LBB upgrade session and by raising the
alarm AP FUNCTION CHANGE IN PROGRESS. It also
pauses the current node and makes a backup copy of the PRC
service start schedule file.
e FCH initiates package handling and the FCH package
transaction log, stops all cluster resources on the current node
and sets the FCH state to LBBReboot1.
f FCH displays the LBBShell> prompt to the operator.
g Operator enters a DOS command. Steps f and g are iterated
until the operator has performed all updates he wishes to do.
h FCH sets the state to Installing and continues the FCH
session. Installation and removal of CXCs, parameter edits,
etc. can be performed here, just as in a normal FCH session.
i FCH once again verifies the system resources. It checks that
the quorum resource is available, that the current node is the
passive node, that the current node is paused and that the
other node is not paused.
j FCH prompts the operator to confirm that he wishes to switch
to the new system configuration.
k The operator confirms or denies. If the operator denies, the
FCH session is aborted.
l FCH sends an event specifying that this is a controlled FCH
reboot, initiates a reboot of the system using the prcboot
command and exits.

3.1.8.3 Exceptional Flow of Events

If the operator chooses to abort the session at any prompt, or if an error
occurs that FCH cannot handle, the session is aborted. Unlike in the other
use cases, however, the node is not restored automatically. The fchrst
component must be used to achieve this. Examples of errors that cannot be
handled are missing or failed system resources, missing or faulty input
data, and node-to-node communication failure.

3.1.9 Switch over


3.1.9.1 Description

Switching over to the new system, i.e. making the upgraded node the active
node in order to test the new configuration, is handled by the FCH server
component after fchstart has performed a reboot. When the server component
starts up after a reboot, it checks whether this was a reboot after fchstart
(Reboot state). If so, it sends events to both nodes about the successful
FCH reboot. FCH then sends events about the switchover attempt and sleeps
for 5 minutes (default). FCH checks whether a cluster configuration change
is necessary and sets the Failover1 state. All cluster resources are
stopped, both nodes are resumed and the Move1 state is set. The common
resource groups are moved to the modified node. If a configuration change
is necessary, the Config1 state is set and the current configuration is
deleted; the state is then set to Config1B and the new configuration is
added. This node is resumed and the other node paused. The Supervision
state is set and FCH starts all the resources on the upgraded node. The
current node is then paused and the other resumed, and the resources of
the other node are started. Then the current node is resumed and the other
paused. Finally, FCH sends an event saying that the switchover was
successful and that supervision has now begun.

3.1.9.2 Flow of Events

Sequence diagram (participants: FCH Supervisor, FCH Server, FCHLIB):
initiate supervisor (a); handle orphan resources (b); report boot (c);
initiate switch over (d); report switch attempt (e); check PRC config (f);
stop cluster (g); fail over (h); edit cluster db (i); prepare cluster start
(j); start current node (k); start other node (l); report switch (m).

a The supervisor thread is started at boot. It starts by checking
that it can access the cluster, checking for cyclic reboots,
checking and setting timer values, setting a mutex to prevent
other FCH processes from being executed, and checking the FCH
state and whether it is executing on the passive or the active node.
In this use case it is on the passive node and the FCH state is
Reboot, i.e. the node is coming back up after an fchstart reboot.
b the supervisor checks for orphan resources, i.e. cluster
resource groups that have no owner node and assigns them to
the correct node.
c the supervisor sends an event about the successful fchstart
reboot and creates a reboot file to prevent PRC from counting
this reboot as a spontaneous reboot.
d the supervisor calls the switchover function and state is set to
Failover1.
e switchover sends an event reporting that a switch attempt is
in progress.
f switchover checks whether the PRC service start schedule has been
modified and, if it has, sets a boolean to indicate that the
cluster database needs to be updated (done in step i). In this
use case the file has been changed.
g switchover stops all cluster resources. Any resource that fails
to come offline is subsequently killed.
h the current node is resumed, the FCH state set to Move1 and
the resource groups are moved to the current node, making it
the active node.
i switchover sends an event reporting that it is about to edit the
cluster database. The old configuration is then removed and
the new configuration inserted. The FCH state is set to
Config1B and an event is sent reporting that the cluster
database was successfully edited.
j the FCH state is set to Failover2, the PRC reboot log file is
updated to indicate that the last reboot was a FCH reboot. The
current node is then paused.
k the current (now active) node is resumed (and the other
paused), the FCH state set to Supervision, and the cluster
resources on the current node are started. FCH first waits for
PRC cluster control to come online, and then verifies that all
other resources also have come online.
l the current node is now paused and the other resumed. The
cluster resources on the other (now passive) node are started
and the switchover function waits for them to come online.
After this the other node is paused and this one resumed.
m the supervisor moves the alarm AP FUNCTION CHANGE
IN PROGRESS to the new active node by ceasing it on the
other node and raising it again on the current node. Finally, it
sends an event reporting that the FCH switch over was
successful and that the system is now in supervision mode.
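
The FCH states set during switch over (steps d to k) form a fixed sequence. Config1 and Config1B are included only when step f finds that the PRC service start schedule has changed. The sketch below lists that sequence. `remaining` shows how a restarted server could tell which steps are still outstanding, assuming each state is persisted before its step runs. Both function names are hypothetical.

```python
from __future__ import annotations


def switchover_states(config_changed: bool) -> list[str]:
    """FCH states set during switch over, in order."""
    states = ["Failover1", "Move1"]
    if config_changed:  # decided in step f
        states += ["Config1", "Config1B"]
    states += ["Failover2", "Supervision"]
    return states


def remaining(states: list[str], persisted: str) -> list[str]:
    """States still to be passed through when resuming from `persisted`."""
    return states[states.index(persisted):]
```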

3.1.9.3 Exceptional Flow of Events

Non-stopping errors are logged to the FCH log file. Stopping errors are
logged and an event is sent reporting what error occurred. Examples of
stopping errors are resources failing to come online during switch over
and failed cluster database updates.

3.2 ENDING A SESSION

3.2.1 Commit new system

3.2.1.1 Description

The commit command’s main purpose is to establish that the new AP is
committed and that there is no way back. It also copies the data needed to
upgrade the old node. It can also be called in other situations: to commit
and copy again after the old node has been restored, or to copy and commit
after an interrupted fchcommit command.

Committing the new system configuration is initiated with the fchcommit
command during the supervision period (i.e. after a successful switch over).
This ends the supervision period and copies all necessary data, such as
CXC packages and parameter files, to the other node. LBB and 3pp software
packages are not copied, however; these must be transferred to the other
node by the operator.

3.2.1.2 Flow of Events

Sequence diagram (participants: Operator, fchcommit, FCHLIB): fchcommit
(a); initial check (b); verify before commit (c); backup system binaries
(d); commit packages (e); commit cluster config (f); commit LBB files (g);
commit parameters (h); finish commit (i).

a The operator executes fchcommit.
b FCH verifies that the cluster server is running, that the FCH
state is Supervision and that no other FCH processes are
running.
c FCH verifies that the connections to the other node and the
data disk are OK, that the quorum resource is available, that the
current node is the active node, that the current node is not
paused and that the other node is paused.
d FCH makes backup copies of the FCH and ACS binaries to
the other node. The state is set to Committing.
e FCH commits the CXC packages by copying new packages
to the other node and updating the FCH transaction log on the
other node to prepare for installation and removal during
fchend.
f The backup of the old PRC service start schedule file is
deleted.
g all edited LBB files are copied to the other node for later
installation during fchend.
h all edited parameter files are copied to the other node for later
update of the PHA database during fchend.
i the FCH state is set to CommitDone to indicate that the
commit was successfully ended.
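
An interrupted fchcommit is simply executed again, as described in 3.2.1.3, so the copy steps e, g and h benefit from being idempotent. The sketch below works under that assumption. `copy_if_changed` skips files that have already been copied identically. The state only reaches CommitDone after every step has succeeded. The function names and the `session` dict are hypothetical.

```python
from __future__ import annotations

import filecmp
import shutil
from pathlib import Path
from typing import Callable


def copy_if_changed(src: Path, dst: Path) -> bool:
    """Copy src to dst unless an identical copy already exists; True if copied."""
    if dst.exists() and filecmp.cmp(src, dst, shallow=False):
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True


def fchcommit(session: dict[str, str], steps: list[Callable[[], None]]) -> None:
    """Run the commit steps; on an exception the state stays Committing."""
    session["state"] = "Committing"  # step d
    for step in steps:
        step()  # e.g. copy CXC packages, LBB files, parameter files
    session["state"] = "CommitDone"  # step i
```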

3.2.1.3 Exceptional Flow of Events

If the fchcommit command fails or is interrupted, it simply exits with an
error message. The command may then be executed again to finish the
commit. Failures during commit are normally node-to-node communication
errors, which prevent the command from copying files.

3.2.2 Fall back to previous system

3.2.2.1 Description

Fall back to the system that existed before the FCH session was started is
initiated with the fchfb command. This will raise the FCH failed alarm, stop
all resources, restore the cluster configuration, move the resource groups
back to the unmodified node and start the resources again, in essence
performing a reversed switch over.

After this, if no LBB/3pp software was upgraded, the command undoes all
changes made during the fchstart command: it deletes new CXCs, reinstalls
old CXCs and restores modified files. Finally the node is rebooted.

If LBB/3pp software was upgraded, the command only restores the cluster
configuration and performs the reversed switchover. After this the operator
is prompted to execute fchrst, and the command exits.

After the fall back has been performed (or, in the case of LBB/3pp upgrade,
the restore) the session must be ended using the fchend component.

3.2.2.2 Flow of Events


Sequence diagram (participants: Operator, fchfb, FCHLIB): fchfb (a);
initial check (b); verify before fallback (c); prompt for fallback (d);
answer (e); initiate fallback (f); prepare for switch (g); delete cluster
config (h); fail over (i); insert cluster config (j); start other node (k);
remove instances, rin files (l); fall back LBB files (m); fall back
packages (n); add old instances, rin files (o); fall back CXC parameters
(p); remove added LBB files (q); add node cluster config (r); finish
fallback & reboot (s).

a The operator executes fchfb.
b FCH verifies that the cluster server is running and that the FCH
server is running on both nodes, checks that no other fall back
or FCH process is running, sets a mutex to prevent other
FCH processes from being started and checks that the FCH
state is Supervision.
c FCH checks that the quorum resource is available, that the
current node is the active node, that the FCH state is
Supervision, that the current node is not paused and that the
other node is paused.
d FCH prompts the operator to perform fall back.
e The operator confirms or denies. If the operator denies, the fall
back is aborted.
f The FbFailover1 state is set and the cluster resources are
stopped. Any resource that fails to come offline is
subsequently killed. FCH checks whether the PRC service start
schedule file has changed and sets a boolean to indicate whether
the cluster configuration is to be updated later on. It also ceases
the alarm AP FUNCTION CHANGE IN PROGRESS and raises
the alarm AP FUNCTION CHANGE FAILED.
g FCH copies the old PRC service start schedule file to the
other node and sends an event to report that a switch attempt
is in progress.
h the FCH state is set to Config3, the new cluster configuration
is deleted and the FCH state then set to Move2.
i Both nodes are resumed and FCH moves all cluster resource
groups to the other node. Upgraded node is paused again.
j the FCH state is set to Config4, the old cluster configuration
is inserted on the old other node, the FCH state is set to
FbFailover2 and the PRC reboot log file is updated to indicate
that the coming reboot is an FCH reboot. The current node is
then paused.
k the cluster resources on the old other node are taken online.
FCH waits for all resources to come online.
l All instances of services corresponding to existing rin files
are removed.
m any replaced LBB files are restored.
n CXC packages that were changed (installed, updated,
removed) are restored.
o the old instances of services, with their corresponding rin files,
are restored.
p changed CXC parameters are restored.
q any added LBB files are removed.
r the FCH state is set to Config5, an event is sent reporting that
an attempt to edit the cluster database is in progress, the
cluster configuration for the current node is deleted and the
old configuration is reinserted.
s the FCH state is set to FbFailover3, an event reporting that the
fallback was successful is sent and the old PRC service start
schedule files on both nodes are deleted. The FCH state is
then set to FbReboot, an event is sent to report that a reboot
after fall back is pending and the node is rebooted.
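
Read together with 3.2.2.1, the fchfb steps fall into three groups: a reversed switch over (f to k), undoing the fchstart changes (l to q), and finishing (r and s). The sketch below encodes one reading of that description. It assumes that, when LBB/3pp software was upgraded, fchfb stops after step k and leaves the node restore to fchrst. The constant and function names are hypothetical.

```python
from __future__ import annotations

# Step letters from the fchfb flow of events.
REVERSED_SWITCHOVER = ["f", "g", "h", "i", "j", "k"]
UNDO_FCHSTART = ["l", "m", "n", "o", "p", "q"]
FINISH = ["r", "s"]

# FCH states set along the way, in order (steps f, h, j, r and s).
FALLBACK_STATES = ["FbFailover1", "Config3", "Move2", "Config4",
                   "FbFailover2", "Config5", "FbFailover3", "FbReboot"]


def fallback_plan(lbb_upgraded: bool) -> list[str]:
    """Steps fchfb performs; with LBB/3pp upgrades the rest is left to fchrst."""
    if lbb_upgraded:
        return REVERSED_SWITCHOVER + ["prompt operator to run fchrst"]
    return REVERSED_SWITCHOVER + UNDO_FCHSTART + FINISH
```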

3.2.2.3 Exceptional Flow of Events

If the fchfb command fails, the fchrst command must be used to restore the
node. In extreme cases, where even the fchrst does not work, a full BUR
emergency restore of both nodes may be necessary.

3.2.3 End successful FCH session

3.2.3.1 Description

Ending a successful FCH session, i.e. after a successful fchcommit was
executed, is initiated with the fchend command. The command upgrades the
unmodified node to the same state as the upgraded node: removed CXCs are
deleted, new and upgraded CXCs are installed and edited files are updated.

The state is set to EndInstalling when the upgrade starts.

After this, the command reboots the node and the supervisor component
ends the session after the reboot.

3.2.3.2 Flow of Events

Sequence diagram (participants: Operator, fchend, FCHLIB): fchend (a);
initial check (b); initiate end session (c); prepare upgrade (d); remove
rin files, commit LBB files (e); commit packages (f); commit parameters +
rin (g); change cluster config (h); end session (i).

a The operator executes fchend.
b FCH verifies that the cluster server is running, that the FCH
state is CommitDone, that no other FCH processes are
running and that the FCH server is running on both nodes. It
also checks the amount of available disk space and makes
backup copies of the FCH binaries.
c FCH verifies that the current node is the passive node, that the
current node is paused and that the other node is not paused.
It also rebuilds the PRC service start schedule file.
d FCH sends an event reporting that a switch attempt is in
progress, stops all cluster resources on the current node and
sets the FCH state to EndInstalling.
e FCH removes the current instances based on the rin files and
removes the rin files themselves. FCH replaces all LBB files
that were changed during fchstart on the other node.
f FCH installs, upgrades and removes the CXC packages that
were changed during fchstart, making the current node
software configuration identical to the other node.
Setupservices is run to set up random users for services.
g FCH updates the CXC parameter files and the PHA database
to the same status as on the other node. All instances based on
the current rin files are added.
h FCH rebuilds the PRC service start schedule file and checks
if the file has changed. In this use case it has changed. FCH
sets the FCH state to Config2, sends an event reporting that
the cluster database is about to be edited, deletes the old
cluster configuration, sends another edit event, sets the FCH
state to Config2B, adds the new cluster configuration and sets
the FCH state to EndInstallDone.
i FCH sends an event reporting that the switch attempt was
successful, sets the FCH state to EndReboot, sends an event
reporting that a reboot is about to take place and reboots the
node.
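
Step h edits the cluster database only when the rebuilt PRC service start schedule differs from the previous one. The minimal sketch below assumes a byte-for-byte comparison against a previous copy of the file; the document does not say which copy is used. `end_config_states` is hypothetical.

```python
from __future__ import annotations

import filecmp
from pathlib import Path


def end_config_states(rebuilt: Path, previous: Path) -> list[str]:
    """FCH states set in step h; Config2/Config2B only if the schedule changed."""
    changed = (not previous.exists()
               or not filecmp.cmp(rebuilt, previous, shallow=False))
    states = ["Config2", "Config2B"] if changed else []
    return states + ["EndInstallDone"]
```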

3.2.3.3 Exceptional Flow of Events

If the fchend command fails, the fchrst command may be used to restore the
node. The fchcommit command may then be executed again, and then the
fchend command can be executed once more to upgrade the unmodified
node.

3.2.4 End a successful FCH session with LBB/3pp software upgrade

3.2.4.1 Description

Ending a successful FCH session where LBB/3pp software was upgraded is
initiated with the fchend command. The command displays the LBBShell>
prompt so that the operator can perform the same upgrades on this node as
well. Just as during the fchstart command, reboots are permitted at the
LBBShell> prompt, and the operator may continue the session after the
reboot by executing the command again.

When the LBB upgrade starts, the state is set to LBBReboot2.

After LBB upgrades have been finished, any other changes made during the
FCH session, i.e. CXC install, delete or parameter changes, will also be
performed. After this, the command reboots the node and the supervisor
component ends the session after the reboot.

3.2.4.2 Flow of Events

Sequence diagram (participants: Operator, fchend, FCHLIB): fchend (a);
initial check (b); initiate end session (c); prepare upgrade (d); LBBShell
prompt (e); command (f), repeated; commit packages (g); commit LBB files
(h); commit parameters (i); change cluster config (j); end session (k).

a The operator executes fchend.
b FCH verifies that the cluster server is running, checks whether LBB
software is to be upgraded (in this use case it is), and verifies that
the FCH state is CommitDone, that no other FCH processes are
running and that the FCH server is running on both nodes. It
also checks the amount of available disk space and makes
backup copies of the FCH binaries.
c FCH verifies that the current node is the passive node, that the
current node is paused and that the other node is not paused.
It also rebuilds the PRC service start schedule file.
d FCH sends an event reporting that a switch attempt is in
progress, stops all cluster resources on the current node and
sets the FCH state to LBBReboot2.
e FCH displays the LBBShell> prompt to the operator.
f The operator enters a DOS command. Steps e and f are
iterated until the operator chooses to leave the menu or abort
the session.
g FCH installs, upgrades and removes the CXC packages that
were changed during fchstart, making the current node
software configuration identical to the other node.
h FCH replaces all LBB files that were changed during fchstart
on the other node.
i FCH updates the CXC parameter files and the PHA database
to the same status as on the other node.
j FCH rebuilds the PRC service start schedule file and checks
if the file has changed. In this use case it has changed. FCH
sets the FCH state to Config2, sends an event reporting that
the cluster database is about to be edited, deletes the old
cluster configuration, sends another edit event, sets the FCH
state to Config2B, adds the new cluster configuration and sets
the FCH state to EndInstallDone.
k FCH sends an event reporting that the switch attempt was
successful, sets the FCH state to EndReboot, sends an event
reporting that a reboot is about to take place and reboots the
node.

3.2.4.3 Exceptional Flow of Events

If the fchend command fails or the operator aborts the session, the fchrst
command may be used to restore the node. The fchcommit command may
then be executed again, and then the fchend command can be executed once
more to upgrade the unmodified node.

3.2.5 End failed FCH session after fall back

3.2.5.1 Description

If the session was ended with fchfb, the fchend command performs a final
clean-up and ends the session, thereby enabling a new session to be
initiated with the fchstart command.

3.2.5.2 Flow of Events

Sequence diagram (participants: Operator, fchend, FCHLIB): fchend (a);
initial check (b); initiate end session (c); resume node (d); end session
(e); clean up (f).

a The operator executes fchend.
b FCH verifies that the cluster server is running, that the FCH
state is End, that no other FCH process is executing and that
the FCH server is running on both nodes. It also makes a
backup of the FCH binaries.
c FCH verifies that the current node is the passive node.
d if the current node is paused, FCH resumes it. If the cluster
resource groups are stopped FCH starts them and verifies that
they come online.
e FCH ceases the alarm AP FUNCTION CHANGE FAILED,
resets the LBB upgrade keys in the registry and sets the FCH
state to noFCH.
f FCH deletes the old PRC service start schedule files on both
nodes and cleans up the FCH temporary directories.
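Steps b and c are pure checks that must all pass before anything is changed. The following Python sketch shows one way to express them; the `EndSessionStatus` type, its fields and `end_session_preconditions` are hypothetical names introduced for illustration and are not part of FCHLIB. The backup of the FCH binaries from step b is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EndSessionStatus:
    """Hypothetical snapshot of what fchend inspects in steps b and c."""
    cluster_server_running: bool
    fch_state: str
    other_fch_process_running: bool
    fch_server_running_here: bool
    fch_server_running_other: bool
    current_node_is_passive: bool


def end_session_preconditions(status: EndSessionStatus) -> List[str]:
    """Collect every failed precondition; an empty list means fchend may continue."""
    failures: List[str] = []
    if not status.cluster_server_running:
        failures.append("cluster server is not running")
    if status.fch_state != "End":
        failures.append(f"FCH state is {status.fch_state}, expected End")
    if status.other_fch_process_running:
        failures.append("another FCH process is executing")
    if not (status.fch_server_running_here and status.fch_server_running_other):
        failures.append("FCH server is not running on both nodes")
    if not status.current_node_is_passive:
        failures.append("current node is not the passive node")
    return failures
```

Reporting all failures at once, rather than stopping at the first, lets the operator correct every problem before re-running fchend; whether FCHLIB reports errors this way is not stated in this document.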

3.2.5.3 Exceptional Flow of Events

Since this use case involves very few and quite simple operations, errors
rarely occur. Should the command fail or be interrupted, it can be executed
again.

3.2.6 Fall back using FCH restore



3.2.6.1 Description

FCH restore is initiated with the command fchrst. The backup image must
reside in the partitions of both nodes. The command first restores the
cluster configuration to its previous state and then makes the unmodified
node active.

Before the state CommitDone, the newly upgraded node is restored, if
necessary. From the state CommitDone onwards, the old node is restored, if
necessary.

After this, the command prints a message on the screen explaining how to
proceed with the restore using BUR, and then exits. The operator uses BUR
to restore the node from the backup image and then reboots the node.

After the boot the fchend command must be used to end the session.

In this particular use case, an fchstart with LBB software upgrade has been
performed successfully, but the operator wants to fall back anyway. The
FCH state is Supervision and the operator initiates the command from the
passive (unmodified) node.

Note that fchrst must be started from a specific node because it is adapted
to the old BUR. In the future it should be possible to start fchrst from any
node.
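The choice of node to restore depends only on whether the session has reached CommitDone. A minimal Python sketch, assuming the state names listed in chapter 14; `node_to_restore`, the two state sets and the return strings are hypothetical and only illustrate the rule.

```python
# State names from chapter 14, grouped around CommitDone (illustrative only).
PRE_COMMIT_STATES = {
    "Installing", "LBBReboot1", "Reboot", "Failover1", "Move1",
    "Config1", "Config1B", "Failover2", "Supervision", "Committing",
}
POST_COMMIT_STATES = {
    "CommitDone", "LBBReboot2", "EndInstalling", "Config2",
    "Config2B", "EndInstallDone", "EndReboot", "EndRebootDone",
}


def node_to_restore(state: str) -> str:
    """Return which node fchrst restores, given the FCH state when it starts."""
    if state in PRE_COMMIT_STATES:
        return "upgraded"  # before CommitDone: the newly upgraded node
    if state in POST_COMMIT_STATES:
        return "old"       # CommitDone and later: the old, not yet upgraded node
    raise ValueError(f"no restore target defined for state {state}")


print(node_to_restore("Supervision"))    # upgraded
print(node_to_restore("EndInstalling"))  # old
```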

3.2.6.2 Flow of Events

Sequence diagram (participants: Operator, fchrst, FCHLIB); messages in
order, each acknowledged with ok:

fchrst (a)
initial check (b)
prompt (c)
handle supervision (d)
handle alarms (e)
stop cluster (f)
check cluster config (g)
handle switch over (h)
check cluster config (i)
start current node (j)
initiate restore (k)

a The operator executes fchrst from the passive node.
b FCH verifies that the cluster server is running, and that no
other FCH processes are running. If another FCH process is
detected the operator is prompted whether to continue
anyway. FCH also checks the FCH state and verifies that the
command is being executed on the correct node (this depends
on which FCH state we are in).

c The operator is also prompted to confirm that he wants to continue
with the operation.
d the FCH state is set to FbFailover1.
e the FCH alarms are moved to the current node.
f FCH stops all cluster resources and verifies that they all go
offline.

g FCH checks whether the PRC service start schedule file has
changed. In this use case it has not changed, so the cluster
configuration does not need to be updated. The FCH state is
set to Move2.
h FCH resumes both nodes, moves the cluster resource groups
to the current node and pauses the other node.
i FCH again checks the PRC service schedule file for changes
and then sets the FCH state to FbFailover2.
j FCH verifies that the other node is paused and, if it is not,
pauses it. FCH starts the cluster resources on the current node
and verifies that they all come online.
k FCH sets the FCH state to Restore and lets the user restore the
other node.
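The FCH states that fchrst sets in steps d to k can be listed as a function of whether the PRC service start schedule file has changed. This Python sketch is illustrative only: `fchrst_fallback_states` is a hypothetical name, and it assumes that the Config3 and Config4 states described in chapter 14 are the ones entered when the file has changed, a case this use case does not exercise.

```python
from typing import List


def fchrst_fallback_states(schedule_changed: bool) -> List[str]:
    """FCH states set by fchrst when falling back from Supervision."""
    states = ["FbFailover1"]        # step d
    if schedule_changed:
        states.append("Config3")    # assumption: only when the file changed
    states.append("Move2")          # step g
    if schedule_changed:
        states.append("Config4")    # assumption: only when the file changed
    states.append("FbFailover2")    # step i
    states.append("Restore")        # step k
    return states


print(fchrst_fallback_states(False))
# ['FbFailover1', 'Move2', 'FbFailover2', 'Restore']
```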

3.2.6.3 Exceptional Flow of Events

Should the fchrst command fail, a BUR emergency restore of both nodes
will be necessary.

3.2.7 End failed FCH session after restore

3.2.7.1 Description

If the session was ended with a FCH restore, the fchend command makes
sure the cluster configuration is correct, does a final clean up and ends the
session, thereby enabling a new session to be started.

The state is Restore when fchend starts.



3.2.7.2 Flow of Events

Sequence diagram (participants: Operator, fchend, FCHLIB); messages in
order, each acknowledged with ok:

fchend (a)
initial check (b)
initiate end session (c)
check cluster config (d)
resume node (e)
end session (f)
clean up (g)

a The operator executes fchend.
b FCH verifies that the cluster server is running, that the FCH
state is End, that no other FCH process is executing and that
the FCH server is running on both nodes. It also makes a
backup of the FCH binaries.
c FCH verifies that the current node is the passive node.
d FCH copies the PRC service start schedule file from the other
node and checks whether it differs from the PRC service start
schedule file on the current node. In this use case they differ.
FCH then stops all cluster resources on the current node, sends
an event reporting that the cluster database is about to be edited,
deletes the old cluster configuration, sends another edit event
and adds the new cluster configuration.
e if the current node is paused, FCH resumes it. If the cluster
resource groups are stopped FCH starts them and verifies that
they come online.
f FCH ceases the alarm AP FUNCTION CHANGE FAILED,
resets the LBB upgrade keys in the registry and sets the FCH
state to noFCH.
g FCH deletes the old PRC service start schedule files on both
nodes and cleans up the FCH temporary directories.
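Step d boils down to a byte-for-byte comparison of the local schedule file with the copy fetched from the other node. A minimal Python sketch using the standard `filecmp` module; the function name is hypothetical, and how FCH actually copies the file between nodes is not shown.

```python
import filecmp
from pathlib import Path


def cluster_config_needs_update(local_schedule: Path, copied_schedule: Path) -> bool:
    """Return True if the two PRC service start schedule files differ in content."""
    # shallow=False always compares the contents, so two files that happen to
    # share size and modification time are not assumed to be equal.
    return not filecmp.cmp(local_schedule, copied_schedule, shallow=False)
```

When the function returns True, the cluster configuration is rebuilt as in step d; otherwise the configuration is left alone.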

3.2.7.3 Exceptional Flow of Events

This is similar to the case of ending a session after fall back, but there is a
critical point if the PRC service start schedule file has changed. If the
cluster configuration edit fails, the cluster is left with a single point of
failure, and a BUR restore may be required.

3.2.8 Re-commit (re-install) new system using FCH restore

3.2.8.1 Description

If fchend fails while committing a new system configuration, the FCH
restore component can be used to restore the uncommitted node, i.e. the
node where fchend failed. The fchrst command is initiated from the active,
committed node and uses BUR to restore the uncommitted node.

When the node has been restored, the fchcommit command can be executed
again, followed as usual by the fchend command, to re-commit the new
system.

In this use case the fchend command was interrupted in the middle of
installing CXC packages; the FCH state is EndInstalling. The operator
executes fchrst from the active node with the backup file for the passive
node as argument.

Note that fchrst is adapted to the old BUR; in the future it should be
possible to run it from either node, not just one.

3.2.8.2 Flow of Events

Sequence diagram (participants: Operator, fchrst, FCHLIB, fchcommit,
fchend); messages in order, each acknowledged with ok:

fchrst (a)
initial check (b)
prompt operator (c)
handle EndInstalling (d)
initiate restore (e)
fchcommit (f)
fchend (g)

a The operator executes fchrst on the active node.
b FCH verifies that the cluster server is running, and that no
other FCH processes are running. If another FCH process is
detected the operator is prompted whether to continue
anyway. FCH also checks the FCH state and verifies that the
command is being executed on the correct node (this depends
on which FCH state we are in).
c The operator is prompted to confirm that he wants to continue
with the operation.
d the FCH state is set to Restore2.
e FCH sets the FCH state to Restore2 and lets the operator
execute BUR to restore the other old node.
f after the restore is finished, the operator executes fchcommit.
See chapter 3.2.1.
g after the fchcommit, the operator executes fchend to finish the
session. See chapter 3.2.3.

3.2.8.3 Exceptional Flow of Events

Should the fchrst command fail, a BUR emergency restore of both nodes
will be necessary.

4 STRUCTURE

The block FCH consists of three software units, two CAA and one CXC, as
shown below.

FCH, CNZ 222 59 (FCH SW Product)
  FCHEXESRC   CAA 109 0261   FCH executables source container
  FCHLIBSRC   CAA 109 0262   FCH library source container
  FCHBIN      CXC 137 413    FCH binaries container

4.1 RESPONSIBILITIES

4.1.1 FCHEXESRC

This software unit implements the FCH user interface commands, internal
commands and supervisor component.

4.1.2 FCHLIBSRC

This software unit implements the FCH core functionality as a Dynamic Link
Library (DLL) used by the FCHEXESRC components.

4.2 INTERFACES

FCH has no external interface.



5 SOFTWARE UNITS

5.1 FCHEXESRC

5.1.1 Components

The FCH executables consist of seven components: ACS_FCH_Server,
fchcommit, fchend, fchevent, fchfb, fchrst and fchstart.

5.1.1.1 ACS_FCH_Server

This component is implemented as a Windows NT service and contains two
subcomponents:

1 FCH remote execute server - a named pipe that allows FCH
components to execute commands on the other node. The
following commands can be executed remotely:

a sync - flush the file system cache to disk for a specified volume.
b prcconf - change the cluster configuration.
c fchevent - send an event or alarm.
d kill - terminate a process.
e prcboot - reboot the node.
f test - test bidirectional communication over one pipe.
g test2 - test bidirectional communication over two pipes.

2 FCH supervisor - handles reboots and crashes during a FCH
session. It performs different actions depending on the FCH
state:

a Perform switch over after a reboot initiated by fchstart.
b Perform clean up, and raise and cease alarms, after a reboot
initiated by fchfb or fchend.
c Handle fall back and clean up in case of an uncontrolled reboot
during a FCH session.
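Because the named pipe lets one node run commands on the other, the server only needs to accept the seven commands listed. A minimal Python sketch of that request check, assuming a hypothetical one-line request format of a command name followed by whitespace-separated arguments; the actual pipe protocol, and the supervisor behaviour in item 2, are not modelled.

```python
from typing import List, Tuple

# Commands the remote execute server accepts, as listed above.
REMOTE_COMMANDS = {"sync", "prcconf", "fchevent", "kill", "prcboot", "test", "test2"}


def parse_remote_request(request: str) -> Tuple[str, List[str]]:
    """Split a request line into command and arguments, rejecting unknown commands."""
    parts = request.split()
    if not parts:
        raise ValueError("empty request")
    command, args = parts[0], parts[1:]
    if command not in REMOTE_COMMANDS:
        # Anything outside the whitelist is refused instead of being executed.
        raise ValueError(f"command not allowed remotely: {command}")
    return command, args


print(parse_remote_request("sync D:"))  # ('sync', ['D:'])
```

A fixed whitelist keeps the pipe from becoming a general remote shell; the document does not say how the real server enforces this.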

5.1.1.2 fchcommit

This component ends the supervision period and copies all necessary data,
such as CXC software packages and parameter files, to the other node to
prepare it for upgrade. LBB and 3pp software upgrades are not copied
however. They must be transferred manually by the operator.

5.1.1.3 fchend

This component has two functions: to upgrade the old node after a commit,
and to clean up after a FCH session that ended with fallback or restore.

5.1.1.4 fchevent

This component is used to send events, raise alarms and cease alarms. All
event and alarm handling in FCH is implemented in this component to
minimize dependencies between FCH and other ACS components such as
AEH.

5.1.1.5 fchfb

This component allows the FCH session to be aborted and reverts the
system to the configuration that existed before the FCH was initiated. In
case of a normal FCH of CXC software or parameters, it performs a
complete return to the old system. If LBB and 3pp software were upgraded,
it performs a switch over, i.e. it returns control of the system to the
unmodified node, but does not restore software or parameters, which must
be handled by fchrst in this case.

5.1.1.6 fchrst

This component is used to restore one node using the BUR restore
functionality. It is used to achieve fall back when LBB/3pp software has
been upgraded, to restore the old node if fchend fails so that the new system
can be re-committed, and generally to handle severe errors during a FCH
session that normal FCH functionality cannot recover from, for instance a
blue screen during fchstart.

5.1.1.7 fchstart

This component is used to initiate a FCH session and to upgrade the first
node. It allows the operator to select CXC packages for install and delete,
edit CXC parameter files online or offline, add or replace LBB files, add or
replace resource instance files and upgrade LBB and 3pp software.

5.2 FCHLIBSRC

5.2.1 Classes

The FCH library is implemented as a Windows DLL and provides the
major part of the FCH functionality. It contains the classes described
below.

5.2.1.1 ACS_FCH_ClusterControl

This class implements the methods in FCH to control the MSCS. This
includes starting and stopping of resources, ordered failover, node and
resource status control, reconfiguration of the cluster database via PRC, and
pausing and resuming cluster nodes.

5.2.1.2 ACS_FCH_Common

This class is the base class and implements common functions used by all
the other classes, such as event reporting, activity and error logging and
various I/O and file handling functions.

5.2.1.3 ACS_FCH_Error

This class implements FCH error message handling.



5.2.1.4 ACS_FCH_Exception

This is an exception class for FCH.

5.2.1.5 ACS_FCH_LBBFiles

This class implements replacement of LBB files, i.e. arbitrary files in the
system. It has methods for backing up and replacing a file, for fall back and
for commit.
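The backup, replace, fall back and commit methods follow a simple pattern: keep a copy of the original until the change is either undone or accepted. The following Python sketch illustrates that pattern; the class name, the `.bak` naming and the backup directory are assumptions, not the actual ACS_FCH_LBBFiles interface. It also covers adding a file that did not exist before, since fchstart can add as well as replace LBB files.

```python
import shutil
from pathlib import Path


class LbbFileReplacement:
    """Hypothetical sketch of backup, replace, fall back and commit for one file."""

    def __init__(self, target: Path, backup_dir: Path) -> None:
        self.target = target
        self.backup = backup_dir / (target.name + ".bak")
        self.existed = False

    def replace(self, new_file: Path) -> None:
        self.existed = self.target.exists()
        if self.existed:
            self.backup.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.target, self.backup)  # keep the original for fall back
        shutil.copy2(new_file, self.target)

    def fall_back(self) -> None:
        if self.existed:
            shutil.copy2(self.backup, self.target)  # put the original content back
            self.backup.unlink()
        else:
            self.target.unlink()                    # the file was added, so remove it

    def commit(self) -> None:
        if self.existed:
            self.backup.unlink()                    # new content stays, backup is dropped
```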

5.2.1.6 ACS_FCH_Package

This class implements handling of CXC packages. It has methods for
installing and removing CXC packages, version checking, listing of CXC
packages, transactional install and remove logs for packages, fall back and
commit.

5.2.1.7 ACS_FCH_Parameter

This class implements editing of CXC parameter files. It has methods for
backup and edit of parameter files, syntax check, updating the PHA
parameter database, fall back and commit.

5.2.1.8 lbbfile

This class is used to represent a LBB file.

5.2.1.9 ACS_FCH_Exception

Handles error messages in combination with exceptions.

5.2.1.10 rinUpdate

This class is used to represent a resource instance and its relations.

5.2.1.11 parfile

Represents a parameter file and its replacements.

5.2.1.12 ACS_FCH_Time

Represents boot time measurement and time measurement tied to a chosen
registry key.

6 PROCESSES

FCH does not implement any supervised processes, but requires that all
supervised processes are online. FCH also supervises the PRC Cluster
Control process during switch over, to make sure it is properly stopped and
started.

7 PERSISTENT STORAGE

FCH uses the following persistent storage:

a The FCH activity log acs_fch_activity.log, which is located in
<ACS_LOGS>\FCH on the data disk. A node-local log file with
the same name is also kept in C:\ACS\logs\FCH. This is
mainly used by the ACS_FCH_Server component to log
startup and shutdown of the service, but is also used as the
FCH activity log in case the data disk cannot be accessed.
b FCH stores CXC packages to be installed during the FCH
session at C:\ACS\data\FCH\new.
c FCH stores installed and committed CXC packages at
C:\ACS\data\FCH\current.
d FCH stores backups of ACS files and binaries in
C:\ACS\data\FCH\bin and C:\ACS\data\FCH\fchbin.
e FCH uses the registry key
HKEY_LOCAL_MACHINE\Cluster\FCHIP to store the
current FCH state.
f FCH uses the registry key
HKEY_LOCAL_MACHINE\Cluster\LBB to store a boolean
value to indicate if LBB software upgrade is in progress.
g FCH uses the registry key
HKEY_LOCAL_MACHINE\Cluster\ASKBOOT to store a
boolean value indicating whether the operator may choose
whether to reboot at the end of fchstart, fchfb and fchend.
This is used for testing purposes only and should never be
used on site.
h FCH uses the registry structure under
HKEY_LOCAL_MACHINE\Software\Ericsson\Adjunct
Processor\ACS\FCH to store installation and removal
transaction logs.
i FCH uses the registry key
HKEY_LOCAL_MACHINE\Cluster\ORIGNODE to
establish on which node the FCH session started.
j HKEY_LOCAL_MACHINE\Cluster\BOOTCOUNT_NODEN is used to
count the number of reboots on that node during one state.
k HKEY_LOCAL_MACHINE\Cluster\BEGSW is used to measure
the time from the start to the end of a switch over.
l HKEY_LOCAL_MACHINE\Cluster\OLDBOOTSTATE_NODEN is
used to verify that the reboot occurred in the same state.

m HKEY_LOCAL_MACHINE\Cluster\BOOTTIME_NODEN is used to
establish when a boot occurred and how much time has
passed since then.
n HKEY_LOCAL_MACHINE\Cluster\OLDBOOTTIME_NODEN is
compared with the previous value to verify whether a new boot
has occurred.
o lopt is used to save the argument to -l (-i) so that it can be
reused on the other (old) node to perform the same LBB file
update there.
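Items j to n together let FCH tell a new boot from a repeated check and count reboots that happen within one FCH state. A minimal Python sketch of that bookkeeping, assuming a plain dictionary in place of the HKEY_LOCAL_MACHINE\Cluster values, the node number substituted for N in the value names, and an integer boot timestamp; `record_boot` is a hypothetical name, and the exact semantics of BOOTTIME and OLDBOOTTIME in FCH may differ.

```python
def record_boot(reg: dict, node: int, state: str, boot_time: int) -> int:
    """Update the per-node boot values and return the reboot count for this state."""
    n = f"_NODE{node}"
    if reg.get("BOOTTIME" + n) == boot_time:
        # Same boot as the last check: no new reboot has occurred.
        return int(reg.get("BOOTCOUNT" + n, 0))
    reg["OLDBOOTTIME" + n] = reg.get("BOOTTIME" + n, 0)
    reg["BOOTTIME" + n] = boot_time
    # Only reboots within the same FCH state are accumulated.
    same_state = reg.get("OLDBOOTSTATE" + n) == state
    count = int(reg.get("BOOTCOUNT" + n, 0)) + 1 if same_state else 1
    reg["BOOTCOUNT" + n] = count
    reg["OLDBOOTSTATE" + n] = state
    return count
```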

8 ERROR HANDLING

All errors are logged in the FCH activity log file. Depending on the
situation, FCH either retries the action to bypass intermittent errors, ignores
the error and continues the FCH session if the error is not serious, or, if the
error can be neither handled nor bypassed, prints an error message on the
console and exits.
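The three strategies map naturally onto a small wrapper around each FCH step. This Python sketch assumes two hypothetical error classes, one for intermittent failures and one for non-serious ones, and an arbitrary retry count and delay; FCH's actual error classification and limits are not specified here. Any other error propagates, so the top-level command can print it on the console and exit.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: an intermittent failure worth retrying."""


class MinorError(Exception):
    """Hypothetical: a non-serious failure that is logged and then ignored."""


def run_step(action: Callable[[], T], attempts: int = 3, delay_s: float = 1.0,
             log: Callable[[str], None] = print) -> Optional[T]:
    """Retry transient errors, log and skip minor ones; anything else propagates."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except MinorError as err:
            log(f"ignored: {err}")
            return None
        except TransientError as err:
            log(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt == attempts:
                raise  # give up: the caller reports the error and exits
            time.sleep(delay_s)
    return None
```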

9 FUNCTION CHANGE

NA

10 START, STOP AND RESTART

FCH has no supervised processes.

11 CONFIGURATION

FCH has one configuration file, <AP_HOME>\ACS\etc\FCH_service_def.
It contains the name of the FCH service, which is added to the LCTBIN file
SetupService.def so that the FCH service can be configured by the
SetupServices command.

12 CAPACITY

12.1 DATA FOR CAPACITY ESTIMATION

FCH has a small impact on system capacity.

12.2 CAPACITY ESTIMATION

NA

13 SPECIAL FEATURES

NA

14 FCH, THE STATE MACHINE



14.1 PAUSING THE NODES AND GROUP OWNERSHIP

In the Microsoft Cluster (MSCS), there is an important concept called
cluster node pausing. It is normally used for maintenance. If a node is
paused, cluster resource groups cannot be moved to that node. There is one
exception, however: if a group can belong to several nodes, a node goes
down, and the only node available to the group is a paused node, the group
is moved to that node anyway but stays offline. In other words, a group can
temporarily belong to a paused node, but it is then offline. If a group can
belong to only one node and that node is down, the group has no owner.

FCH uses node pausing to prevent any “spontaneous” moves of groups.
Why? We shall see below.
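The ownership rules above can be summarized as two small decision functions. This is a simplified Python model, not the MSCS API: `can_move_group_to` and `failover_target` are hypothetical names, nodes are plain strings, and the model assumes that a group failing over to an unpaused node is brought online.

```python
from typing import List, Optional, Set, Tuple


def can_move_group_to(node: str, paused: Set[str]) -> bool:
    """An ordinary, requested move of a group to a paused node is refused."""
    return node not in paused


def failover_target(possible_owners: List[str], up: Set[str],
                    paused: Set[str]) -> Tuple[Optional[str], bool]:
    """Return (new owner, online) for a group whose current owner node went down."""
    available = [n for n in possible_owners if n in up]
    unpaused = [n for n in available if n not in paused]
    if unpaused:
        return unpaused[0], True
    if available:
        # Only a paused node is left: the group is moved there but stays offline.
        return available[0], False
    # For example a group that can belong to only one node, and that node is down.
    return None, False
```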

14.2 THE CLUSTER DATABASE

The cluster database actually consists of two registry databases, one on each
node, which are identical or are made identical. A change log exists on the
data disk.

Now, suppose a FCH session has upgraded one of the nodes in the cluster,
including the cluster database. For example, a cluster resource has been
added, and this resource belongs to the current node. One could be tempted
to believe that restoring the upgraded node would revert the FCH session to
its previous state. This is, however, not the case. The addition of the cluster
resource affects the database on both nodes. If one node is restored, one of
the “identical” databases will differ from the other. In this case, the
database with the latest timestamp will “win”, and the operator will have the
old original node with an upgraded database! As a consequence, the cluster
database needs special handling to be reverted to its original state. When the
cluster database has been changed, FCH uses the PRC command prcconf to
update or revert the database. It follows that a failover that includes a
database update requires the cluster resources on the executing node to be
offline.

Any failover should therefore be done offline, so that the database can be
changed using prcconf.
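The pitfall described above can be illustrated with a simplified model in which each node holds a replica of the cluster database and synchronization copies the replica with the latest timestamp over the other. This is a minimal Python sketch under that assumption; it does not reproduce how MSCS actually replicates its database, and `DbReplica` and `synchronize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DbReplica:
    """Hypothetical per-node copy of the cluster database (resource -> owner node)."""
    timestamp: int
    resources: Dict[str, str] = field(default_factory=dict)


def synchronize(a: DbReplica, b: DbReplica) -> None:
    """Make both replicas equal: the one with the latest timestamp wins."""
    newer = a if a.timestamp >= b.timestamp else b
    for replica in (a, b):
        replica.timestamp = newer.timestamp
        replica.resources = dict(newer.resources)


backup_of_b = DbReplica(1, {"R1": "A"})            # taken before the FCH session
node_a = DbReplica(2, {"R1": "A", "R2": "B"})      # FCH added R2; the change reached
node_b = DbReplica(2, {"R1": "A", "R2": "B"})      # the replicas on both nodes
node_b = DbReplica(backup_of_b.timestamp, dict(backup_of_b.resources))  # restore B only
synchronize(node_a, node_b)
print(sorted(node_b.resources))  # ['R1', 'R2']: the upgraded database comes back
```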

FCH uses states to keep track of what has been done, so that it can fall back
the system properly should a failure occur.

The use of states is especially important for keeping track of cluster
database changes.

14.3 NODE AVAILABILITY.

A very important use of states is when an AP node becomes unavailable due
to node failure, for example if the upgraded node crashes while FCH is in
the middle of upgrading the other node. FCH must then ensure that the
remaining node becomes available as quickly as possible. It uses the FCH
states to achieve this.

14.4 FCH STATES.

Among other things, FCH has always been a state machine. In this APG40
NT version, with a 2-node cluster, the states are more of a transaction log
where each state represents a set of actions and direction. In the APG30

double-partition Unix version, any state could be changed to the state
"Failed", where FCH returned to the original partition, ceased the alarms,
etc. It was thus closer to a true state machine.

In all states from Installing up to (but not including) CommitDone, fchfb
can be used to fall back, or fchrst to restore the upgraded system.

In each of the states before CommitDone, if a reboot or another interrupt or
failure occurs, the original, non-upgraded node is activated automatically. If
the upgrade did not include LBB, an automatic fallback of the upgraded
system takes place. If something goes wrong during the fallback, or if an
LBB upgrade has taken place, FCH restore (fchrst) must be executed.

From state CommitDone onwards, only a local fallback (INGO1 addition) of
the non-upgraded node can take place (during its upgrade attempt). Also, if
the newly upgraded node fails, the old non-upgraded node can be activated.
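The pre-commit recovery rule can be condensed into a decision function. The following Python sketch is a simplification: the state set is taken from Table 14.1, `recovery_after_interrupt` and its return strings are hypothetical, and the post-CommitDone handling described above is deliberately left out.

```python
# States from Installing up to, but not including, CommitDone (Table 14.1).
PRE_COMMIT_STATES = {
    "Installing", "LBBReboot1", "Reboot", "Failover1", "Move1",
    "Config1", "Config1B", "Failover2", "Supervision", "Committing",
}


def recovery_after_interrupt(state: str, lbb_upgraded: bool,
                             fallback_failed: bool = False) -> str:
    """Decide how an interrupted pre-commit FCH session is recovered."""
    if state not in PRE_COMMIT_STATES:
        raise ValueError(f"{state} is not a pre-commit state")
    # In every pre-commit state the original, non-upgraded node is activated.
    if lbb_upgraded or fallback_failed:
        return "activate original node, then restore with fchrst"
    return "activate original node, then automatic fallback"
```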

Example: failure in the Move1 state (figure; states shown: noFCH, Move1,
Committing, Move2, Failure).

Example: failure in the Move1 state with LBB upgrade (figure; states shown:
noFCH, LBBReboot, Move1, Committing, FbFailover2, Move2, Restore,
Failure).

14.4.1 Successful FCH session until supervision (normal flow of events)

To initiate the FCH session, the operator executes fchstart, which upgrades
the system. fchstart then sets the Reboot state and reboots, and
ACS_FCH_Server takes care of the remaining steps until Supervision.

Assume that node A is being upgraded. fchstart reboots the system, and after
the reboot ACS_FCH_Server performs the switch over and makes node A
active. The newly upgraded system is now active and is being supervised by
the operator and ACS_PRC_ClusterControl.

Table 14.1

1. noFCH No FCH is going on. To initiate FCH, the fchstart command is
started from the passive cluster node.
2. Installing The passive cluster node is paused (the concept of paused and
resumed nodes is used to decide whether a failover can be done to the node
or not), the FCH in progress alarm is raised, the cluster resource group(s)
owned by the current passive node are shut down to ensure a problem-free
installation, and a system edit/update is done on the passive cluster node
(see separate chapter). Finally, the operator is prompted for a reboot to
activate the new system.
2B. LBBReboot1 This special state tells fchstart that LBB is being upgraded
(fchstart -L) and that several consecutive reboots can occur. The state is
changed to Installing when the operator types “l” (leave) in the LBB
window.
3. Reboot The state is set by fchstart before the system is rebooted to
activate the new software. Events are sent. After the reboot, “reboot
success” events are sent and the state is changed to Failover1.
4. Failover1 All cluster groups except “Cluster Group” are brought down.
Both cluster nodes are resumed to enable failover (so-called
MoveClusterGroup). The cluster is now ready for an offline failover. Online
failover is not suitable for FCH since the cluster database might be changed.
5. Move1 Move (fail over) the cluster groups. Only the cluster groups that
have more than one node owner can be failed over. If the owner of a cluster
group was the non-upgraded node A, the new owner will be the upgraded
node B. The current, upgraded node is now active with its system upgraded.
6. Config1 If the old and new PRC_Config files are different, that is, if the
cluster configuration has changed, delete the cluster database based on the
current old config file (current configuration).
7. Config1B Create a new cluster database based on the new, updated config
file (new cluster resource configuration).
8. Failover2 This state indicates that the failover and any configuration
change are done. Pause the other node and resume the current one.
9. Supervision The current upgraded node is started. This node is paused
and the other node is resumed. The other node is started. Starting the
upgraded node first is somewhat more complex than the other way around;
the more complex solution is used because it gives improved ISP (in-service
performance).
Both nodes are now started and the operator should observe the system for
at least 2 hours. He can choose to fall back using fchfb, commit the AP using
fchcommit or, in the worst case, restore using the fchrst command.
10. Committing If a failure happens, the operator should normally repeat
the fchcommit command. If there is no other alternative, a fallback can be
tried.
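The normal path through Table 14.1 can be written down as an ordered list of states. A Python sketch, with `normal_state_path` a hypothetical name; it assumes that LBBReboot1 precedes Installing during an LBB upgrade and that Config1 and Config1B are only entered when the PRC_Config files differ, neither of which the table states explicitly.

```python
from typing import List


def normal_state_path(lbb_upgrade: bool, config_changed: bool) -> List[str]:
    """FCH states from noFCH to CommitDone for a session without failures."""
    path = ["noFCH"]
    if lbb_upgrade:
        path.append("LBBReboot1")       # assumption: left for Installing on "l"
    path += ["Installing", "Reboot", "Failover1", "Move1"]
    if config_changed:
        path += ["Config1", "Config1B"]  # assumption: only if PRC_Config differs
    path += ["Failover2", "Supervision", "Committing", "CommitDone"]
    return path
```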

14.4.2 Exceptional flow of events up to CommitDone (fallback or restore)

In all states up to (but not including) CommitDone, the FCH session can be
reverted to the situation that existed prior to the Function Change. This may
be due to a failure, for example an unexpected reboot, or to operator
intervention using, for example, fchfb (Function Change fallback). FCH can
be interrupted in any state and a fallback will start. An alternative to FCH
fallback is a single-node restore, which is implemented by fchrst (Function
Change restore). fchrst is always needed when an LBB upgrade has been
done.

11. FbFailover1 Check whether the cluster database needs to be reverted.
Cease the FCH in progress alarm and raise the FCH failed alarm. Stop all
groups except “Cluster Group”. This state is entered from the Supervision
state.
12. Config3 If the cluster database has changed, delete the parts of the
cluster database based on the new PRC_ConfigFile. Only the resources
belonging to the non-upgraded node are deleted.
13. Move2 Fail over (fail back) to the original non-upgraded node.
14. Config4 If the cluster database has changed, activate the old database
(used at initialisation of the FCH session) using the old PRC config file. The
resources owned by the currently upgraded node are unchanged.
15. FbFailover2 Pause the currently upgraded node. If LBB was upgraded,
exit the fchfb command (or return from the ACS_FCH_Server thread) now to
let the operator restore the node using fchrst. If LBB was not upgraded, all
upgrades (or downgrades) are reverted to their original status. The fallback
occurs automatically (initiated by ACS_PRC_ClusterControl) or by operator
intervention using the fchfb command.
16. Restore If the user chose to restore the system due to an LBB upgrade or
for another reason, the Restore state is set before the actual restore. fchend
then performs any needed update of the cluster database and finishes the
FCH session by executing the actions of the End state.
17. Config5 Delete the resources in the cluster database belonging to the
upgraded node using the new PRC_Config file. Create a new cluster
database for the upgraded node using the old PRC_Config file.
18. FbFailover3 Means that the fallback (of node and cluster database) is
done. Send events and do some cleanup.
19. FbReboot Set this state when FbFailover3 is ready. This state is used
when a calling function wants to give a reboot order during fallback. Set the
FbReboot2 state.
20. FbReboot2 Set this state before the reboot. Create a reboot file to tell
PRC that the reboot should not be counted. Do the actual reboot. After the
reboot, send some events and clean up.
21. End Pause the other node, resume the fallen-back node, start up the
services, pause this node and resume the other node again. Run fchend,
ensure that the fallen-back node is executing again, cease the alarm and
clean up.

14.4.3 Misc exceptional flow of events, fallback from Installing or Failover1 state

4. Failover1 If this state (see above) is interrupted, pause the upgraded
node and start the other node. Set the Installing state.
2. Installing If a fallback occurs and Failover1 was the previous state, this
state is set. All upgrades are reverted and the state is set to Reboot2.
37. Reboot2 If the state was Reboot2, this was the upgraded node and the
other node was down, set the state StartOrigNode.
38. StartOrigNode This special state ensures that the upgraded node is
started, online and active. Otherwise, the old other node would be online
and active.
39. EndOrigNode This special state shows that the original, formerly
upgraded node is now online and active. Normally, after a fallback, the
other node would be online and active.
fchend can be run: alarms are ceased, the state is set to noFCH and cleanup
is performed.

14.4.4 Normal flow of events until noFCH is set and FCH session is successful.

If fchcommit was successful and the state CommitDone has been set, the
FCH session has to be ended by installing the current passive node to make
it equal to the active node.

22. CommitDone: Set after the Committing state when fchcommit has been executed successfully. The newly upgraded node is active and has been approved by the operator.
23. LBBReboot2: This special state is set if the user has upgraded the LBB during the FCH session. The operator can perform several consecutive reboots to install drivers, etc. It is the operator's responsibility to follow exactly the same procedure as was used on the original node.
24. EndInstalling: Corresponds to the Installing state, but on the other node. FCH automatically updates the system exactly as was done on the originally updated node, with the exception of LBB upgrades.
25. Config2: If the cluster database was updated, the corresponding changes for this node are made online. In this state, the old configuration is deleted.
26. Config2B: If the cluster database was updated, the corresponding changes for this node are made online. In this state, the new configuration is added.
27. EndInstallDone: Installation of the node is complete. Events are sent.
28. EndReboot: The system is rebooted to activate the new software. Events are sent.
29. EndRebootDone: The system has rebooted successfully. Events are sent and alarms are ceased. The noFCH state is set and cleanup is performed.
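The normal end-of-session flow can be sketched as a state sequence that depends on two conditions, together with the two-step configuration change of Config2 and Config2B. This is a sketch under assumptions: the ordering follows the state numbers 22 to 29, and the configuration is modelled as a hypothetical key/value dictionary rather than the real cluster database format.

```python
from typing import Dict, List


def end_session_states(lbb_upgraded: bool, cluster_db_updated: bool) -> List[str]:
    """States from CommitDone to noFCH (order assumed from the state numbers)."""
    states = ["CommitDone"]
    if lbb_upgraded:
        states.append("LBBReboot2")        # operator reboots manually for LBB drivers
    states.append("EndInstalling")         # FCH repeats the upgrade on this node
    if cluster_db_updated:
        states += ["Config2", "Config2B"]  # delete old config, then add new config
    states += ["EndInstallDone", "EndReboot", "EndRebootDone", "noFCH"]
    return states


def apply_config_change(node_cfg: Dict[str, str], old_cfg: Dict[str, str],
                        new_cfg: Dict[str, str]) -> Dict[str, str]:
    """Config2 then Config2B: remove the old entries, then add the new ones."""
    # Config2: drop every entry that belongs to the old configuration.
    result = {key: value for key, value in node_cfg.items() if key not in old_cfg}
    # Config2B: add the entries of the new configuration.
    result.update(new_cfg)
    return result
```

Deleting before adding keeps the two steps restartable on their own: if the session is interrupted between them, Config2B only has to add the new entries.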

14.4.5 Exceptional flow of events: active node not available

If the active, newly upgraded node becomes unavailable, the old node must become the active one. A failover cannot be done immediately, since the cluster database may first have to be reverted. The state transitions are bidirectional.

Table 14.2

22. CommitDone: Set after the Committing state when fchcommit has been executed successfully. The newly upgraded node is active and has been approved by the operator.
30. InitWrongNode: The active, upgraded node has failed, and a check is made to see whether a cluster database change is necessary. Resume the node so that cluster groups without a current owner can be moved. Stop all groups except the cluster group, and start the cluster group if it is offline. Resume both nodes. Fail over to the current, non-upgraded node.
31. Config6: If the cluster database has changed, delete the services belonging to the current node using the new PRC_Config file.
32. Config6B: If the cluster database has changed, add the resources belonging to the current node using the old PRC_Config file.
33. InitWrongNodeDone: The switch to the old, non-upgraded node has been done. Resume this node, pause the other node, ensure that the cluster group is online and start the current node.
34. EndWrongNode: The old node is up and running.

Example: the old node becomes active, then the upgraded node becomes active again.

Figure: CommitDone -> EndWrongNode when the upgraded node goes down (old node active); EndWrongNode -> CommitDone when the upgraded node is up again (upgraded node active again).
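The wrong-node flow and its bidirectional switch can be sketched as follows. This is a simplified, hypothetical model: the event names `upgraded_node_down` and `upgraded_node_up` are invented for illustration, and the return path is collapsed into a single transition as in the example; any intermediate states on the way back are not described here and are not modelled.

```python
from typing import List


def wrong_node_states(cluster_db_changed: bool) -> List[str]:
    """Forward path when the active, upgraded node fails after CommitDone."""
    states = ["CommitDone", "InitWrongNode"]
    if cluster_db_changed:
        # Config6: delete services using the new PRC_Config file;
        # Config6B: add resources using the old PRC_Config file.
        states += ["Config6", "Config6B"]
    states += ["InitWrongNodeDone", "EndWrongNode"]
    return states


def on_node_event(state: str, event: str) -> str:
    """Simplified bidirectional switch between CommitDone and EndWrongNode."""
    if state == "CommitDone" and event == "upgraded_node_down":
        return "InitWrongNode"   # start of the switch to the old node
    if state == "EndWrongNode" and event == "upgraded_node_up":
        return "CommitDone"      # upgraded node active again
    return state                 # event not relevant in this state
```

Returning the unchanged state for irrelevant events keeps the function total, so a caller can feed it every cluster event without filtering first.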

14.4.6 Exceptional flow of events: local fallback or restore of the old node

If the non-upgraded node fails during the upgrade, a local fallback of this node is necessary. In the worst case, if the newly upgraded node fails while the inactive node is being upgraded, both a local fallback and a switchover to this node might be necessary.

24. EndInstalling: When this state is part of the fallback procedure, fallback of packages, parameters, etc. takes place.
35. FbEndReboot: Set after the EndInstalling state. FCH tries to reboot to activate the old software.
36. FbEndRebootDone: The reboot has occurred and the old software is activated.
22. CommitDone: Set after the FbEndRebootDone state. The newly upgraded node is active and has been approved by the operator. The old node is not yet upgraded.
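The local fallback path listed above can be sketched as a transition table that is followed until a state without a successor is reached. This is a minimal sketch assuming states are plain strings; it covers only the local fallback and does not model the worst-case switchover.

```python
from typing import List

# Local fallback of the non-upgraded node (states 24, 35, 36 and back to 22).
LOCAL_FALLBACK = {
    "EndInstalling": "FbEndReboot",     # packages, parameters etc. are fallen back
    "FbEndReboot": "FbEndRebootDone",   # reboot to activate the old software
    "FbEndRebootDone": "CommitDone",    # upgraded node active, old node not upgraded
}


def local_fallback_path(start: str = "EndInstalling") -> List[str]:
    """Follow LOCAL_FALLBACK from start until a state without a successor is reached."""
    path = [start]
    while path[-1] in LOCAL_FALLBACK:
        path.append(LOCAL_FALLBACK[path[-1]])
    return path
```

Since CommitDone has no entry in the table, the walk ends there, which matches the description: the session is back in the committed state with the old node still to be upgraded.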

If the old node must be restored, for example because of a failed LBB upgrade, the operator runs fchrst and the Restore2 state is set.

23. LBBReboot2: If this state, or any other state belonging to the old node, is interrupted by a failure, the operator runs fchrst and the Restore2 state is set.
40. Restore2: When this state is set, a restore must be performed on the old node. When the restore is done, fchcommit must be run again to commit the AP system.
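The restore handling can be sketched as a function that applies an operator command to the current state. This is a hypothetical sketch: the set `OLD_NODE_STATES` is an assumption taken from the end-of-session flow, since this section does not list exactly which states belong to the old node, and the step from Restore2 to Committing assumes that a renewed fchcommit follows the normal commit path to CommitDone.

```python
# Assumed set of states belonging to the old node (from the end-of-session flow).
OLD_NODE_STATES = {
    "LBBReboot2", "EndInstalling", "Config2", "Config2B",
    "EndInstallDone", "EndReboot",
}


def handle_command(state: str, command: str, restore_done: bool = False) -> str:
    """Apply an operator command to the current FCH state (simplified sketch)."""
    if command == "fchrst" and state in OLD_NODE_STATES:
        return "Restore2"        # old-node state interrupted by a failure
    if command == "fchcommit" and state == "Restore2":
        if not restore_done:
            raise RuntimeError("restore the old node before running fchcommit")
        return "Committing"      # assumed: a successful commit then leads to CommitDone
    return state                 # command not applicable; state unchanged
```

Rejecting fchcommit until the restore is done reflects the ordering required by Restore2: restore first, then commit the AP system again.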

15 REFERENCES

[1] 2/1551-ANZ 222 01 Uen, Adjunct Computer Subsystem, Terms and Abbreviations.

[2] 5/1056-ANZ 222 03 Uen, Adjunct Computer Subsystem (ACS) - System Version Control in the AP.

16 ANNEXES

16.1 ANNEX REVISION HISTORY

Rev Date Prepared Description

PA1 1999-12-09 UABRUDO First revision.

PA2 2000-05-26 UABRUDO Updated for CM12 delivery.

A 2000-05-29 UABRUDO Firm revision.

PB1 2000-11-13 QABKULD INGO1 updates.

