
Simple Overview AIX HACMP Cluster implementation

Version: 0.9
Date: 05/01/2011
By: Albert van der Sel
For who: Anyone who likes a very simplified high-level overview on the main HACMP concepts.

This note describes HACMP HA clusters, and will not directly focus on the newer PowerHA implementation.

There are several cluster implementations possible on AIX, of which the most well known are:

- GPFS cluster systems, mainly used for High Availability and parallel access of multiple Nodes
- HACMP systems, mainly used for High Availability through Failover of services (applications) to other Nodes

Section 1: HACMP - High Availability Cluster Multi-Processing


1.1 Conceptual view of a simple 2 Node HACMP configuration:
Fig. 1: two-node HACMP cluster (diagram not reproduced). The labels in the figure are:

- Public network: node A at 172.18.17.6. /etc/hosts lists all IP addresses.
- Private network: node A at 10.10.10.6.
- rsct/hacmp daemons on the node: rscd, clstrmgr, clcomd, grpglsmd, hatsd, hagsd; plus logfiles.
- HACMP configuration on the node:
  - Resource group descriptions
  - start/stop scripts for the application
  - Resource Group takeover
  - Service IP address takeover
- RESOURCE: service IP address 172.18.17.10 for an application, owned by node A, and could be taken over by Node B.
- KA messages (= heartbeats) on: (1) the private IP network and (2) a non-IP heartbeat.
  - The non-IP heartbeat might be through an rs232-like connection (old method).
  - Or this type of heartbeat is "disk" based, using FC adapters to a disk in a concurrent VG.
- Shared storage (for example over fiber): hdisk1, hdisk2, hdisk3, etc. on SAN A.
- Resource group RG1: application_01, volume groups, filesystems, replication.

HACMP Main Function:

Resources (like volumes, filesystems, the application service IP, etc.) are grouped together in Resource Groups (RGs),
which HACMP keeps highly available as a single entity.
When a Node that owns a Resource Group fails, the Resource Group "fails over" to another node.
Since an "application" is determined by its filesystems and its IP parameters, the RG is a sort of
"container" that can fail over from one Node to another node. This is "high availability" of the application.
Of course, a Resource Group will not magically fly from one Node to the other Node, so it is probably better to say that
the other node "acquires" the Resource Group (opens the Volume Group, mounts the filesystems, acquires the service IP address).

If a Node providing the application goes "down", the RG will "move" to the other Node. Stop and start scripts
of the application will ensure that the application stops at the first Node, and starts (a bit later) at the other Node.
So, "wrapping" stop and start shell scripts, on both nodes, are an element too in the HA environment.
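To make the idea of "wrapping" start and stop scripts a little more concrete, here is a minimal sketch of such a wrapper. It is not taken from HACMP itself: the names APP_CMD and APP_PIDFILE are hypothetical, and "sleep 300" merely stands in for a real service. The one design point it tries to show is that start and stop should be safe to run repeatedly, since the same script lives on both nodes.

```shell
#!/bin/sh
# Sketch of a start/stop wrapper for a clustered application.
# APP_CMD and APP_PIDFILE are hypothetical; "sleep 300" stands in for the real service.
APP_CMD="${APP_CMD:-sleep 300}"
APP_PIDFILE="${APP_PIDFILE:-/tmp/application_01.pid}"

app_status() {
    # Succeeds only if the pid file exists and the recorded process is alive.
    [ -f "$APP_PIDFILE" ] && kill -0 "$(cat "$APP_PIDFILE")" 2>/dev/null
}

app_start() {
    # Starting twice must be harmless.
    if app_status; then
        return 0
    fi
    $APP_CMD >/dev/null 2>&1 &
    echo $! > "$APP_PIDFILE"
}

app_stop() {
    # Stopping an already stopped application is not treated as an error.
    if app_status; then
        pid=$(cat "$APP_PIDFILE")
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    fi
    rm -f "$APP_PIDFILE"
    return 0
}

case "${1:-}" in
    start)  app_start ;;
    stop)   app_stop ;;
    status) app_status ;;
esac
```

Saved under a name of your choosing, the script would be called with "start", "stop" or "status" as its only argument.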

Some HACMP network keypoints:


The application associated IP (service label IP) can "fail over" either by:
- IP address takeover (IPAT) via IP aliases
- IPAT via IP Replacement.
IP address takeover is a mechanism for recovering a service IP label by moving it to another physical network adapter on another node.

An IP alias is an IP address that is configured on an interface in addition to the base IP address.
An IP alias is an AIX function that is supported by HACMP. Multiple aliases are possible, on different subnets.
A "Boot" IP address is the (regular) boot or base address configured for an interface in AIX.
The "service label IP" is the address on which clients can connect to the service (application).
The service IP address will be added on top of the base IP address (IPAT via aliasing), or will replace
the base (boot) IP address of the communication interface, depending on the IP address takeover method.

Some HACMP heartbeat keypoints:

A private IP network based heartbeat will always be implemented.
Secondly, a non-IP based heartbeat must be present. This could be realized using rs232 on both nodes,
or using a "disk based" heartbeat (using FC adapters) to a specially configured disk in a concurrent VG.
In this way, HACMP maintains information about the status of the cluster nodes and their respective communication interfaces.
So, if only the IP network temporarily fails, but the rs232 or disk heartbeats still work, failover will not take place.
Indeed, in that situation the nodes are healthy, but the IP network only had a temporary "hiccup".
So, it's a sort of "safeguard" to avoid "split brain" or a partitioned (or ill-functioning) cluster.
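The reasoning behind the two heartbeat paths can be written down as a small decision function. This is only a toy model of the idea and not HACMP's actual algorithm; the function name and the 0/1 inputs are assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of the two-path heartbeat reasoning; not HACMP's actual algorithm.
# Inputs are hypothetical flags: 1 = heartbeat received, 0 = heartbeat missing.
peer_verdict() {
    ip_hb="$1"      # heartbeat over the private IP network
    nonip_hb="$2"   # heartbeat over rs232 or disk
    if [ "$ip_hb" = 1 ] && [ "$nonip_hb" = 1 ]; then
        echo "peer alive"
    elif [ "$nonip_hb" = 1 ]; then
        echo "peer alive, IP network problem: no takeover"
    elif [ "$ip_hb" = 1 ]; then
        echo "peer alive, non-IP path problem: no takeover"
    else
        echo "peer down: takeover"
    fi
}
```

Only when both paths are silent does the model conclude that the other node is really gone, which is the "split brain" safeguard in miniature.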

1.2 A view on the latest versions and AIX matrix:


             HACMP  HACMP/ES  HACMP/ES  HACMP/ES  HACMP/ES  HACMP/ES  HACMP/ES  PowerHA   PowerHA   PowerHA
             4.5    4.5       5.1       5.2       5.3       5.4.0     5.4.1     5.5       6.1       7.1
AIX 4.3.3    No     No        No        No        No        No        No        No        No        No
AIX 5.1      Yes    Yes       Yes       Yes       No        No        No        No        No        No
AIX 5.1 64b  No     Yes       Yes       Yes       No        No        No        No        No        No
AIX 5.2      Yes    Yes       Yes       Yes       Yes       TL8+      TL8+      No        No        No
AIX 5.3      No     No        Yes       Yes       Yes       TL4+      TL4+      TL9+      TL9+      No
AIX 6.1      No     No        No        No        Yes       No        Yes       TL2,SP1+  TL2,SP1+  TL6+

The ES stands for "Enhanced Scalability". As of 5.1, HACMP/ES is solely based on RSCT, short for Reliable Scalable Cluster Technology. As from version 5.5, HACMP is renamed to PowerHA.

1.3 Main HACMP daemons:

Notice that if you list the daemons in the AIX System Resource Controller (SRC), using the "lssrc" command,
you will see ES appended to their names. The actual executables do not have the ES appended.

clstrmgr    Cluster Manager daemon
  This daemon monitors the status of the nodes and their inter
  in response to node or network events. It also centralizes t
  about HACMP-defined resource groups. The Cluster Manager on
  the HACMP global ODM, and other Cluster Managers in the clus
  location, and status of all HACMP resource groups. This info
  whenever an event occurs that affects resource group configu
  All cluster nodes must run the clstrmgr daemon.
  From HACMP V5.3 the clstrmgr daemon is started via init proc

clcomd      Cluster communication daemon
  As of 5.2, the dependency on rhosts and r commands has been
  Starting with Version 5.2, clcomdES must be running before a
  The clcomd daemon is started automatically at boot time by t
  It provides secure remote command execution and HACMP ODM co

clsmuxpd    Cluster SMUX Peer daemon (only for versions lower than HACMP 5.3)
  This daemon maintains status information about cluster objec
  the Simple Network Management Protocol (snmpd) daemon. All c
  Note: The clsmuxpd daemon cannot be started unless the snmpd
  It no longer exists as of HACMP 5.3.

RSCT        Reliable Scalable Cluster Technology. The "glue" in AIX clustering (as of AIX 5.1).
  Reliable Scalable Cluster Technology. Since HACMP 5.1, HA
  a necessary component or subsystem. For example, HACMP u
  RSCT is a standard component in AIX5L.
  Reliable Scalable Cluster Technology, or RSCT, is a set of s
  comprehensive clustering environment for AIX and Linux. RSCT
  of IBM products to provide clusters with improved system av
  RSCT includes these components:
  - Resource Monitoring and Control (RMC)
  - Resource managers (RM)
  - Cluster Security Services (CtSec)
  - Group Services
  - Topology Services

The following daemons are related to the RSCT framework.

topsvcsd    Cluster Topology Services Subsystem
  The RSCT Topology Services subsystem monitors the status of n
  All cluster nodes must run the topsvcsd daemon. It uses the daemons hatsd and hats_nim.

hagsd       RSCT group services subsystem
  This RSCT subsystem provides reliable communication and prot

hatsd       RSCT Topology Services subsystem

grpglsmd
  This RSCT daemon operates as a Group Services client; its fu
  global across all cluster nodes. All cluster nodes must run

RMC subsystem
  This RSCT subsystem acts as a resource monitor for the event
  information about the operating system characteristics and u
  The RMC subsystem must be running on each node in the cluste
  from inittab when it is installed. The rc.cluster script ens

1.4 Main HACMP logs:

Main logs for HACMP up to v. 5.5, and for PowerHA:

/tmp/hacmp.out or /var/hacmp/log/hacmp.out
  This is your main logfile.
  Contains time-stamped, formatted messages generat
  In verbose mode, this log file contains a line-by
  in the scripts, including the values of the argum
  the HACMP for AIX software writes verbose informa
  change this default. Verbose mode is recommended.

/usr/es/adm/cluster.log or /var/hacmp/adm/cluster.log
  Contains time-stamped, formatted messages generat
  In this log file, there is one line written for t
  for the completion.

The regular AIX system error log (use the "errpt" command to view this log)
  Contains time-stamped, formatted messages from al
  for AIX scripts and daemons.

/usr/sbin/cluster/history/cluster.mmdd or /var/hacmp/adm/history/cluster.mmddyyyy
  Contains time-stamped, formatted messages generat
  The system creates a new cluster history log file
  occurring. It identifies each day's file by the f
  the month and dd indicates the day.

/tmp/clstrmgr.debug
  Contains time-stamped, formatted messages generat
  Information in this file is used by IBM Support p
  Note that this file is overwritten every time clu
  so, you should be careful to make a copy of it be

/tmp/cspoc.log
  Contains time-stamped, formatted messages generat
  Because the C-SPOC utility lets you start or stop
  the /tmp/cspoc.log is stored on the node that ini

/var/hacmp/clverify/clverify.log
  Contains messages when the cluster verification h

/var/hacmp/log/clutils.log
  Every day a health check is performed at 00:00h,

/usr/es/sbin/cluster/utilities/clsnapshots
  Not a logfile. Only for later versions. The HACM
  cluster snapshot facility (/usr/es/sbin/
  in a file, a record all the data that defines a p
  It also allows you to create your own custom snap
  to save additional information important to your
  You can use this snapshot for troubleshooting clu

1.5 Checking HACMP processes:


(1): View the logs:

First and foremost, you can check your processes by viewing the logs as described in section 1.4

(2): Viewing running processes:


Checking the cluster processes:

(1): using smitty:

smitty hacmp > System Management (C-SPOC) > Manage HACMP Services > Show Cluster Services

# smitty hacmp
HACMP for AIX
Move cursor to desired item and press Enter.
  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

(2): using SRC list commands:

# lssrc -a | grep active      # shows all active daemons under the control of the SRC
# lssrc -g cluster            # shows all processes in this group
# lssrc -ls topsvcs           # shows the heartbeats

(3): using ps -ef:

# ps -ef | grep clstrmgr
# ps -ef | grep clcomd
etc..
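The SRC check can also be scripted. The sketch below filters lssrc-style output for subsystems that are not active. It assumes the usual column layout of lssrc (a header line, with the status as the last column); the function name is made up for this note.

```shell
#!/bin/sh
# List SRC subsystems that are not "active", given lssrc-style output on stdin.
# Assumes the first line is a header and the last column holds the status.
inactive_subsystems() {
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1 }'
}

# Only call lssrc where it exists (that is, on AIX); elsewhere the function
# can still be exercised with captured output.
if command -v lssrc >/dev/null 2>&1; then
    lssrc -g cluster | inactive_subsystems
fi
```

An empty result then means that every subsystem in the "cluster" group reports itself as active.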

1.6 Some remarks on the shared storage:

In figure 1, a situation is depicted where each node is connected to two separate storage systems.
But, for each node, one such storage system (e.g. on a SAN) can be considered to be "owned", or
Then, in case of failover, the associated Resource Groups, including the volumes and filesystems,
will be acquired by the other node.
Of course, both nodes can be attached to one local storage system (one SAN) as well.

[Figure: node 1 and node 2 attached to shared storage, which holds Volume Groups, Physical Volumes,
Logical Volumes and Filesystems. Note in the figure: the distance should not be too large, otherwise
additional components are needed.]

Disk subsystems that support access from multiple hosts include SCSI, SSA, ESS and others.
On a logical level, the Volume Groups are defined as follows.

In a non-concurrent access configuration, only one cluster node can access the shared data at a time.
It's "shared" because both nodes can access the Volume Group, but only one Node at a time, unt
If the resource group containing the shared disk space moves to another node, the new node will
and check the current state of the volume groups, logical volumes, and file systems.

HACMP non-concurrent access environments use normal journaled file systems such as JFS2 to manage data,
while concurrent access environments often use RAW logical volumes. These are thus not the regular filesystems.
Also, true concurrent access is possible too using a true cluster filesystem like the GPFS filesystem, which,
as the name already shows, allows the use of regular filesystems.

A Concurrent Capable or Enhanced Concurrent Capable volume group is a volume group that you can vary online
on more than one AIX instance at a time. So, if you have a Concurrent VG, it's online on all nodes.
The Concurrent Capable option can only be used in configurations with HACMP and HACMP/ES.
With HACMP you can create enhanced concurrent volume groups. Enhanced concurrent volume groups can be used
for both concurrent and non-concurrent access.
But again, only with non-concurrent access you will have the regular filesystems which most appl
With concurrent access, the application needs to be able to deal with the "concurrent" access mo
The exception of course is using a true cluster filesystem such as GPFS, which allows for concurrent a
the "look and feel" of any other filesystem. Or you should use RAW volumes for true concurrent

The enhanced concurrent volume groups are varied on all nodes in the resource group, and the dat
is coordinated by HACMP. Only the node that has the resource group active will vary on the
volume group in "concurrent active mode"; the other nodes will vary on the volume group in passive mode.
In passive mode, no high level operations are permitted on that volume group.

An example of physical paths to Storage:

When, for example, SDD MPIO "multipath" is used, the connections of the nodes to a storage system
typically look like this:

[Figure: NodeA has FC cards 1 and 2; NodeB has FC cards 3 and 4. FC connections run through the switches,
grouped as cards 1 and 3, and cards 2 and 4, to the storage system.]

1.7 Some general remarks on Starting and Stopping of HACMP:


1.7.1 HACMP shutdown modes:

- Graceful: Local machine shuts itself down gracefully. The remote machine interprets this as a graceful down, and
  does not take over resources.
- Takeover (Graceful with takeover): Local machine shuts itself down gracefully. The remote machine
  interprets this as a non-graceful down and takes over resources.
- Forced: Local machine shuts down cluster services without releasing any resources. The remote machine does not
  take over any resources. This mode is useful for system maintenance.

If you do a "shutdown", or a reboot with "shutdown -r", the rc.shutdown script will stop the cluster services
with a graceful shutdown. So, the other node won't take over the resources.
Also, if you reboot your system, the other node will not take over.

1.7.2 Starting and stopping HACMP:


Starting the HACMP services can be done in the following ways:

1. Using smitty clstop and smitty clstart:

The easiest way to stop and start HACMP, is using "smitty clstop" and "smitty clstart".
Suppose you just need to reboot a node, without the resources needing to fail over. Then you would choose
the "graceful" shutdown of HACMP. When that's done, and no applications are active, you can use
the "shutdown -Fr" command to reboot the node. But first shut down HACMP using:

# smitty clstop

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Stop now, on system restart or both               now           +
  Stop Cluster Services on these nodes              [starboss]    +
  BROADCAST Cluster shutdown?                       true          +
* Shutdown mode                                     graceful      +

You can view "/tmp/hacmp.out" for any messages and see if it has shut down in good order.
Then you can use "shutdown -Fr" to reboot the machine.
Note: Make sure no apps are running, and users are out of the system, so that all filesystems ar

Depending on how IP and HACMP are configured, you may see that at system boot, you can only p
its "boot address".
After the node is up, and there is no "autostart" of HACMP, you need to start HACMP manually, fo
"smitty clstart" or "smitty hacmp".

# smitty clstart

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Start now, on system restart or both              now           +
  Start Cluster Services on these nodes             [starboss]    +
  BROADCAST message at startup?                     false         +
  Startup Cluster Information Daemon?               true          +
  Reacquire resources after forced down ?           false         +

During and after HACMP startup, you may experience that your remote session to the machine using
has stopped working, and that the machine is accessible again, with a remote terminal, using the
Note: if HACMP starts from the inittab, then you do not need to start HACMP manually.

See section 1.5 for information on how to check if HACMP is running.
Note: be sure you have documented all IP parameters of the machine.

2. Using "smitty hacmp":


You can also use the main smitty HACMP menu system, like so:

# smitty hacmp

Move cursor to desired item and press Enter.
  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

Note that "smitty hacmp" can lead you to the so-called "C-SPOC" utility, shown by the "System Management (C-SPOC)" entry:

Move cursor to desired item and press Enter.
  Manage HACMP Services
  HACMP Communication Interface Management
  HACMP Resource Group and Application Management
  HACMP Log Viewing and Management
  HACMP File Collection Management
  HACMP Security and Users Management
  HACMP Logical Volume Management
  HACMP Concurrent Logical Volume Management
  HACMP Physical Volume Management
  Configure GPFS

Move cursor to desired item and press Enter.
  Start Cluster Services
  Stop Cluster Services
  Show Cluster Services

3. Using scripts:

Starting:

The "/usr/es/sbin/cluster/etc/rc.cluster" script initializes the environment required for HACMP/
the "/usr/es/sbin/cluster/utilities/clstart" script to start the HACMP daemons.
The clstart script calls the SRC startsrc command to start the specified subsystem or group.
Thus, clstart invokes the AIX System Resource Controller (SRC) facility, to start the cluster da
The following illustrates the major commands and scripts called at cluster startup:

rc.cluster -> clstart -> startsrc

Using the C-SPOC utility, you can start cluster services on any node (or on all nodes) in a clus
by executing the C-SPOC /usr/es/sbin/cluster/sbin/cl_rc.cluster command on a single cluster node
The C-SPOC cl_rc.cluster command calls the rc.cluster command to start cluster services on the n
from the one node. The nodes are started in sequential order, not in parallel. The output of the
run on the remote node is returned to the originating node. Because the command is executed remo
there can be a delay before the command output is returned.

Note that the clcomd daemon (called clcomdES) is started from "/etc/inittab".
You will probably find the following record in inittab:

clcomdES:2:once:startsrc -s clcomdES >/dev/console 2>&1

Depending on how HACMP is configured, the rc.cluster script might be called from inittab as well
In that case, HACMP is started from inittab at boot time. The following record might be present i

hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot> /dev/console 2>&1

In fact, if the "rc.cluster" script is called using the parameter "-R", that inittab entry will
Below is a small fragment from the comments that can be found in rc.cluster:

# Arguments: -boot : configures service adapter to use boot address
#            -i    : start client information daemon
#            -b    : broadcast these start events
#            -N    : start now
#            -R    : start on system restart
#            -B    : both
#            -r    : re-acquire resources after forced down
#
# Usage: rc.cluster [-boot] [-l] [-c] [-b] [-N | -R | -B] [-r]

Shutdown:

Newer AIX /usr/sbin/shutdown commands automatically call the PowerHA, or HACMP,
"/usr/es/sbin/cluster/etc/rc.shutdown" command, which will stop the HACMP services.
Since many shops have a custom "/etc/rc.shutdown" script (which contains statements to stop all sorts
of other processes), the HACMP rc.shutdown version will also call that /etc/rc.shutdown script.

1.7.3 Entries in /etc/inittab:


What you might find in the startup file "/etc/inittab", are the following records:

- During installation, the following entry is made to the /etc/inittab file to start the
  Cluster Communication Daemon at boot:

  clcomdES:2:once:startsrc -s clcomdES >/dev/console 2>&1

- Usually, during install, the following entry is added to inittab to autostart the HACMP daemo
  Also, if you use the "rc.cluster" script with the "-R" parameter, the entry will be added if it'

  hacmp:2:wait:/usr/es/sbin/cluster/etc/rc.cluster -boot> /dev/console 2>&1

  For PowerHA (the newly renamed HACMP), a similar entry is present:

  hacmp:2:once:/usr/es/sbin/cluster/etc/rc.init

- Because of the specific actions needed to implement IP Address Takeover (IPAT), a dedicated s
  That's why you will find the following entry in inittab:

  harc:2:wait:/usr/es/sbin/cluster/etc/harc.net # HACMP network startup
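To see quickly which of these records are present on a node, a small filter on the identifier field is enough. The sketch below is only an illustration: the function names are made up, and it relies on nothing more than the colon-separated layout of inittab records.

```shell
#!/bin/sh
# Print inittab records whose identifier (first colon-separated field)
# is one of the HACMP-related entries: clcomdES, hacmp or harc.
# The file argument defaults to /etc/inittab.
hacmp_inittab_entries() {
    awk -F: '$1 == "clcomdES" || $1 == "hacmp" || $1 == "harc"' "${1:-/etc/inittab}"
}

# Succeeds if a "hacmp" record exists, i.e. cluster services start at boot.
hacmp_autostart_configured() {
    hacmp_inittab_entries "${1:-/etc/inittab}" | grep -q '^hacmp:'
}
```

For example, "hacmp_autostart_configured || echo manual start needed" would tell you whether you have to run "smitty clstart" yourself after a reboot. On AIX itself, the "lsitab" command can list a single record by identifier.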

1.8 Some Management issues Shared Volume groups in HACMP:


1.8.1. Addition of volumes and issues:

In HACMP, a shared VG must be installed. Several management issues may present themselves at lat

For illustrative purposes, here are some "real life" questions and answers as found in various
Although many Sysadmins use the command line for changing LVM objects, in general, you should use
like C-SPOC, for any change. This will ensure ODM updates on all involved nodes.
You can access C-SPOC by using "smitty hacmp".

The C-SPOC commands operate only on shared and concurrent LVM components that are
defined as part of an HACMP resource group. When you use SMIT HACMP C-SPOC, it executes the comm
on the node that "owns" the LVM component. This is the node that has it varied on.
The examples below are for illustrative purposes only.

Question: I've got a HACMP (4.4) cluster with SAN-attached ESS storage. SDD is installed.
How can I add volumes to one of the shared VG's?

Answer:
1) acquire the new disks on primary node (where the VG is in service) with:
   # cfgmgr -Svl fcs0          # repeat this for all fcs adapters in system
2) convert hdisks to vpaths.
   Note: use the smit screens for this because the commands have changed from version to version.
3) add vpaths to VG with:
   # extendvg4vp vgname vpath#
4) create LVs/filesystems on the vpaths.
5) break VG/scsi locks so that other systems can see the disks with:
   # varyonvg -b -u vgname
6) perform steps 1 & 2 for all failover nodes in the cluster.
7) refresh the VG definitions on all the failover nodes with:
   # importvg -L vgname vpath#
8) reestablish disk locks on service node with:
   # varyonvg vgname
9) add new filesystems to HA configuration.
10) synchronise HA resources to the cluster.
(note: it can be done as above, but normally I would advise C-SPOC through "smitty hacmp".)

Question: How to add a vpath to a running HACMP cluster:

Answer:
On the VG active node:
# extendvg4vp vg00 vpath10 vpath11
# smitty chfs                     ( Increase the f/s as required )
# varyonvg -bu vg00               ( this is to un-lock the vg )

On the Secondary node, where the vg is not active:
# cfgmgr -vl fscsi0               ( fscsi1 and fcs0 and fcs1 ) Found new vpaths
# chdev -l vpath10 -a pv=yes      ( for vpath11 also ) (Note: I don't think you need to set the
# lsvg vg00|grep path             ( just note down any one vpath which is from this o/p, for e.g. vpath0 )
# importvg vg00 vpath0

Once that's done, go to the Primary Node:
# varyonvg vg00                   ( Locking the VG )

Question: I have an HACMP cluster with enhanced concurrent resource group. What is the best way to

Answer:
Once the lun is visible on your cluster nodes, you should use HACMP C-SPOC in order to add
the new lun to an enhanced concurrent volume group.

smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume Grou
Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select

If you cannot select any new volume in the last C-SPOC screen, then follow this:

1. Allocate the lun to both cluster nodes (this is already done by the SAN administrator)
2. Run cfgmgr on 1st node to pick up the new lun.
3. Set a pvid on the new lun:
   chdev -l hdiskx -a pv=yes
4. Set the no reserve attribute:
   chdev -l hdiskx -a reserve_policy=no_reserve
5. On the 2nd cluster node, run cfgmgr to pick up the new lun.
   Ensure it picks up the correct pvid as created on node 1 step 3
6. Set the no reserve attribute on your hdisk on node 2:
   chdev -l hdiskx -a reserve_policy=no_reserve

Now when you go into C-SPOC, you should see the new lun when you go to add a new volume to a shar

7. smitty hacmp > System Management (C-SPOC) > HACMP Logical Volume Management > Shared Volume
   Set Characteristics of a Shared Volume Group > Add a Volume to a Shared Volume Group > <select
8. Once the lun has been added, check on your cluster node 2 that the VG has the new lun assoc
   lsvg -p <vgname>

Note: the reserve_policy=no_reserve allows the 2 cluster nodes to see the lun, without one node
locking the lun from the other.
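The per-node preparation in steps 2 to 6 differs only in who sets the PVID. As a hedged illustration, the helper below prints (and deliberately does not execute) the commands for each node. The function name and the "first"/"second" role argument are inventions for this sketch; the commands themselves are the ones quoted in the answer.

```shell
#!/bin/sh
# Dry run: print, but do not execute, the per-node commands from steps 2 to 6.
# Usage: lun_prep_commands <first|second> <hdisk name>
lun_prep_commands() {
    role="$1"
    disk="$2"
    echo "cfgmgr"
    if [ "$role" = "first" ]; then
        # Only the first node assigns the PVID; the second node should pick it up.
        echo "chdev -l $disk -a pv=yes"
    fi
    echo "chdev -l $disk -a reserve_policy=no_reserve"
}
```

Printing the commands first makes it easy to review them, and to substitute the real hdisk name on each node, before anything is changed.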

Question: How can I increase the filesystem on a shared VG in HACMP? What is different from just r

Answer:
You have 2 (or more) machines in a cluster but only one of them has access to the filesystems at a time.
If you change the filesystem you have to make sure that not only this one machine but all machines
get this information and update their bookkeeping data.
Again, use C-SPOC in "smitty hacmp".

1.8.2 More on varyonvg in HACMP:


In HACMP, on the shared disk subsystem, one or more "concurrent capable" VG's will be defined.
In normal operation, at startup of a node, or even if a fail-over occurs, you should not be
by using varyonvg commands: it should all work "automatically" in HACMP.
However, in certain circumstances, you need to be able to perform some actions manually.

-> On a stand-alone AIX machine, if you want to vary on a volume group, you would simply use:

# varyonvg volumegroup_name

The normal varyonvg command as used above will "lock" or reserve the disks for the current machine.
The base design of LVM assumes that only one initiator can access a volume group.

-> In a HACMP environment, you have a shared Volume Group, which is called "concurrent capable".

The varyonvg command knows many switches, but for HACMP, the following are the most relevant ones:

-b   This flag unlocks all disks in a given volume group.
-c   Varies the volume group on in Enhanced Concurrent mode. This can only be done on
     which is for the shared VG's, used in a HACMP environment.
-u   Varies on a volume group, but leaves the disks that make up the volume group

Regular Failover configuration:

In a regular Failover HACMP configuration, one node actively accesses the VG, while the other no
to take over the Resource Group containing the VG.
So, the active node will have implemented "varyonvg vgname" to set the reservations.
Although one node is active at a time, the VG still is configured as "concurrent capable".
This is sometimes also called an "HACMP nonconcurrent access configuration".

Concurrent access to the VG:

If RAW volumes, or a true cluster filesystem such as GPFS is used, multiple nodes can truly access t
simultaneously. This is sometimes also called an "HACMP concurrent access configuration".
An example of such a configuration might be an Oracle RAC environment, used in conjunction with HACMP.

Active and Passive Varyon in Enhanced Concurrent Mode:

An enhanced concurrent volume group can be made active on the node, or varied on, in two states:
active or passive. Note that active or passive state varyons are done automatically by HACMP.

- Active state varyon behaves as ordinary varyon, and makes the logical volumes normally available.
  The node that opens the VG in "active state varyon" can mount all filesystems, and all other us
  operations are possible, like running applications from them.
- Passive state varyon: limited access.
  When an enhanced concurrent volume group is varied on in passive state, the LVM provides
  an equivalent of fencing for the volume group at the LVM level. Only limited operations are possible.
  The other nodes in a failover configuration will open the
  in "passive state varyon".

As said before, in certain circumstances "true" concurrent access is possible using RAW volumes, or
using a true cluster file system (like GPFS, and not using JFS/JFS2).
This type of cluster then would not be a "failover cluster". It has certain requirements of the
accessing the VG. They should be able to handle "parallel" access, like Oracle RAC Clusterware c

1.9 Some HACMP utilities:


Some HACMP utilities can provide you the cluster status, and other information.

- The "clfindres" and "clRGinfo" commands:


This command shows you the status of Resource Groups and where they are active.

# /usr/es/sbin/cluster/utilities/clfindres

Example output:
-----------------------------------------------------------------------------
Group Name        Type            State     Location
-----------------------------------------------------------------------------
vgprod_resource   non-concurrent  ONLINE    P550LP1
                                  OFFLINE   P550LP2
vgtest_resource   non-concurrent  ONLINE    P550LP2

Example output:
GroupName    Type        State   Location  Sticky Loc
---------    ----------  ------  --------  ----------
C37_CAS_01   cascading   UP      P520-1
C38_CAS_01   cascading   UP      P520-2

The same information can be obtained using the clRGinfo command. In fact, clfindres is a link to clRGinfo.
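If you want to use this listing in a script, for example to report where each Resource Group is currently online, a small awk filter will do. The sketch assumes the layout of the first example output (whitespace-separated columns, with continuation lines that carry only State and Location); real output may differ per version, and the function name is made up here.

```shell
#!/bin/sh
# Print "<group> <node>" for every resource group instance reported ONLINE.
# Assumes whitespace-separated columns: Group Name, Type, State, Location,
# with continuation lines that carry only State and Location.
online_groups() {
    awk '
        /^-/ || /^Group/ || NF == 0 { next }   # skip rulers, header, blank lines
        NF >= 4 { group = $1 }                 # a full row names the group
        $(NF-1) == "ONLINE" { print group, $NF }
    '
}
```

On a cluster node this could be fed with "/usr/es/sbin/cluster/utilities/clfindres | online_groups".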

- The "cllsserv" command:


Use this command to list all the applications configured in HACMP, including the start and stop scripts.

# cllsserv

Example output:
OraDB_Appl   /usr/local/bin/dbstart    /usr/local/bin/dbstop
SapCI_Appl   /usr/local/bin/sapstart   /usr/local/bin/sapstop

- The "clstat" command:


This command shows you the overall cluster status.

-a                     ascii mode
-n name                shows information of the cluster with the specified name
-r tenths-of-seconds   determines the refresh rate to update the information

# /usr/es/sbin/cluster/clstat -a

Shows a list of nodes with their interfaces and status.

clstat - HACMP Cluster Status Monitor
-------------------------------------
Cluster: Unix_cluster01 (1110212176)        Wed Jan 5 09:04:53 NFT 2011
State: UP                                   Nodes: 2
SubState: STABLE

Node: starboss          State: UP
   Interface: star-oraprod_boot (2)     Address: 3.223.224.137   State: DOWN
   Interface: star-oraprod_stb (2)      Address: 10.80.16.1      State: UP
   Interface: prodlun498 (0)            Address: 0.0.0.0         State: UP
   Interface: star-oraprod (2)          Address: 3.223.224.135   State: UP
   Resource Group: staroraprod_resource                          State: On line

Node: stargate          State: UP
   Interface: star-oratest_boot (2)     Address: 3.223.224.141   State: DOWN
   Interface: star-oratest_stb (2)      Address: 10.80.16.2      State: UP
   Interface: testlun498 (0)            Address: 0.0.0.0         State: UP
   Interface: star-oratest (2)          Address: 3.223.224.139   State: UP
   Resource Group: staroratest_resource                          State: On line

- Commands to document the HACMP cluster:


Alongside the above commands (clfindres, cllsserv, clstat), the output of the commands below
can be used to document your HACMP environment.

/usr/es/sbin/cluster/utilities/cllscf
/usr/es/sbin/cluster/utilities/cllsnw
/usr/es/sbin/cluster/utilities/cltopinfo
/usr/es/sbin/cluster/utilities/clshowres
/usr/es/sbin/cluster/utilities/cllsserv

Note: Also, don't forget to document the outputs of "lsvg", "lsvg -l", "lspv", "df -g", and the
contents of "/etc/filesystems", "/etc/hosts", and all other relevant configuration files.
Of course, this was only a small portion of all utilities.

1.10 Some notes on Disk based Heartbeat.

The whole idea about this is to have additional Keep Alives, or Heartbeats, across a non-IP network.
So, if the private network is just temporarily down, then a "Take Over" does not need to take place.
An additional path, alongside the private network, makes sure that HACMP can determine that the
even if the private network is malfunctioning for some reason.

To create this additional path, serial links might be used, or you could use a "heartbeat over disk".
The latter provides the ability to use existing shared disks, to provide a "serial network like"

In HACMP 5.x, the RSCT component "/usr/sbin/rsct/bin/hats_diskhb_nim" has the functionality to m

There is no SCSI reservation on the disk. This is because both nodes must be able to read and write
For that, it is sufficient that the disk resides in an enhanced concurrent volume group to meet

There is of course a difference between a concurrent Volume Group and a concurrent Resource Group.

- Nowadays, a concurrent Volume Group can be used in a nonconcurrent and in a concurrent Resource Group.
  An important feature of a concurrent Resource Group is that it's online and open on both

- With the older AIX and HACMP versions, a concurrent Volume Group can only be used in a concur

As of AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-conc

How would one install Disk Heartbeat on a two node HACMP cluster:

Say you have the nodes "starboss" and "stargate". Suppose we use ESS storage with vpath devices. Starboss 'sees' the vpath4 device. Stargate 'sees' the vpath5 device.

If a PVID does not exist on each system, you should run "chdev -l <devicename> -a pv=yes" on both systems.
This will ensure that smitty C-SPOC will recognize it as a disk in shared storage.

Both vpaths (vpath4 and vpath5) are pointing to the same virtual disk. Let's now use C-SPOC to create an "Enhanced Concurrent volume group".

# smitty cl_admin

System Management (C-SPOC)

Move cursor to desired item and press Enter.

  Manage HACMP Services
  HACMP Communication Interface Management
  HACMP Resource Group and Application Management
  HACMP Log Viewing and Management
  HACMP File Collection Management
  HACMP Security and Users Management
  HACMP Logical Volume Management
  HACMP Concurrent Logical Volume Management
  HACMP Physical Volume Management
  Configure GPFS

HACMP Concurrent Logical Volume Management

Move cursor to desired item and press Enter.

  Concurrent Volume Groups
  Concurrent Logical Volumes
  Synchronize Concurrent LVM Mirrors

Concurrent Volume Groups

Move cursor to desired item and press Enter.

  List All Concurrent Volume Groups
  Create a Concurrent Volume Group
  Create a Concurrent Volume Group with Data Path Devices
  Set Characteristics of a Concurrent Volume Group
  Import a Concurrent Volume Group
  Mirror a Concurrent Volume Group
  Unmirror a Concurrent Volume Group

Then, choose the nodes, and after that add the appropriate shared storage devices based on their PVIDs, which here are vpath4 and vpath5. Then, choose to create an Enhanced concurrent VG, with for example the name "examplevg".

A check on the disk devices, after the volume group was created, could be this:

starboss #/ lspv
vpath4   000a7f5pe78e9ed5   examplevg

stargate #/ lspv
vpath5   000a7f5pe78e9ed5   examplevg
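The point of this check is that both nodes report the same PVID, even though the local device names differ. A minimal sketch of that comparison on captured "lspv" output follows; "pvid_of" and "same_disk" are hypothetical helper names, and the sample lines are the ones shown in this section.

```shell
#!/bin/sh
# Sketch: confirm that two nodes report the same PVID for their local device names.
# pvid_of and same_disk are hypothetical helpers working on captured lspv output.

pvid_of() {
    # $1 = device name, $2 = lspv output captured on that node
    printf '%s\n' "$2" | awk -v dev="$1" '$1 == dev { print $2 }'
}

same_disk() {
    # $1/$2 = device and lspv output of node A, $3/$4 = the same for node B
    a=$(pvid_of "$1" "$2")
    b=$(pvid_of "$3" "$4")
    [ -n "$a" ] && [ "$a" != none ] && [ "$a" = "$b" ]
}

starboss_lspv='vpath4 000a7f5pe78e9ed5 examplevg'
stargate_lspv='vpath5 000a7f5pe78e9ed5 examplevg'

if same_disk vpath4 "$starboss_lspv" vpath5 "$stargate_lspv"; then
    echo "vpath4 and vpath5 are the same shared disk"
fi
```

In practice the two variables would be filled with the lspv output gathered on each node.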

Now that the enhanced concurrent Volume Group is available, we need to create the "heartbeat" network.

Since the "physical" path is just along the Fibre Channel cards (or whatever physical connection to shared storage you may use), you may wonder why it's called a "network". Well, it resembles a heartbeat network so much that people call it a network too (it functions much like the private IP network). A network of this type is called a "diskhb" network.

To create it, use "smitty hacmp" from your primary node (for example starboss). Instead of showing all smitty menus, here we will only show the menu choices:

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Add a Network to the HACMP cluster > select diskhb > enter an appropriate network name

Suppose we gave our new diskhb network the name "hbdisknet". When the above actions are done, we have added a diskhb network definition. Next we need to associate our new diskhb network with our vpath devices.

smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Add Communication Interfaces >

Then, you need to fill in a screen similar to:

Add a Communication Device

Type or select values in entry fields.

* Device Name      [starboss_hb]
* Network Type     diskhb
* Network Name     hbdisknet
* Device Path      [/dev/vpath4]
* Node Name        [starboss]

When done, perform similar actions for your second node (stargate, which sees the same disk as vpath5). To get a fully functioning heartbeat network, you need to do some more work. This section was only provided to give a taste of a real installation of an HACMP component.

Well, that's it. Hopefully this document was of some use.
