
Day-To-Day activities on HACMP.

Overview: This document contains the operational procedures for day-to-day activities on HACMP.

Contents:
1. Basics
2. HACMP Installation
3. HACMP Configuration
4. Disk Heartbeat
5. HACMP Startup/Stop
6. Resource Group Management
7. Application Startup/Stop Scripts
8. HACMP Logical Volume Management
9. Cluster Verification
10. User and Group Administration

Basics:
Cluster topology: the nodes, networks, storage, clients, and persistent node IP labels/devices.
Cluster resources: the components HACMP can move from one node to another, e.g. service IP labels, file systems, and applications.
HACMP services: cluster communication daemon (clcomdES), cluster manager (clstrmgrES), cluster information daemon (clinfoES), cluster lock manager (cllockd), cluster SMUX peer daemon (clsmuxpd).
HACMP daemons: clstrmgrES, clinfoES, clsmuxpd, cllockd.

HACMP installation: use the smitty install_all fast path for installation. Start the cluster communication daemon with startsrc -s clcomdES. Upgrade options for the cluster: node-by-node migration or snapshot conversion. Steps for migration:

1. Stop cluster services on all nodes.
2. Upgrade the HACMP software on each node.
3. Start cluster services on one node at a time.

Converting from a supported version of HAS to HACMP:


1. The current software should be committed.
2. Save a snapshot.
3. Remove the old version.
4. Install HACMP 5.1 and verify.

Check the previous version of the cluster with lslpp -h cluster.*. To save your HACMP configuration, create a snapshot in HACMP.

Remove HACMP: smitty install_remove (select software name cluster*). lppchk -v and lppchk -c cluster* should both run clean if the installation is OK. After you have installed HACMP on the cluster nodes, you need to convert and apply the snapshot. Converting the snapshot must be performed before rebooting the cluster nodes.
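For example, before the upgrade the installed filesets can be committed and checked from the command line (a hedged sketch using standard AIX installp/lslpp/lppchk commands):

installp -c all          # commit all applied (uncommitted) filesets
lslpp -h "cluster.*"     # show the history and state of the HACMP filesets
lppchk -v                # verify fileset requisites; clean output means no problems
lppchk -c "cluster.*"    # verify checksums of the HACMP files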

Cluster Configuration:

All HACMP configuration is done through the SMIT menus. The rest of this section describes what the configuration is and how to perform it. Unless noted otherwise, this need only be done on one server in the HACMP cluster; HACMP copies everything to the other server.

smitty hacmp > Cluster Configuration > Cluster Topology
Configure Cluster > Add a Cluster Definition: cluster name: cl_mgmt
Configure Nodes > Add Cluster Nodes: enter the hostnames of the two nodes, separated by spaces.
Configure Adapters: there are 8 adapters to configure in a standard implementation, so this screen must be completed 8 times, once for each adapter. The 8 are the service, standby, boot, and serial adapters for each of the two servers. Note that:
- All labels must be in /etc/hosts before you do this step.
- The adapter IP label must match the entry in /etc/hosts.
- The network type is ether, except for the serial adapters, which are rs232.
- The network attribute is public for all adapters except the serial adapters, which are serial.
- The network name is ether1 for all adapters except the serial adapters, which use serial1.
- The node name is required for all adapters.
- Other fields can be left blank.
Show Cluster Topology: check that this output looks like the cluster topology shown below.
Synchronize Cluster Topology: run this with defaults. If it fails, check the output and correct any errors (these may be errors in the network or AIX configuration as well as in the HACMP configuration).
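Once the topology synchronizes cleanly, it can also be reviewed from the command line (a hedged sketch; the utility paths assume a default HACMP 5.x installation under /usr/es):

/usr/es/sbin/cluster/utilities/cltopinfo    # summary of the cluster, nodes, and networks
/usr/es/sbin/cluster/utilities/cllsif       # per-adapter interface details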

Cluster Resources

Define Resource Groups > Add a Resource Group: see the resource group definitions and names below. The first three lines of the definition are defined in this panel.
Define Application Servers > Add an Application Server: see below for application server configuration details.
Change/Show Resources/Attributes for a Resource Group: for the resource group, fill in the attributes as shown below.
Synchronize Cluster Resources: synchronize with the defaults. If it fails, check the output and fix any problems.
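As a quick check after synchronization, the defined resource groups and their contents can be listed from the command line (a hedged sketch; utility paths assume a default HACMP 5.x installation):

/usr/es/sbin/cluster/utilities/cllsgrp      # names of the defined resource groups
/usr/es/sbin/cluster/utilities/clshowres    # resources configured in each resource group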

Once your cluster has completed a resource synchronization with no errors (and you are happy with any warnings) you have completed the HACMP configuration. You may now start HACMP.

smit hacmp > Cluster Services > Start Cluster Services

Disk Heartbeat: disk heartbeating typically requires 4 seeks per second; that is, each of the two nodes writes to the disk and reads from the disk once per second.

Configuring disk heartbeat: vpaths are configured as member disks of an enhanced concurrent volume group.
smitty lvm > Volume Groups > Add a Volume Group: give the VG name, PV names, and VG major number, and set "Create VG concurrent capable" to enhanced concurrent.
Import the new VG on all nodes using smitty importvg, or: importvg -V 53 -y c23vg vpath5
Create the diskhb network: smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Add a Network to the HACMP Cluster > choose diskhb.

Add 2 communication devices: smitty hacmp > Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Add Communication Interfaces/Devices > Add Pre-defined Communication Interfaces and Devices > Communication Devices > choose the diskhb network. Create one communication device for the other node as well.
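The same volume group can be created and imported from the command line, for example (a hedged sketch; the VG name c23vg, major number 53, and vpath names are illustrative and must match your environment):

mkvg -n -C -V 53 -y c23vg vpath5    # -C = enhanced concurrent capable, -n = do not auto-varyon at restart
varyoffvg c23vg                     # release the VG so it can be imported on the other node
# then, on the other node:
importvg -V 53 -y c23vg vpath5      # import with the same name and major number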

Testing disk heartbeat connectivity: /usr/sbin/rsct/dhb_read is used to test the validity of a diskhb connection.
dhb_read -p vpath0 -r  receives data over the diskhb network.
dhb_read -p vpath3 -t  transmits data over the diskhb network.
Monitoring disk heartbeat: monitor the activity of the disk heartbeats via lssrc -ls topsvcs.
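To exercise the link end to end, the receive and transmit sides are normally run at the same time on the two nodes (a hedged sketch; the vpath device must be the diskhb device defined on each node):

# on node A:
dhb_read -p vpath0 -r    # wait to receive heartbeat data
# on node B, while node A is still waiting:
dhb_read -p vpath0 -t    # transmit heartbeat data
# both sides should report that the link is operating normally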

Cluster Startup/Stop:
Cluster startup: smit cl_admin > Manage HACMP Services > Start Cluster Services. Note: monitor /tmp/hacmp.out and check for node_up_complete.

Stop the cluster: smitty cl_admin > Manage HACMP Services > Stop Cluster Services. Note: monitor /tmp/hacmp.out and check for node_down_complete.
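For example, the event processing can be followed live during a start or stop (a hedged sketch using standard AIX commands):

tail -f /tmp/hacmp.out | grep -E 'node_up_complete|node_down_complete'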

Resource Group Management:
Resource group takeover relationships:
1. Cascading
2. Rotating
3. Concurrent
4. Custom

Cascading:

A cascading resource group is activated on its home node by default. The resource group can be activated on a lower-priority node if the highest-priority node is not available at cluster startup. On node failure, the resource group falls over to the available node with the next priority. Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default. Attributes:

1. Inactive takeover (IT): initial acquisition of a resource group in case the home node is not available.
2. Fallover priority: can be configured in the default node priority list.
3. Cascading without fallback (CWOF): an attribute that modifies the fallback behavior. If the CWOF flag is set to true, the resource group will not fall back to any node joining the cluster; when the flag is false, the resource group falls back to the higher-priority node.

Rotating: at cluster startup, the first available node in the node priority list activates the resource group. If the resource group is on a takeover node, it never falls back to a higher-priority node when one becomes available. Rotating resource groups require the use of IP address takeover. The nodes in the resource chain must all share the same network connection to the resource group.

Concurrent:

A concurrent RG can be active on multiple nodes at the same time.

Custom:

Users have to explicitly specify the desired startup, fallover, and fallback behavior. Custom resource groups support only IPAT via aliasing for service IP addresses.

Startup Options:

- Online on home node only
- Online on first available node
- Online on all available nodes
- Online using distribution policy: the resource group will only be brought online if the node has no other resource group online. You can check this with lssrc -ls clstrmgrES.

Fallover Options:

- Fallover to next priority node in the list
- Fallover using dynamic node priority: the fallover node can be selected on the basis of its available CPU, its available memory, or the lowest disk usage. HACMP uses RSCT to gather this information, and the resource group falls over to the node that best meets the chosen criterion.
- Bring offline: the resource group is brought offline in the event of an error. This option is designed for resource groups that are online on all available nodes.

Fallback Options:

- Fallback to higher priority node in the list
- Never fallback

Resource group Operation:

Bring a resource group offline: smitty cl_admin > HACMP Resource Group and Application Management > Bring a Resource Group Offline.
Bring a resource group online: smitty hacmp > HACMP Resource Group and Application Management > Bring a Resource Group Online.
Move a resource group: smitty hacmp > HACMP Resource Group and Application Management > Move a Resource Group to Another Node.
To find the resource group information: clRGinfo -p
Resource group states: online, offline, acquiring, releasing, error, temporary error, or unknown.

Application Startup/Stop Scripts: smitty hacmp > Cluster Configuration > Cluster Resources > Define Application Servers > Add an Application Server.
Configure HACMP application monitoring: smitty cm_cfg_appmon > Add a Process Application Monitor > give the process names and the application startup/stop scripts.
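A start script for an application server is usually a small shell wrapper. The sketch below is illustrative only; the script path, application location, and the appuser account are assumptions, not part of the original configuration:

#!/usr/bin/ksh
# /usr/local/hacmp/start_app.sh - hypothetical application server start script
APP_HOME=/opt/app                                # assumed application location
su - appuser -c "$APP_HOME/bin/start_app" >> /tmp/app_start.log 2>&1   # log output for troubleshooting
exit 0                                           # a non-zero exit is treated as a script failure by the cluster event

The matching stop script calls the application's shutdown command in the same way, and both scripts are referenced in the Add an Application Server panel above.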

HACMP Logical Volume Management:
C-SPOC LVM: smitty cl_admin > HACMP Logical Volume Management
- Shared Volume Groups
- Shared Logical Volumes
- Shared File Systems
- Synchronize Shared LVM Mirrors (Synchronize by VG / Synchronize by LV)
- Synchronize a Shared VG Definition
C-SPOC concurrent LVM: smitty cl_admin > HACMP Concurrent LVM
- Concurrent Volume Groups
- Concurrent Logical Volumes
- Synchronize Concurrent LVM Mirrors
C-SPOC physical volume management: smitty cl_admin > HACMP Physical Volume Management
- Add a Disk to the Cluster
- Remove a Disk from the Cluster
- Cluster Disk Replacement
- Cluster Datapath Device Management
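After C-SPOC operations, the shared LVM definitions can be sanity-checked on each node with standard AIX LVM commands (a hedged sketch; the VG name c23vg is the example used earlier):

lsvg c23vg            # VG attributes, including concurrent capability and major number
lsvg -l c23vg         # logical volumes and file systems in the shared VG
lspv | grep c23vg     # physical volumes assigned to the shared VG on this node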

Cluster Verification: smitty hacmp > Extended Configuration > Extended Verification and Synchronization. Verification log files are stored in /var/hacmp/clverify:

/var/hacmp/clverify/clverify.log - verification log
/var/hacmp/clverify/pass/nodename - if verification succeeds
/var/hacmp/clverify/fail/nodename - if verification fails

Automatic cluster verification runs each time you start cluster services and every 24 hours. Configure automatic cluster verification: smitty hacmp > Problem Determination Tools > HACMP Verification > Automatic Cluster Configuration Monitoring.
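For a quick look at the latest results (a hedged example using only the log locations listed above):

tail -50 /var/hacmp/clverify/clverify.log                                     # most recent verification messages
ls -lt /var/hacmp/clverify/pass /var/hacmp/clverify/fail 2>/dev/null | head   # which nodes last passed or failed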

User and Group Administration:
smitty cl_usergroup > Users in an HACMP Cluster
- Add a User to the Cluster
- List Users in the Cluster
- Change/Show Characteristics of a User in the Cluster
- Remove a User from the Cluster
smitty cl_usergroup > Groups in an HACMP Cluster
- Add a Group to the Cluster
- List Groups in the Cluster
- Change a Group in the Cluster
- Remove a Group
smitty cl_usergroup > Passwords in an HACMP Cluster

FAQs

Does HACMP work on different operating systems?


Yes. HACMP is tightly integrated with the AIX 5L operating system and System p servers, allowing for a rich set of features which are not available with any other combination of operating system and hardware. HACMP V5 introduces support for the Linux operating system on POWER servers. HACMP for Linux supports a subset of the features available on AIX 5L; however, this multi-platform support provides a common availability infrastructure for your entire enterprise.

What applications work with HACMP?


All popular applications work with HACMP, including DB2, Oracle, SAP, WebSphere, etc. HACMP provides Smart Assist agents to let you quickly and easily configure HACMP with specific applications. HACMP includes flexible configuration parameters that let you easily set it up for just about any application.

Does HACMP support dynamic LPAR, CUoD, On/Off CoD, or CBU?


HACMP supports Dynamic Logical Partitioning (DLPAR), Capacity Upgrade on Demand (CUoD), On/Off Capacity on Demand (On/Off CoD), and Capacity BackUp (CBU).

If a server has LPAR capability, can two or more LPARs be configured with unique instances of HACMP running on them without incurring additional license charges?
Yes. HACMP is a server product that has one charge unit: the number of processors on which HACMP will be installed or run. Regardless of how many LPARs or instances of AIX 5L run in the server, you are charged based on the number of active processors in the server that is running HACMP. Note that HACMP configurations containing multiple LPARs within a single server may represent a potential single point of failure. To avoid this, it is recommended that the backup for an LPAR be an LPAR on a different server or a standalone server.

Does HACMP support non-IBM hardware or operating systems?


Yes. HACMP for AIX 5L supports the hardware and operating systems as specified in the manual, and HACMP V5.4 includes support for Red Hat and SUSE Linux.

HACMP interview questions


a. What characters should a hostname contain for HACMP configuration?
The hostname cannot contain the following characters: -, _, * or other special characters.

b. Can the Service IP and Boot IP be in the same subnet?
No. The service IP address and boot IP address cannot be in the same subnet. This is a basic requirement for HACMP cluster configuration. The verification process does not allow the IP addresses to be in the same subnet, and the cluster will not start.

c. Can multiple Service IP addresses be configured on a single Ethernet card?
Yes. Using the SMIT menu, multiple service IP addresses can be configured to run on a single Ethernet card. It only requires selecting the same network name for the specific service IP addresses in the SMIT menu.

d. What happens when a NIC holding the Service IP goes down?
When the NIC running the service IP address goes down, HACMP detects the failure and fails over the service IP address to an available standby NIC on the same node or to another node in the cluster.

e. Can multiple Oracle Database instances be configured on a single node of an HACMP cluster?
Yes. Multiple database instances can be configured on a single node of an HACMP cluster. For this, one needs separate service IP addresses over which the listeners for each Oracle database will run. Hence one can have separate resource groups, each owning one Oracle instance. This configuration is useful when a single Oracle database instance failing on one node must be failed over to another node without disturbing the other running Oracle instances.

f. Can HACMP be configured in an Active-Passive configuration?
Yes. For an Active-Passive cluster configuration, do not configure any service IP on the passive node. Also, for all the resource groups on the active node, specify the passive node as the next node in the priority list to take over in the event of a failure of the active node.

g. Can a file system mounted over the NFS protocol be used for disk heartbeat?
No. A volume mounted over the NFS protocol is a file system to AIX, and since a disk device is required for the enhanced concurrent capable volume group used for disk heartbeat, an NFS file system cannot be used for configuring the disk heartbeat. One needs to provide a disk device to the AIX hosts over the FCP or iSCSI protocol.

h. Which HACMP log files are available for troubleshooting?
The following log files can be used for troubleshooting:
1. /var/hacmp/clverify/current/<nodename>/* contains logs from the current execution of cluster verification.
2. /var/hacmp/clverify/pass/<nodename>/* contains logs from the last time verification passed.
3. /var/hacmp/clverify/fail/<nodename>/* contains logs from the last time verification failed.
4. /tmp/hacmp.out records the output generated by the HACMP event scripts as they execute.
5. /tmp/clstrmgr.debug contains time-stamped messages generated by HACMP clstrmgrES activity.
6. /tmp/cspoc.log contains messages generated by HACMP C-SPOC commands.
7. /usr/es/adm/cluster.log is the main HACMP log file. HACMP error messages and messages about HACMP-related events are appended to this log.
8. /var/adm/clavan.log keeps track of when each application that is managed by HACMP is started or stopped, and when the node on which an application is running stops.
9. /var/hacmp/clcomd/clcomd.log contains messages generated by the HACMP cluster communication daemon.
10. /var/ha/log/grpsvcs.* tracks the execution of internal activities of the grpsvcs daemon.
11. /var/ha/log/topsvcs.* tracks the execution of internal activities of the topsvcs daemon.
12. /var/ha/log/grpglsm tracks the execution of internal activities of the grpglsm daemon.
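When working through these logs, a few subsystem status commands help narrow things down (a hedged sketch; clcomdES is queried separately because it is not part of the cluster subsystem group on all levels):

lssrc -g cluster        # state of clstrmgrES, clinfoES, and related subsystems
lssrc -s clcomdES       # cluster communication daemon status
lssrc -ls topsvcs       # heartbeat/topology services detail
lssrc -ls clstrmgrES    # cluster manager internal state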

Key PowerHA terms

The following terms are used throughout this article and are helpful to know when discussing PowerHA:

Cluster: A logical grouping of servers running PowerHA.
Node: An individual server within a cluster.
Network: Although normally this term would refer to a larger area of computer-to-computer communication (such as a WAN), in PowerHA a network refers to a logical definition of an area for communication between two servers. Within PowerHA, even SAN resources can be defined as a network.
Boot IP: This is a default IP address a node uses when it is first activated and becomes available. Typically, and as used in this article, the boot IP is a non-routable IP address set up on an isolated VLAN accessible to all nodes in the cluster.

Persistent IP: This is an IP address a node uses as its regular means of communication. Typically, this is the IP through which systems administrators access a node.

Service IP: This is an IP address that can "float" between the nodes. Typically, this is the IP address through which users access resources in the cluster.
Application server: This is a logical configuration to tell PowerHA how to manage applications, including starting and stopping applications, application monitoring, and application tunables. This article focuses only on starting and stopping an application.

Shared volume group: This is a PowerHA-managed volume group. Instead of configuring LVM structures like volume groups, logical volumes, and file systems through the operating system, you must use PowerHA for disk resources that will be shared between the servers.
Resource group: This is a logical grouping of service IP addresses, application servers, and shared volume groups that the nodes in the cluster can manage.
Failover: This is a condition in which resource groups are moved from one node to another. Failover can occur when a systems administrator instructs the nodes in the cluster to do so, or when circumstances like a catastrophic application or server failure force the resource groups to move.

Failback/fallback: This is the action of moving resource groups back to the nodes on which they were originally running after a failover has occurred.
Heartbeat: This is a signal transmitted over PowerHA networks to check and confirm resource availability. If the heartbeat is interrupted, the cluster may initiate a failover, depending on the configuration.
