
We request that you please turn off pagers and cell phones during class. Thank you.
VERITAS Cluster Server
for Solaris

Lesson 1
VCS Terms and Concepts
Overview

[Course roadmap diagram. The lessons, in order: Introduction; VCS Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Faults and Failovers; Event Notification; Using Volume Manager; Cluster Communication; Installing Applications; Troubleshooting.]
Objectives
After completing this lesson, you will be able to:
Define VCS terminology.
Describe cluster communication basics.
Describe VERITAS Cluster Server architecture.



Clusters
[Diagram: several systems connected by a local area network, attached to shared storage through Fibre switches and SCSI JBODs]

Several networked systems


Shared storage
Single administrative entity
Peer monitoring
Systems

Members of a cluster
Referred to as nodes
Contain copies of:
• Communication protocol configuration files
• VCS configuration files
• VCS libraries and directories
• VCS scripts and daemons
Share a single dynamic cluster
configuration
Provide application services



Service Groups

A service group is a related collection of


resources.
Resources in a service group must be
available to the system.
Resources and service groups have
interdependencies.
[Diagram: an NFS service group — the IP resource depends on Share and NIC; Share depends on NFS and Mount; Mount depends on Disk]


Service Group Types

Failover
• Can be partially or fully online on only one
server at a time
• VCS controls stopping and restarting the
service group when components fail
Parallel
• Can be partially or fully online on multiple
servers simultaneously
• Examples:
– Oracle Parallel Server
– Web, FTP servers



Resources

VCS objects that correspond to hardware


or software components
Monitored and controlled by VCS
Classified by type
Identified by unique names and attributes
Can depend on other resources within the
same service group



Resource Types

General description of the attributes of a


resource
Example Mount resource type attributes:
• MountPoint
• BlockDevice
Other example resource types:
• Disk
• Share
• IP
• NIC

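As a preview of how such attributes appear in practice (resource definition syntax is covered in Lesson 7; names and values here are illustrative only), a Mount resource might be defined as:

Mount MyNFSMount (
    MountPoint = "/test"
    BlockDevice = "/dev/dsk/c1t2d0s4"
)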


Agents
Processes that control resources
One agent per resource type
Agent controls all resources of that type.
Agents can be added into VCS agent
framework.

[Diagram: agents and the resources they control — Mount: /data; Disk: c1t0d0s0, c1t0d1s0; NIC: hme0, qfe1; IP: 10.1.2.4]


Dependencies
Resources can depend on other resources.
Parent resources depend on child resources.
Service groups can depend on other service groups.
Resource types can depend on other resource types.
Rules govern service group and resource dependencies.
No cyclic dependencies are allowed.
[Diagram: a Mount resource (parent) depends on a Disk resource (child)]
Private Network
Minimum two communication channels with
separate infrastructure:
• Multiple NICs (not just ports)
• Separate hubs, if used
Heartbeat communication determines which
systems are members of the cluster.
Cluster configuration broadcast updates
cluster systems with status of each resource
and service group.



Low Latency Transport (LLT)
Provides fast, kernel-to-kernel communications
Is connection oriented
Is not routable
Uses Data Link Provider Interface (DLPI) over
Ethernet

[Diagram: LLT runs in the kernel on each system, communicating over the private network between SystemA and SystemB]
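LLT reads its setup from plain-text files at startup. The following is a minimal sketch only; the node names, cluster number, and devices are assumptions, not taken from this course:

/etc/llthosts (maps node IDs to system names):
0 train7
1 train8

/etc/llttab (names this node, the cluster, and the private links):
set-node train7
set-cluster 200
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -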


Group Membership
Services/Atomic Broadcast (GAB)
Manages cluster membership
Maintains cluster state
Uses broadcasts
Runs in kernel over Low Latency Transport
(LLT)

[Diagram: GAB runs in the kernel above LLT on each system, communicating over the private network between SystemA and SystemB]
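GAB is started from /etc/gabtab, which typically contains a single gabconfig command. A minimal two-node sketch (the node count is an assumption):

/sbin/gabconfig -c -n 2

Here -c configures the driver and -n 2 tells GAB to seed the cluster once two systems are present.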


VCS Engine (had)

Maintains configuration and state information


for all cluster resources
Uses GAB to communicate among cluster
systems
Is monitored by hashadow process

[Diagram: on each system, had is monitored by hashadow and sits above GAB and LLT in the kernel stack, communicating over the private network between SystemA and SystemB]
VCS Architecture
Shared Cluster Configuration in Memory

[Diagram: the complete VCS architecture on SystemA and SystemB — the Mount, Disk, NIC, and IP agents manage resources such as /v, c1d0t0s0, hme0, and 10.1.2.4; had (monitored by hashadow) maintains the cluster configuration, shared in memory across systems via GAB over LLT]
Summary
You should now be able to:
Define VCS terminology.
Describe cluster communication basics.
Describe VERITAS Cluster Server architecture.



VERITAS Cluster Server
for Solaris

Lesson 2
Installing VERITAS Cluster Server
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Describe VCS software, hardware, and
licensing prerequisites.
Describe the general VCS hardware
requirements.
Configure SCSI controllers for a shared disk
storage environment.
Add VCS executable and manual page paths to
the environment variables.
Install VCS using the installation script.



Software and Hardware
Requirements
Software:
• Solaris 2.6, 7 and 8 (32-bit and 64-bit)
• Recommended:
– Solaris patches
– VERITAS Volume Manager (VxVM) 3.1.P1+
– VERITAS File System (VxFS) 3.3.1+
Hardware:
• Check latest VCS release notes.
• Contact VERITAS Support.
Licenses:
• Keys are required on a per-system or per-site basis.
• Contact VERITAS Sales for new license, or VERITAS
Support for upgrades.



General Hardware Layout
[Diagram: System A and System B, each with its own OS disk and NICs, joined by private Ethernet heartbeat links and the public network; shared data disks are attached to both systems through controllers SCSI1 and SCSI2]


SCSI Controller Configuration

[Diagram: each system's scsi-initiator-id (5 on one system, the default 7 on the other) on the shared SCSI1 and SCSI2 buses; the shared data disks use SCSI target IDs 0-4, so no IDs conflict]


SCSI Controller Setup

Use unique SCSI IDs for each system.


Check the scsi-initiator-id setting
using the eeprom command.
Change the scsi-initiator-id if needed.
Controller ID can also be changed on a
controller-by-controller basis.

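A hedged example of checking and setting the value with eeprom (the value 5 is illustrative; pick an ID not used by any device on the shared bus):

# eeprom scsi-initiator-id
scsi-initiator-id=7
# eeprom scsi-initiator-id=5

The new value takes effect at the next reboot.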


Setting Environment Variables
For Bourne or Korn shell (sh or ksh):
• PATH
PATH=$PATH:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
export PATH
• MANPATH
MANPATH=$MANPATH:/opt/VRTS/man
export MANPATH
• Add to /.profile
For C shell (csh or tcsh):
• PATH
setenv PATH \
${PATH}:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
• MANPATH
setenv MANPATH ${MANPATH}:/opt/VRTS/man



The installvcs Utility
Uses pkgadd to install the VCS packages on all the
systems in the cluster:
• VRTSllt
• VRTSgab
• VRTSperl
• VRTSvcs
• VRTSweb
• VRTSvcsw
• VRTSvcsdc
Requires remote root access to other systems in the
cluster while the script is being run (/.rhosts file)
Note: Can remove .rhosts files after VCS installation.
Configures two private network links for VCS
communications
Brings the cluster up without any services
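As a hedged illustration of the temporary remote-access setup (host names are this course's training systems): while installvcs runs on train7, /.rhosts on train8 would contain a line such as:

train7 root

granting root on train7 access to train8. As noted above, the files can be removed after installation.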
Installation Settings
Information required by installvcs:
Cluster name
Cluster number
System names
License key
Network ports for private network
Web Console configuration:
• Virtual IP address
• Subnet mask
• Network interface
SMTP/SNMP notification configuration
(discussed later)
Starting VCS Installation
# ./installvcs

Please enter the unique Cluster Name :
mycluster
Please enter the unique Cluster ID (a number
from 0-255) : 200
Enter the systems on which you want to install.
(system names separated by spaces) : train7
train8
Analyzing the system for install.
...
Enter the license key for train7 :
XXXX XXXX …
Applying the license key to all systems in the
cluster …
Installing the Private Network
Following is the list of discovered NICs:
Sr. No. NIC Device
1. /dev/hme:0
2. /dev/qfe:0
3. /dev/qfe:1
4. /dev/qfe:2
5. /dev/qfe:3
6. Other
From the list above, please enter the serial number
(the number appearing in the Sr. No. column) of
the NIC for
First PRIVATE network link: 1
From the list above, please enter the serial number
(the number appearing in the Sr. No. column) of
the NIC for
Second PRIVATE network link: 2
Do you have the same network cards set up on all
systems (Y/N)? y

Configuring the Web Console
Do you want to configure the Cluster Manager (Web
Console) (Y/N)[Y] ? y
Enter the Virtual IP address for the Web Server :
192.168.27.9
Enter Subnet [255.255.255.0]: <enter>
Enter the NIC Device for this Virtual IP address
(public network) on train7 [hme0]: <enter>
Do you have the same NIC Device on all other systems
(Y/N)[Y] ? y
Do you want to configure SNMP and/or SMTP (e-mail)
notification (Y/N)[Y] ? n
Summary information for ClusterService Group setup :
--------------------------------------------------
Cluster Manager (Web Console) :
Virtual IP Address : 192.168.27.9
Subnet : 255.255.255.0
Public Network link :
train7 train8 : hme0
URL to access : http://192.168.27.9:8181/vcs
Completing VCS Installation
Installing on train7.
Copying VRTSperl binaries.
.....
Installing on train8.
Copying VRTSperl binaries.
....
Copying Cluster configuration files... Done.
Installation successful on all systems.
Installation can start the Cluster components on the
following system/s.
train7 train8
Do you want to start these Cluster components now
(Y/N)[Y] ? y
Loading GAB and LLT modules and starting VCS on
train7:
Starting LLT...Start GAB....Start VCS
Loading GAB and LLT modules and starting VCS on
train8:
Starting LLT...Start GAB....Start VCS
Summary
You should now be able to:
Describe VCS software, hardware, and
licensing prerequisites.
Describe the general VCS hardware
requirements.
Configure SCSI controllers for a shared disk
storage environment.
Add VCS executable and manual page paths to
the environment variables.
Install VCS using the installation script.



Lab 2: Installing VCS

[Lab diagram: train1 and train2, each with its own OS disk, share data disks on the SCSI1 and SCSI2 buses; one system's scsi-initiator-id is 5, the other keeps the default 7, and the shared disks use target IDs 0-4]
# ./installvcs
VERITAS Cluster Server
for Solaris

Lesson 3
Managing Cluster Services
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Describe the cluster configuration mechanisms.
Start the VCS engine on cluster systems.
Stop the VCS engine.
Modify the cluster configuration.
Describe cluster transition states.



Cluster Configuration

[Diagram: had and hashadow on each system maintain the cluster configuration, shared in memory over GAB and LLT; each system keeps its own on-disk copy in main.cf]
Starting VCS
[Diagram: starting VCS on the first system — hastart launches hashadow and had; had reads the local main.cf, builds the cluster configuration in memory, and announces it on the private network; the other systems have no valid configuration yet]


Starting VCS: Second System
[Diagram: starting VCS on a second system — the new had obtains the running cluster configuration from the first system over the private network (a remote build) and then writes its own main.cf]


Starting VCS: Third System

[Diagram: with had and hashadow running on all three systems, the cluster configuration is shared in memory and each system's main.cf is synchronized]


Stopping VCS
[Diagram: the three forms of hastop run on System1 — hastop -local takes System1's service group offline; hastop -local -evacuate migrates the service group to System2 first; hastop -local -force stops had while leaving the service group's applications running]
The hastop Command

The hastop command stops the VCS engine.


Syntax:
hastop -option [arg] [-option]
Options:
• -local [-force | -evacuate]
• -sys sys_name [-force | -evacuate]
• -all [-force]
Example:
hastop -sys train4 -evacuate



Displaying Cluster Status
The hastatus command displays the status of
items in the cluster.
Syntax:
hastatus -option [arg] [-option arg]
Options:
• -group service_group
• -sum[mary]
Example:
hastatus -group OracleSG



Protecting the Cluster
[Diagram: configuration files and the .stale flag during an online change]
haconf -makerw ; hares -add ... ; haconf -dump -makero
1. Cluster configuration opened; .stale file created.
2. Resources added to cluster configuration in memory; main.cf out of sync with memory configuration.
3. Changes saved to disk; .stale removed.
Opening and Saving the Cluster
Configuration
The haconf command opens, closes, and
saves the cluster configuration.
Syntax:
haconf -option [-option]
Options:
• -makerw          Opens configuration
• -dump            Saves configuration
• -dump -makero    Saves and closes configuration
Example:
haconf -dump -makero
Starting VCS with a Stale Configuration
[Diagram: hastart on a system whose configuration is flagged by a .stale file — had starts but waits to build its configuration from a peer's running cluster configuration rather than from the local main.cf]


Forcing VCS to Start on the Local System
[Diagram: hastart -force — had builds the cluster configuration from the local main.cf even though a .stale file is present]


Forcing a System to Start
[Diagram: every system is waiting with a stale configuration — hasys -force System2 tells the cluster to build from System2's main.cf]
The hasys Command

Alters or queries state of had


Syntax:
hasys -option [arg]
Options:
• -force system_name
• -list
• -display system_name
• -delete system_name
• -add system_name
Example:
hasys -force train11
Propagating a Specific
Configuration
1. Stop VCS on all systems in the cluster and
leave applications running:
hastop -all -force
2. Start VCS stale on all other systems:
hastart -stale
The -stale option causes these systems
to wait until a running configuration is
available from which they can build.
3. Start VCS on the system with the main.cf
that you are propagating:
hastart
Summary of Start Options
The hastart command starts the had and
hashadow daemons.
Syntax:
hastart [-option]
Options:
• -stale
• -force
Example:
hastart -force



Validating the Cluster
Configuration
The hacf utility checks the syntax of the
main.cf file.
Syntax:
hacf -verify config_directory
Example:
hacf -verify /etc/VRTSvcs/conf/config



Modifying Cluster Attributes
The haclus command is used to view and
change cluster attributes.
Syntax:
haclus -option [arg]
Options:
• -display
• -help [-modify]
• -modify modify_options
• -value attribute
• -notes
Example:
haclus -value ClusterLocation
Startup States and Transitions
[State diagram: hastart takes a system from UNKNOWN through INITING. With a valid configuration on disk it enters CURRENT_DISCOVER_WAIT, then LOCAL_BUILD (no peer running), REMOTE_BUILD (a peer is in RUNNING), or CURRENT_PEER_WAIT (a peer is in LOCAL_BUILD). With a stale configuration it enters STALE_DISCOVER_WAIT, then STALE_ADMIN_WAIT or STALE_PEER_WAIT until a peer's build completes. A disk error or a peer in ADMIN_WAIT leads to ADMIN_WAIT. All successful paths end in RUNNING; if the only peer in RUNNING crashes, a waiting system performs a LOCAL_BUILD.]
Shutdown States and
Transitions
[State diagram: from RUNNING — losing the running configuration leads to ADMIN_WAIT; an unexpected exit of had leads to FAULTED; hastop leads to LEAVING, where resources are taken offline and agents are stopped, then to EXITING and EXITED; hastop -force leads directly to EXITING_FORCIBLY]


Summary

You should now be able to:


Describe the cluster configuration
mechanisms.
Start VCS.
Stop VCS.
Modify the cluster configuration.
Explain the transition states of the cluster.



Lab 3: Managing Cluster
Services
To complete this lab exercise:
Use commands to start and stop cluster
services, as described in the detailed lab
instructions.
Observe the cluster status by running
hastatus in a terminal window.



VERITAS Cluster Server
for Solaris

Lesson 4
Using the Cluster Manager Graphical User
Interface
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Install Cluster Manager.
Control access to VCS administration.
Demonstrate Cluster Manager features.
Create a service group.
Create resources.
Manage resources and service groups.
Use the Web Console to administer VCS.



Installing Cluster Manager
Cluster Manager requirements on Solaris:
• 128 MB RAM
• 1280 x 1024 display resolution
• Minimum 8-bit color depth of the monitor;
24-bit is recommended
To install Cluster Manager:
pkgadd -d pkg_location VRTScscm


Cluster Manager Properties

Can be run from a remote system:


• Windows NT
• Solaris system (cluster member or
nonmember)
Can manage multiple clusters from a single
workstation
Uses TCP port 14141 by default; to change the
port, add an entry such as the following to
/etc/services:
vcs 12345/tcp


Controlling Access to VCS:
User Accounts
Cluster Administrator
Full privileges
Cluster Operator
All cluster, service group, and resource-level
operations
Cluster Guest
Read-only access; new users created as Cluster
Guest accounts by default.
Group Administrator
All service group operations for a specified service
group, except deleting service groups
Group Operator
Online and offline service groups and resources;
temporarily freeze or unfreeze service groups



VCS User Account Hierarchy

[Diagram: privilege hierarchy — Cluster Administrator includes the privileges of Cluster Operator and Group Administrator; both of those include Group Operator; Group Operator includes Cluster Guest]


Adding Users and Setting
Privileges
Cluster configuration must be open.
Users are added using the hauser
command.
hauser -add username

Additional privileges can then be added:


haclus -modify Administrators -add user
haclus -modify Operators -add user
hagrp -modify group Administrators -add user
hagrp -modify group Operators -add user

The VCS user account admin is created with
Cluster Administrator privilege by the
installvcs utility.
Modifying User Accounts
To display account information:
hauser -display user_name
To change a password:
hauser -update user_name
To delete a VCS user account:
hauser -delete user_name



Controlling Access to the VCS
Command Line Interface
No mapping between UNIX and VCS user
accounts by default except root, which has
Cluster Administrator privilege.
Nonroot users are prompted for a VCS
account name and password when
executing VCS commands using the
command line interface.
The cluster attribute AllowNativeCliUsers
can be set to map UNIX account names to
VCS accounts.
A VCS account must exist with the same
name as the UNIX user with appropriate
privileges.

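A hedged example of enabling this mapping from the command line, reusing the configuration commands from Lesson 3 (the attribute name comes from this slide; the value 1 turns the mapping on):

haconf -makerw
haclus -modify AllowNativeCliUsers 1
haconf -dump -makero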


Cluster Manager Demonstration
Cluster Manager demonstration:
• Configuration and logging on
• Creating a service group and a resource
• Manual and automatic failover
• Log desk, Command Log, Command
Center, and Cluster Shell
Refer to your participant's guide; the steps
are listed in the notes.
If a live demonstration is not possible in
class, the following slides walk through the
demonstration.


Configuring Cluster Manager

[Screenshot: starting Cluster Manager with hagui & and configuring a new cluster panel]


Logging In to Cluster Manager

[Screenshot: Cluster Monitor login — the cluster panel, with service groups, member systems, and heartbeats shown after logging in]


VCS Cluster Explorer

[Screenshot: the Cluster Explorer main window]


Creating a Service Group

[Screenshot: adding a service group in Cluster Explorer]


Creating a Resource

[Screenshot: adding a resource and setting its attributes]


Bringing a Resource Online

[Screenshot: bringing a resource online from its context menu]


Resource and Service Group Status
[Screenshot: resource and service group status shown in Cluster Explorer]


Switching the Service Group to Another System
[Screenshot: the service group's Switch To menu]


Service Group Switched

[Screenshot: the service group now online on the other system]


Changing MonitorInterval

[Screenshot: editing the MonitorInterval attribute of a resource type]


Setting the Critical Attribute

[Screenshot: setting a resource's Critical attribute]


Faulted Resources

[Screenshot: a faulted resource displayed in Cluster Explorer]


Clearing a Faulted Resource

[Screenshot: clearing a faulted resource from its context menu]


Log Desk

[Screenshot: the Log Desk window]


Command Log

[Screenshot: the Command Log window]


Command Center

[Screenshot: the Command Center window]


Shell Tool

[Screenshot: the Cluster Shell tool]


Administering User Profiles

Add user account.

Remove or modify user account.



Using the Web Console
Web Console:
• Manages existing resources and service groups: online and offline operations; clearing faults and probing resources; switching, flushing, and freezing service groups
• Cannot be used to create resources or service groups
• Runs on any system with a Java-enabled Web browser
Java Console:
• Configures service groups and resources: add, delete, modify
• Can be used for all VCS administrative tasks
• Requires Cluster Manager and Java to be installed on the administration system
Connecting to the Web Console

http://IP_alias:8181/vcs

VCS account and password



Cluster Summary

[Screenshot: the Cluster Summary page, with display refresh, navigation buttons, and log entries]


System View

[Screenshot: the System View page, with the selected view and navigation trail]


Summary
You should now be able to:
Install Cluster Manager.
Control access to VCS administration.
Demonstrate Cluster Manager features.
Create a service group.
Create resources.
Manage resources and service groups.
Use the Web Console to administer VCS.



Lab 4: Using Cluster Manager

[Lab diagram: Student Red works with RedGuiSG and its RedFile resource (/tmp/RedFile); Student Blue works with BlueGuiSG and its BlueFile resource (/tmp/BlueFile)]


VERITAS Cluster Server
for Solaris

Lesson 5
Service Group Basics
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Describe how application services relate to
service groups.
Translate application requirements to service
group resources.
Define common service group attributes.
Create a service group using the command line
interface.
Perform basic service group operations.



Application Service
[Diagram: an application service — database software with its NIC, IP address, and data and log storage, serving database requests arriving over the network]
High Availability Applications

VCS must be able to perform these


operations:
Start using a defined startup procedure.
Stop using a defined shutdown procedure.
Monitor using a defined procedure.
Share storage with other systems and store
data to disk, rather than maintaining it in
memory.
Restart to a known state.
Migrate to other systems.



Example Service Groups

[Diagram: SystemA and SystemB both run the Web parallel service group; the Database failover service group runs on only one system at a time]


Analyzing Applications

1. Specify application services


corresponding to service groups.
2. Determine high availability level and
service group type, failover or parallel.
3. Specify which systems run which services
and the desired failover policy.
4. Identify the hardware and software objects
required for each service group and their
dependencies.
5. Map the service group resources to actual
hardware and software objects.



Example Application Services
[Diagram: application services mapped to service groups —
Database service group: database processes, file systems /oracle/data and /oracle/log, disks c1t1d0s5 and c1t2d0s4, IP address 192.168.3.55 on qfe1
Web service group: httpd, file system /data, disk c1t3d0s3, IP address 192.168.3.56 on qfe1]


Identify Physical Resources

[Diagram: the Database service group's physical resources — the database application uses file system /oracle/data (data files, on physical disk c1t1d0s5), file system /oracle/log (log files, on physical disk c1t2d0s4), and IP address 192.168.3.55 on network port qfe1]


Map Physical Objects to
VCS Resources
The database service group in the example
requires:
Two Disk resources to monitor the availability
of the shared log disk and the shared data disk
Two Mount resources that mount, unmount,
and monitor the required log and data file
systems
A NIC resource to check the network
connectivity on port qfe1
An IP resource to configure the IP address that
will be used by database clients to access the
database
An Oracle resource to start, stop, and monitor
the Oracle database application

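As a sketch of where this mapping leads (the definition syntax is covered in Lesson 7; resource names here are invented for illustration), two of the resources above might look like:

Disk DBDataDisk (
    Partition = c1t1d0s5
)
IP DBIP (
    Device = qfe1
    Address = "192.168.3.55"
)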


Service Groups

Create a service group using the command line interface:
• Syntax:
hagrp -add group_name
• Example:
hagrp -add mySG
Modify service group attributes to define behavior:
hagrp -modify group_name attribute \
value [values]


SystemList Attribute

Defines the systems that can run the service group.
The lowest-numbered system has the highest priority in determining the target system for failover.
To define the SystemList attribute:
• Syntax:
hagrp -modify group_name SystemList \
system1 priority1 system2 priority2 ...
• Example:
hagrp -modify mySG SystemList \
train1 0 train2 1


AutoStart and AutoStartList
Attributes
A service group is automatically started on a
system when VCS is started (if it is not already
online somewhere else in the cluster) under the
following conditions:
• The AutoStart attribute is set to 1.
• The system is listed in its AutoStartList attribute.
• The system is listed in its SystemList attribute.
To define the AutoStart attribute (default is 1):
hagrp -modify group_name AutoStart value
To define the AutoStartList attribute:
hagrp -modify group_name AutoStartList \
system1 system2 ...
Examples:
hagrp -modify myManualSG AutoStart 0
hagrp -modify mySG AutoStartList train0


AutoStartIfPartial Attribute
Allows VCS to bring a service group with
disabled resources online
All enabled resources must be probed.
Default is 1, enabled.
If 0, the service group cannot come online
with disabled resources.
To define the AutoStartIfPartial attribute:
• Syntax:
hagrp -modify group_name \
AutoStartIfPartial value
• Example:
hagrp -modify group_name \
AutoStartIfPartial 0
Parallel Attribute
Parallel service groups:
• Run on more than one system at the same time
• Respond to system faults by:
– Staying online on remaining systems
– Failing over to the specified target system

To set the Parallel attribute:
• Syntax:
hagrp -modify group_name Parallel value
• Example:
hagrp -modify myparallelSG Parallel 1
The Parallel attribute must be set before adding resources.
Default value: 0 (failover)
Configuring a Service Group

[Flow chart: Add Service Group → Set SystemList → Set Opt Attributes → Add/Test Resource (see the resource flow chart; repeat while more resources remain) → Link Resources → Test Switching → Set Critical Res → Test Failover → Done; if a step fails, check the logs and fix before continuing]


Service Group Operations
Service group operations described in the following sections:
Bringing the service group online:
hagrp -online group_name -sys system_name
Taking the service group offline:
hagrp -offline group_name -sys system_name
Displaying service group properties:
hagrp -display group_name
Example command lines:
hagrp -online oraclegroup -sys train8
hagrp -offline oraclegroup -sys train8
hagrp -display oraclegroup


Bringing a Service Group Online
[Diagram: resources are brought online from the bottom of the dependency tree upward — NIC and Disk first, then IP and Mount, then Process, then Oracle — shown before, in progress, and after]


Taking a Service Group Offline

[Diagram: resources are taken offline from the top of the dependency tree downward — Oracle first, then Process, then IP and Mount, then NIC and Disk — shown before, in progress, and after]


Partially Online Service Groups
A service group is partially online if:
• One or more nonpersistent resources is online, and
• At least one resource that is autostart-enabled and critical is offline.
[Diagram: a partially online group in which the Oracle resource is offline while resources below it in the dependency tree are online]


Switching a Service Group
A manual failover can be accomplished by
taking the service group offline on one system,
and bringing it online on another system.
To switch a service group from one system to
another using a single command:
• Syntax:
hagrp -switch group_name -to system_name
• Example:
hagrp -switch mySG -to train8
To switch using Cluster Manager:
Right-click the group -> Switch To -> system.


Flushing a Service Group
Misconfigured resources can cause agent processes to hang.
Flush the service group to stop all online and offline processes.
To flush a service group using the command line:
• Syntax:
hagrp -flush group_name -sys system_name
• Example:
hagrp -flush mySG -sys train8
To flush a service group using Cluster Manager:
Right-click the group -> Flush -> system.


Deleting a Service Group
Before deleting a service group:
1. Bring all resources offline.
2. Disable resources.
3. Delete resources.
To delete a service group using the command
line:
• Syntax:
hagrp -delete group_name
• Example:
hagrp -delete mySG
To delete a service group using Cluster Manager:
Right-click the group -> Delete.
Summary
You should now be able to:
Describe how application services relate to
service groups.
Translate application requirements to service
group resources.
Define common service group attributes.
Create a service group using the command line
interface.
Perform basic service group operations.



Lab 5: Creating Service Groups
[Lab diagram: Student Red creates RedGuiSG and RedNFSSG; Student Blue creates BlueGuiSG and BlueNFSSG]


VERITAS Cluster Server
for Solaris

Lesson 6
Preparing Resources
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Describe the components required to create
and share a file system using NFS.
Prepare NFS resources.
Describe the VCS network environment.
Manually migrate the NFS services between
two systems.
Describe the process of automating high
availability.



Operating System Components
Related to NFS
File system-related resources:
• Hard disk partition
• File system to be mounted
• Directory to be shared
• NFS daemons
Network-related resources:
• IP address
• Network interface



Disk Resources

[Diagram: partition 3 of shared disk disk1 appears to both System 1 and System 2 as /dev/(r)dsk/c1t1d0s3]


File System and Share Resources
[Diagram: either system can mount the vxfs file system on /dev/(r)dsk/c1t1d0s3 at /data and share it through the nfsd and mountd daemons]


Creating File System Resources
Format a disk and create a slice:
• Needs to be done on one system
• Use format command.
• Must have the same major and minor
numbers on both systems (for NFS)
Create a file system on the slice:
• From one system only:
mkfs -F fstype /dev/rdsk/device_name
• Can use newfs for UFS file systems
Create a directory for a mount point on
each system:
mkdir /mount_point

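A hedged worked instance of the steps above, using the device and mount point from this lesson's examples:

mkfs -F vxfs /dev/rdsk/c1t1d0s3
mkdir /data     (run the mkdir on each system)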


Sharing the File System
1. Mount the file system:
• The file system should not be mounted automatically
at boot time.
• Check the file system, if necessary:
fsck -F fstype /dev/rdsk/device_name
mount -F fstype /dev/dsk/device_name \
mount_point
2. Start the NFS daemons, if they are not already
running:
/usr/lib/nfs/nfsd -a nserver
/usr/lib/nfs/mountd
3. Share the file system:
share mount_point
Note: The file system should not be shared
automatically at boot time.
NFS Resource Dependencies

[Diagram: Share depends on File System and NFS; File System depends on Disk Partition]


IP Addresses in a VCS
Environment
Administrative IP addresses
• Associated with the physical network interface, such
as qfe1
• Assigned a unique hostname and IP address by the
operating system at boot time
• Available only when the system is up and running
• Used for checking network connectivity
• Called Base or Maintenance IP addresses
Application IP addresses
• Added as a virtual IP address to the network
interface, such as qfe1:1
• Associated with an application service
• Controlled by the high availability software
• Migrated to other systems if the current system fails
• Also called service group or floating IP addresses



Configuring an Administrative
IP Address
1. Create /etc/hostname.interface with
the desired interface name:
vi /etc/hostname.qfe1
train14_qfe1
2. Edit /etc/hosts and assign an IP address
to the interface name.
vi /etc/hosts

166.98.112.14 train14_qfe1
3. Reboot the system.



Configuring Application IP
Addresses
Requires the administrative IP address to be
configured on the interface
Do not create a hostname file.
To set up manually:
1. Plumb the logical interface and configure the IP address using ifconfig:
ifconfig qfe1:1 plumb
ifconfig qfe1:1 inet 166.98.112.114 netmask +
2. Bring the IP address up:
ifconfig qfe1:1 up
3. Assign a virtual hostname (application service name)
to the IP address.
vi /etc/hosts

166.98.112.114 nfs_services
Clients use the application IP address to connect
to the application services.
NFS Services Resource Dependencies
[Diagram: the Application IP depends on Share and Network Interface; Share depends on File System and NFS; File System depends on Disk Partition]


Monitoring NFS Resources

To verify the file system:


mount|grep mount_point
To verify the disk:
prtvtoc /dev/dsk/device_name
Alternately:
touch /mount_point/sub_dir/.testfile
rm /mount_point/sub_dir/.testfile
To verify the share:
share | grep mount_point
To verify NFS daemons:
ps -ef | grep nfs
Monitoring the Network
To verify network connectivity, use ping to
connect to other hosts on the same subnet as
the administrative IP address:
ping 166.98.112.253
166.98.112.253 is alive

To verify the application IP address, use


ifconfig to determine whether the IP address
is up:
ifconfig -a



Migrating NFS Services
1. Make sure that the target system is available.
2. Make sure that the disk is accessible from the target
system.
3. Make sure that the target system is connected to the
network.
4. Bring the NFS services down on the first system
following the dependencies (see the command sketch after this list):
a. Configure the application IP address down.
b. Stop sharing the file system.
c. Unmount the file system.
5. Bring the NFS services up on the target system
following the resource dependencies:
a. Check and mount the file system.
b. Start the NFS daemons if they are not already running.
c. Share the file system.
d. Configure and bring the application IP address up.

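A hedged command-level sketch of steps 4 and 5, using example values that appear in this lesson (interface qfe1:1, address 166.98.112.114, vxfs file system /dev/dsk/c1t1d0s3 mounted at /data); adjust names to your environment:

On the first system:
ifconfig qfe1:1 down
ifconfig qfe1:1 unplumb
unshare /data
umount /data

On the target system:
fsck -F vxfs /dev/rdsk/c1t1d0s3
mount -F vxfs /dev/dsk/c1t1d0s3 /data
/usr/lib/nfs/nfsd -a 16
/usr/lib/nfs/mountd
share /data
ifconfig qfe1:1 plumb
ifconfig qfe1:1 inet 166.98.112.114 netmask + up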


Automating High Availability
Resources are created once; this is not part of
HA operation.
Script the monitoring process:
• How often should each resource be monitored?
• What is the impact of monitoring on processing
power?
• Are there any resources to be monitored on the target
system even before failing over?
Script the start and stop processes.
Use high availability software to automate:
• Maintain communication between systems to verify
that the target system is available for failover.
• Observe dependencies during starting and stopping.
• Define actions to take when a fault is detected.
Summary
You should now be able to:
Describe the components required to create
and share a file system using NFS.
Prepare NFS resources.
Describe the VCS network environment.
Manually migrate the NFS services between
two systems.
Describe the process of automating high
availability.



Lab 6: Preparing NFS Resources
[Lab diagram: Student Red prepares c1t8d0s0 mounted at /Redfs for RedNFSSG; Student Blue prepares c1t15d0s0 mounted at /Bluefs for BlueNFSSG]


VERITAS Cluster Server
for Solaris

Lesson 7
Resources and Agents
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Describe how resources and resource types are defined
in VCS.
Describe how agents work.
Describe cluster configuration files.
Modify the cluster configuration.
Use the Disk resource and agent.
Use the Mount resource and agent.
Create a service group.
Configure resources.
Perform resource operations.



Resources

[Diagram: NFS service group — IP depends on Share and NIC; Share depends on NFS and Mount; Mount depends on Disk]


Resource Definitions
(main.cf)

A resource definition gives the type, a unique name, and attribute values:
Mount MyNFSMount (
    MountPoint = "/test"
    BlockDevice = "/dev/dsk/c1t2d0s4"
    FSType = vxfs
)


Nonpersistent and
Persistent Resources
Nonpersistent resources
Operations=OnOff
Persistent resources
• Operations=OnOnly
• Operations=None
Example types.cf entry
type Disk (
static str ArgList[] = { Partition }
NameRule = resource.Partition
static str Operations = None
str Partition
)
Resource Types

[Diagram: resources are instances of resource types — NFS_IP, WEB_IP, and ORACLE_IP are resources of type IP; NFS_NIC_qfe1 and ORACLE_NIC_qfe2 are resources of type NIC]


Resource Type Definitions
(types.cf)
The type keyword is followed by a unique name; ArgList lists the attributes passed to the agent; NameRule defines default resource names; the remaining lines declare attribute types:
type Mount (
    static str ArgList[] = { MountPoint, BlockDevice, FSType,
                             MountOpt, FsckOpt, SnapUmount }
    NameRule = resource.MountPoint
    str MountPoint
    str BlockDevice
    str FSType
    str MountOpt
    str FsckOpt
    int SnapUmount = 0
)
Bundled Resource Types
Application, Disk, DiskGroup, DiskReservation, ElifNone, FileNone, FileOnOff, FileOnOnly, IP, IPMultiNIC, Mount, MultiNICA, NFS, NIC, Phantom, Process, Proxy, ServiceGroupHB, Share, Volume
Agents
Periodically monitor resources and send
status information to the VCS engine.
Bring resources online when requested by
the VCS engine.
Take resources offline upon request.
Restart resources when they fault
(depending on the resource configuration).
Send a message to the VCS engine and the
agent log file when errors are detected.



How Agents Work
[Diagram: how agents work — the VCS engine reads the resource definition from main.cf (IP myNFSIP with Device = qfe1 and Address = "192.20.47.11") and the IP type's ArgList from types.cf, then tells the IP agent to online myNFSIP; the agent's Online entry point effectively runs: ifconfig qfe1:1 192.20.47.11 up]
Enterprise Agents
Database Edition / HA 2.2 for Oracle
Informix
VERITAS NetBackup
Oracle
PC NetLink
Sun Internet Mail Server (SIMS)
Sybase
VERITAS NetApp
Apache
Firewall (Check Point and Raptor)
Netscape SuiteSpot
The main.cf File

Cluster-wide configuration
Service groups
Resources
Resource dependencies
Service group dependencies
Resource type dependencies
Resource types, by way of include
statements



Cluster Definition
(main.cf)
Include statement for type definition files:
include "types.cf"
Cluster name and Cluster Manager users:
cluster mycluster (
    UserNames = { admin = "cDRpdxPmHpzS." }
    CounterInterval = 5
)
Systems that are members of the cluster:
system train7
system train8


Service Group Definition
(main.cf)
Service group and its attributes:
group MyNFSSG (
    SystemList = { train8 = 1, train7 = 2 }
    AutoStartList = { train8 }
    )
Resources and their attributes:
Mount MyNFSMount (
    MountPoint = "/data"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
)
Disk MyNFSDisk (
    Partition = c1t1d0s3
)
Resource dependencies:
MyNFSMount requires MyNFSDisk


Modifying the Cluster
Configuration
Online configuration:
• Use Cluster Manager or the command line
interface.
Changes are made in memory configuration
on each system while cluster is running.
• Save cluster configuration from memory to
disk:
– File -> Save Configuration
– haconf -dump

Offline configuration:
• Edit main.cf.
• Restart VCS.
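A hedged end-to-end sketch of a single online change from the command line, combining the commands above with those from Lesson 3 (the resource and attribute are illustrative):

haconf -makerw
hares -modify MyNFSMount FSType vxfs
haconf -dump -makero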
Modifying Resource Types

Online configuration:
• Use Cluster Manager.
• Use hatype command.
• Save changes to synchronize in-memory
configuration with configuration files on
disk.
Offline configuration:
• Edit types.cf to change existing resource
type definitions.
• Edit main.cf to add include statements for
new agents with their own types file.
• Restart VCS.



Changing Agent Behavior

Use Cluster Manager.
Use the CLI:
hatype -modify Disk MonitorInterval 30
Edit types.cf:
type Disk (
    static str ArgList[] = { Partition }
    NameRule = group.Name + "_" + resource.Partition
    static str Operations = None
    str Partition
    int MonitorInterval = 30
)
The Disk Resource and Agent
Functions:
Online None (Disk type is persistent.)
Offline None
Monitor Determines whether disk is online by reading
from the raw device
Required attributes:
Partition UNIX partition device name (If no path is
specified, it is assumed to be in /dev/rdsk.)
No optional attributes
Configuration prerequisites: UNIX device file
must exist.
Sample configuration:
Disk MyNFSDisk (
Partition=c1t0d0s0
)
The Mount Resource and Agent
Functions:
Online Mounts a file system
Offline Unmounts a file system
Monitor Checks mount status using stat and
statvfs
Required attributes:
BlockDevice UNIX file system device name
FSType File system type
MountPoint Directory used to mount the file
system
Optional attributes:
FsckOpt, MountOpt, SnapUmount



Mount Resource Configuration
Configuration prerequisites:
• Create the file system on the disk partition (or volume).
• Create the mount point directory on each system.
• Configure the VCS Disk resource on which Mount
depends.
• Verify that there is no entry in /etc/vfstab.
Sample configuration:
Mount myNFSMount (
    MountPoint = "/export1"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
    MountOpt = "-o ro"
)
When setting MountOpt with hares, use % to escape arguments starting with a dash (-):
hares -modify myNFSMount MountOpt %"-o ro"
Configuring a Service Group

[Flow chart: the same service group configuration flow shown in Lesson 5 — Add Service Group → Set SystemList → Set Opt Attributes → Add/Test Resource → Link Resources → Test Switching → Set Critical Res → Test Failover → Done]


Configuring a Resource
[Flow chart: Add Resource → Set Non-Critical → Modify Attributes → Enable Resource → Bring Online; if the resource comes online, done; if it faults or waits to online, check the log, disable the resource, flush the group, clear the resource, and repeat]
Adding a Resource
Suggestion: use the service group name as a prefix for resource names.
[Screenshot: adding a resource in Cluster Explorer]


Modifying a Resource
Enter values for each required attribute.
Modify optional attributes, if necessary.
See Bundled Agents Reference Guide for a
complete description of all attributes.



Setting the Critical Attribute
If a critical resource is faulted or taken offline
due to a fault, the entire service group fails
over.
By default, all resources are critical.
Set the Critical attribute to 0 to make a
resource noncritical.



Enabling a Resource
Resources must be enabled in order to be
managed by the agent.
If necessary, the agent initializes the resource
when it is enabled.
All required attributes of a resource must be
set before the resource is enabled.
By default, resources are not enabled.



Bringing a Resource Online
Resources in a failover service group cannot be brought online if any resource in the service group is:
• Online on another system
• Waiting to go online on another system


Creating Resource
Dependencies
Parent resources depend on child resources:
• Child resource must be online before parent
resource can come online.
• Parent resource must go offline before child
resource can go offline.
Parent resources cannot be persistent type
resources.
You cannot link resources in different service
groups.
Resources can have an unlimited number of
parent and child resources.
Cyclical dependencies are not allowed.

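On the command line, a dependency is created with hares -link; a hedged example using the resource names from this lesson's main.cf sample (the first argument is the parent, the second the child):

hares -link MyNFSMount MyNFSDisk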


Linking Resources

[Screenshot: linking resources in Cluster Explorer]


Taking a Resource Offline
Take individual resources offline in order, from
the top of the dependency tree to the bottom.
Use Offline Propagate to take all resources
offline. The selected resource:
• Must be the top online resource in the
dependency tree
• Must have no online parent resources



Clearing Faults
Faulted resources must be cleared before they
can be brought online.
Persistent resources are cleared when the
problem is fixed and they are probed by the
agent.
• Offline resources are probed periodically.
• Resources can be manually probed.



Disabling a Resource

VCS calls agent on each system in


SystemList.
Agent calls Close entry point, if present, to
reset the resource.
Nonpersistent resources brought offline.
Agent stops monitoring disabled
resources.



Deleting a Resource

Before deleting a resource:


• Take all parent resources offline.
• Take resource offline.
• Disable resource.
• Unlink any dependent resources.
Delete all resources before deleting a
service group.

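A hedged command sequence for the steps above, using this lesson's example resources and assuming MyNFSMount is online on train7 with no parents and one child:

hares -offline MyNFSMount -sys train7
hares -modify MyNFSMount Enabled 0
hares -unlink MyNFSMount MyNFSDisk
hares -delete MyNFSMount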


Summary
You should now be able to:
Describe how resources and resource types
are defined in VCS.
Describe how agents work.
Describe cluster configuration files.
Modify the cluster configuration.
Use the Disk resource and agent.
Use the Mount resource and agent.
Create a service group.
Configure resources.
Perform resource operations.
Lab 7: Configuring Resources
[Lab diagram: Student Red's RedNFSSG contains RedNFSMount over RedNFSDisk on c1t8d0s0 (disk1, /Redfs); Student Blue's BlueNFSSG contains BlueNFSMount over BlueNFSDisk on c1t15d0s0 (disk2, /Bluefs)]


VERITAS Cluster Server
for Solaris

Lesson 8
Network File System (NFS) Resources
Overview

[Course roadmap diagram; see the Lesson 1 Overview for the full lesson sequence.]
Objectives
After completing this lesson, you will be able to:
Prepare NFS services for the VCS environment.
Describe the Share resource and agent.
Describe the NFS resource and agent.
Describe the NIC resource and agent.
Describe the IP resource and agent.
Configure and test an NFS service group.



NFS Service Group

[Diagram: NFS service group — IP depends on Share and NIC; Share depends on NFS and Mount; Mount depends on Disk]


NFS Setup for VCS
Major and minor numbers for block devices used
for NFS services must be the same on each system.

[Diagram: a client's NFS request receives a normal response before failover, but a stale file handle error after failover if the major/minor numbers differ between systems]
Major/Minor Numbers
for Partitions
Each system must have the same major and minor number
for the shared partition. Major/minor numbers must also be
unique within a system.
On System A:
ls -lL /dev/dsk/c1t1d0s3
brw-r----- root sys 32,134 Dec 3 11:50
/dev/dsk/c1t1d0s3
On System B:
ls -lL /dev/dsk/c1t1d0s3
brw-r----- root sys 36,134 Dec 3 11:55
/dev/dsk/c1t1d0s3
To make the major numbers the same on all systems:
haremajor -sd major_number
Example:
haremajor -sd 36
Major Numbers for Volumes
Verify that the major numbers match on all
systems:
On System A:
grep ^vx /etc/name_to_major
vxdmp 87
vxio 88
vxspec 89

On System B:
grep ^vx /etc/name_to_major
vxdmp 89
vxio 90
vxspec 91



Changing Major Numbers
for Volumes
To make the major numbers the same on all
systems:
• Before running vxinstall:
– Edit /etc/name_to_major manually and change the
VM major numbers to be the same on both systems.
– Reboot the systems where the change was made.
• After running vxinstall:
haremajor -vx major_num1 major_num2
• Example:
haremajor -vx 91 92
Each system must have the same major number for the shared volume. Major numbers must also be unique within a system.


The Share Resource and Agent
Functions:
Online Shares an NFS file system
Offline Unshares an NFS file system
Monitor Reads /etc/dfs/sharetab file to check for
an entry for the file system
Required attributes:
PathName Pathname of the file system
Optional attributes: Options
Configuration prerequisites:
• The file system to be shared must not be listed
in /etc/dfs/dfstab; VCS shares it when the
resource is brought online.
• Mount and NFS resources must be configured.
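Sample configuration (a sketch; the resource name and
path are illustrative):
Share mySGShare (
PathName = "/export/fs1"
)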

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-182


The NFS Resource and Agent
Functions:
Online Starts the nfsd and mountd processes if
they are not already running
Offline None (NFS is an OnOnly resource.)
Monitor Checks for the nfsd, mountd, lockd, and
statd processes
Required attributes: None
Optional attributes: Nservers (default=16)
Configuration prerequisites: None
Sample configuration:
NFS mySGNFS (
Nservers = 24
)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-183


The NIC Resource and Agent

Functions:
Online None (NIC is persistent.)
Offline None
Monitor Uses ping to check connectivity and
determine whether the interface is up
Required attributes:
Device NIC device name
Optional attributes:
NetworkType, PingOptimize, NetworkHosts

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-184


NIC Resource Configuration

Configuration prerequisites:
• Configure Solaris to plumb the interface during
system boot. Edit these files:
– /etc/hosts
– /etc/hostname.interface
• Reboot the system.
Sample configuration:
NIC mySGNIC (
Device = qfe1
NetworkHosts = { "192.20.47.254",
"192.20.47.253" }
)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-185


The IP Resource and Agent
Functions:
Online Configures a virtual IP address on an
interface; this is the address that users
connect to and that fails over between
systems in the cluster
Offline Removes the virtual IP address from the
interface
Monitor Determines whether the virtual IP address
is present on the interface
Required attributes:
Device Name of NIC
Address Unique application (virtual) IP address
Optional attributes:
NetMask, Options, ArpDelay (default=1s), IfconfigTwice
(default=0)
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-186
IP Resource Configuration
Configuration prerequisites:
Configure a NIC resource.
Sample configuration:
IP mySGIP (
Device = qfe1
Address = "192.20.47.61"
)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-187


Configuring an NFS Service
Group
Add Service Group hagrp -add mySG

Set SystemList hagrp -modify mySG SystemList sys1 0 sys2 1

Set Opt Attributes hagrp -modify mySG Attribute Value

Add/Test Resource

Resource Flow Chart

Y
More? Test
N
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-188
Configuring NFS Resources
Add Resource hares -add mySGIP IP mySG

Set Non-Critical hares -modify mySGIP Critical 0

Modify Attributes hares -modify mySGIP Attribute Value

Enable Resource hares -modify mySGIP Enabled 1

Bring Online hares -online mySGIP -sys sys1

N
Online? Troubleshoot Resources

Y Done
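For example, following this flow chart, the Share
resource for the NFS service group could be added and
tested like this (the resource name and share path are
illustrative):
hares -add mySGShare Share mySG
hares -modify mySGShare Critical 0
hares -modify mySGShare PathName "/export/fs1"
hares -modify mySGShare Enabled 1
hares -online mySGShare -sys sys1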
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-189
Troubleshooting Resources
hares -modify mySGIP Enabled 0
hagrp -flush mySG -sys sys1

Modify Attributes Check Log

Enable Resource Disable Resource Flush Group

Bring Online Clear Resource


hares -clear mySGIP
Y
N Waiting to Online
Online? Faulted?

Y Done
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-190
Testing the Service Group

Test Failover Done

Set Critical Res


hares -modify mySGIP Critical 1
Y
hares -modify mySGNIC Critical 1
N
hares -modify ………
Success? Check Logs/Fix

hagrp -switch mySG -to sys2 Test Switching

hares -link mySGIP mySGNIC Link Resources

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-191


Summary
You should now be able to:
Prepare NFS services for the VCS environment.
Describe the Share resource and agent.
Describe the NFS resource and agent.
Describe the NIC resource and agent.
Describe the IP resource and agent.
Configure and test an NFS service group.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-192


Lab 8: Creating an NFS Service
Group
Student Red Student Blue
RedNFSSG BlueNFSSG

RedNFS BlueNFS
IP IP

RedNFS RedNFS BlueNFS BlueNFS


Share NIC Share NIC

RedNFS RedNFS BlueNFS BlueNFS


Mount NFS Mount NFS

RedNFS BlueNFS
Disk Disk

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-193


VERITAS Cluster Server for
Solaris
Lesson 9
Event Notification
Overview

Troubleshooting

Using Volume Cluster


Manager Communication

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-195
Objectives
After completing this lesson, you will be able to:
Describe the VCS notifier component.
Configure the notifier to signal changes in
cluster status.
Describe SNMP configuration.
Describe event triggers.
Configure triggers to provide notification.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-196


Notification
How VCS performs notification:
1. The had daemon sends a message to the notifier daemon
when an event occurs.
2. The notifier daemon formats the event message and
sends an SNMP trap or e-mail message (or both) to
designated recipients.
SNMP
SMTP

notifier

had had

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-197


Message Severity Levels

Information    Service group is online.
Warning        Agent has faulted.
Error          Resource has faulted.
SevereError    Concurrency violation
(Diagram: had forwards each event to notifier, which
sends it via SMTP and/or SNMP according to the
configured severity level.)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-198


Message Queues
1. The had daemon stores a message in a queue when an
event is detected.
2. The message is sent over the private cluster network to all
other had daemons to replicate the message queue.
3. The notifier daemon can be started on another system in
case of failure without loss of messages.
SNMP SNMP
SMTP
SMTP

notifier notifier

had had

(Diagram: the message queue is replicated on each
system's had daemon.)
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-199
Configuring Notifier
The notifier daemon can be started and monitored by the
NotifierMngr resource.
Attributes define recipients and severity levels. For example:
SmtpServer = "smtp.acme.com"
SmtpRecipients = { "admin@acme.com" = Warning }

NotifierMngr NotifierMngr

NIC NIC

notifier

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-200
The NotifierMngr Agent
Functions:
Starts, stops, and monitors the notifier daemon
Required attribute:
PathName Full path of the notifier daemon
Required attributes for SMTP e-mail notification:
SmtpServer Host name of the SMTP e-mail server
SmtpRecipients E-mail address and message severity
level for each recipient
Required attribute for SNMP notification:
SnmpConsoles Name of the SNMP manager and
message severity level

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-201


The NotifierMngr Resource
Optional attributes:
MessagesQueue Size of the message queue;
default = 30
NotifierListeningPort TCP/IP port number;
default =14144
SnmpdTrapPort TCP/IP port to which SNMP traps
are sent; default=162
SnmpCommunity Community ID for the SNMP
manager; default = "public"
Example resource configuration:
NotifierMngr Notify_Ntfr (
PathName = "/opt/VRTSvcs/bin/notifier"
SnmpConsoles = { snmpserv = Information }
SmtpServer = "smtp.your_company.com"
SmtpRecipients = { "vcsadmin@your_company.com"
= SevereError }
)
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-202
SNMP Configuration
Load MIB for VCS traps into SNMP console.
For HP OpenView Network Node Manager,
merge events:
xnmevents -merge vcs_trapd
VCS SNMP configuration files:
• /etc/VRTSvcs/snmp/vcs.mib
• /etc/VRTSvcs/snmp/vcs_trapd

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-203


Event Triggers

How VCS performs notification:


1. VCS determines if notification is enabled.

If disabled, no action is taken.

If enabled, VCS runs hatrigger with
event-specific parameters.
2. The hatrigger script invokes the event-
specific trigger script with parameters
passed by VCS.
3. The event trigger script performs the
notification tasks.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-204


Types of Triggers
Trigger          Description                  Script Name
ResFault         Resource faulted             resfault
ResNotOff        Resource not offline         resnotoff
ResStateChange   Resource changed state       resstatechange
SysOffline       System went offline          sysoffline
InJeopardy       Cluster in jeopardy          injeopardy
NoFailover       Service group cannot         nofailover
                 fail over
Violation        Resource online on more      violation
                 than one system
LoadWarning      System is overloaded         loadwarning
PreOnline        Service group about to       preonline
                 come online
PostOnline       Service group went online    postonline
PostOffline      Service group went offline   postoffline
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-205
Configuring Triggers
Triggers enabled Triggers configured
by presence of by service group
script file: attributes:
• ResFault • PreOnline
• ResNotOff • ResStateChange
• SysOffline Triggers configured
• InJeopardy by default:
• Violation • Violation
• NoFailover
• PostOffline
• PostOnline
• LoadWarning

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-206


Sample Triggers
Sample trigger scripts include example code to
send an e-mail message.
Mail must be configured on the system
invoking the trigger to use the sample e-mail code.
# Here is a sample code to notify a bunch of users.
# @recipients=("username@servername.com");
# $msgfile="/tmp/resnotoff$2";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
#
# foreach $recipient (@recipients) {
# # Must have elm setup to run this.
# `elm -s resnotoff $recipient < $msgfile`;
# }
#`rm $msgfile`;
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-207
ResFault Trigger
Provides notification that a resource has
faulted
Arguments to resfault:
• system: Name of the system where the
resource faulted
• resource: Name of the faulted resource
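A minimal resfault script might simply log the event
(an illustrative sketch, not the Perl sample shipped
with VCS):
#!/bin/sh
# /opt/VRTSvcs/bin/triggers/resfault
# $1 = system where the resource faulted
# $2 = name of the faulted resource
echo "`date` resource $2 faulted on $1" >> /var/VRTSvcs/log/resfault.log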

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-208


ResNotOff Trigger
Provides notification that a resource has not
been taken offline
If a resource is not offline on one system, the
service group cannot be brought online on
another.
VCS cannot fail over the service group in the
event of a fault, because the resource will not
come offline.
Arguments to resnotoff:
• system: Name of the system where the
resource is not offline
• resource: Name of the resource that is not
offline

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-209


ResStateChange Trigger
Provides notification that a resource has
changed state
Set at the service group level by the
ResStateChange attribute:
hagrp -modify serv_grp TriggerResStateChange 1
Arguments to resstatechange:
• system: Name of the system where the
resource changed state
• resource: Name of the resource that
changed state
• previous_state: State of the resource
before change
• new_state: State of the resource after
change
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-210
SysOffline Trigger
Provides notification that a system has gone
offline
Executed on another system when no
heartbeat is detected
Arguments to sysoffline:
• system: Name of the system that went
offline
• systemstate: Value of the SysState
attribute for the offline system

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-211


NoFailover Trigger
Run when VCS determines that a service group
cannot fail over
Executed on the lowest numbered system in a
running state when the condition is detected
Arguments to nofailover:
• systemlastonline: Name of the last
system where the service group is online or
partially online
• service_group: Name of the service group
that cannot fail over

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-212


Summary
You should now be able to:
Describe the VCS notifier component.
Configure the notifier to signal changes in
cluster status.
Describe SNMP configuration.
Describe event triggers.
Configure triggers to provide notification.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-213


Lab 9: Event Notification
Student Red Student Blue

RedNFSSG BlueNFSSG

ClusterService

webip
notifier
webnic

resfault resfault
nofailover Triggers nofailover
sysoffline sysoffline

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-214


VERITAS Cluster Server for
Solaris
Lesson 10
Faults and Failovers
Overview

Troubleshooting

Using Volume Cluster


Manager Communication

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-216
Objectives
After completing this lesson, you will be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-217
How VCS Responds to
Resource Faults
1. Calls ResFault trigger, if present.
2. Offlines all resources in the path of the fault starting
from the faulted resource up to the top of the
dependency tree.
3. If an online critical resource is part of the path,
offlines the entire service group in preparation for
failover.
4. Starts the service group on another system in the
service group’s SystemList (if possible).
5. If no other systems are available, service group
remains offline and NoFailover trigger is invoked, if
present.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-218


Practice Exercise
(A dependency tree of resources 1-9 is shown; resource 4
faults. Fill in the last two columns for each case.)

Case   Non-Critical   Offline   Taken offline   Starts on
                                due to fault    another system
A      -              -
B      4              -
C      4              6,7
D      4,6            -
E      4,6,7          -
F      4              7
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-219
Practice Answers
(Resource 4 faults in the same dependency tree.)

Case   Non-Critical   Offline   Taken offline   Starts on
                                due to fault    another system
A      -              -         6,7             All
B      4              -         6,7             All
C      4              6,7       -               -
D      4,6            -         6,7             All
E      4,6,7          -         6,7             -
F      4              7         6               All but 7
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-220
Failover Attributes
AutoFailOver indicates whether automatic failover
is enabled for the service group.
Default value is 1, enabled.
FailOverPolicy specifies how a target system is
selected:
• Priority—System with the lowest priority number in
the list is selected (default).
• RoundRobin—System with the least number of active
service groups is selected.
• Load—System with greatest available capacity is
selected.
Example configuration:
hagrp -modify group AutoFailOver 0
hagrp -modify group FailOverPolicy Load
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-221
FailOverPolicy: Priority
Lowest numbered system in SystemList selected

Svr1
AP1

SystemList = {Svr1 = 0, Svr2 = 1}


Svr3
DB

AP2 Svr2 SystemList = {Svr3=0, Svr1=1, Svr2=2}

SystemList = {Svr2 = 0, Svr1 = 1}


© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-222
FailOverPolicy: RoundRobin
System with fewest running service groups
selected

Svr1 Svr3

Svr4
Svr2

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-223


FailOverPolicy: Load
Capacity = 100 Capacity = 200
AvailableCapacity = 70 AvailableCapacity = 100

AP1
DB1
SmSvr1
Load = 30 Load = 100

LgSvr1

Capacity = 100
AvailableCapacity = 80
DB2

Load = 100
AP2 LgSvr2
SmSvr2
Load = 20
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-224
Setting Load and Capacity

The Load and Capacity attributes are


user-defined values.
Set attributes using the hagrp and hasys
commands.
Examples:
hasys -modify SmSvr1 Capacity 100
hagrp -modify AP1 Load 30
AvailableCapacity is calculated by VCS:
AvailableCapacity = Capacity - Load
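For example, with the values above, SmSvr1 has
AvailableCapacity = 100 - 30 = 70 while AP1
(Load = 30) is online on it.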

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-225


Load-Based Failover Example
G4 migrates to Svr1 [SystemList = {Svr1, Svr2, Svr3, Svr4}]
G5 migrates to Svr3 [SystemList = {Svr1, Svr2, Svr3, Svr4}]

Capacity = 100 Capacity = 100


AvailableCapacity = 50 AvailableCapacity = 50

G1 Load=20 G3 Load=30
G6 Load=30 Svr1 G7 Load=20 Svr3

Capacity = 100 Capacity = 100


AvailableCapacity = 20 AvailableCapacity = 40

G2 Load=40 G4 Load=10
G8 Load=40 G5 Load=50
Svr2 Svr4

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-226


The LoadWarning Trigger
Svr3 runs the LoadWarning trigger when AvailableCapacity is 20
or less (80 percent of Capacity) for 10 minutes (600 seconds).

Capacity = 100 Capacity = 100


AvailableCapacity = 40 AvailableCapacity = 0

G1 Load=20 G3 Load=30
G6 Load=30 Svr1 G7 Load=20 Svr3
G4 Load=10 G5 Load=50

Capacity = 100
AvailableCapacity = 20
G2 Load=40
G8 Load=40
Svr2                               Svr4

System Svr3 (
Capacity = 100
LoadWarningLevel = 80
LoadTimeThreshold = 600
)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-227


Dynamic Load
The DynamicLoad attribute is used in conjunction with
load-estimation software. It is set using the hasys command.

Capacity = 100
AvailableCapacity = 10 SmSvr1 is 90 percent loaded.

GA
GC
hasys -load 90
SmSvr1
GD

Capacity = 200 LgSvr2 is 80 percent loaded.


AvailableCapacity = 40

GB
hasys -load 160
GH

LgSvr2
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-228
Limits and Prerequisites
DB1 or DB2 can fail over to either SmSvr1 or SmSvr2.
Both AP1 and AP2 can fail over to either LgSvr1 or
LgSvr2.
LgSvr1, LgSvr2:
Limits = { Mem=100, Processors=12 }
CurrentLimits = { Mem=50, Processors=8 }

DB1, DB2:
Prerequisites = { Mem=50, Processors=4 }

SmSvr1, SmSvr2:
Limits = { Mem=75, Processors=6 }
CurrentLimits = { Mem=50, Processors=4 }

AP1, AP2:
Prerequisites = { Mem=25, Processors=2 }
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-229
Combining Capacity and
Limits
When used together, VCS determines the failover
target as follows:
Limits and Prerequisites are used to determine
a subset of potential failover targets.
Of this subset, the system with the highest
value for AvailableCapacity is selected.
If multiple systems have the same
AvailableCapacity, the first system in
SystemList is selected.
Limits are hard values—if a system does not
meet the Prerequisites, the service group
cannot be started on that system.
Capacity is a soft limit—the system with the
highest AvailableCapacity is selected, even if
AvailableCapacity results in a negative number.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-230
Failover Zones
Preferred Failover Preferred Failover
Zone for Database Zone for Web Service Group
Service Group
sysc sysd
sysa sysb

syse sysf

Database
Web

The SystemList for both service groups includes all systems in


the cluster.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-231
SystemZones Attribute
Used to define the preferred failover zones for each
service group.
If the service group is online in a system zone, it
fails over to other systems in the same zone based
on the FailOverPolicy until there are no further
systems available in that zone.
When there are no other systems for failover in the
same zone, VCS chooses a system in a new zone
from the SystemList based on the FailOverPolicy.
To define SystemZones:
• Syntax:
hagrp -modify group_name SystemZones \
sys1 zone# sys2 zone# sys zone# …
• Example:
hagrp -modify OracleSG SystemZones sysa \
0 sysb 0 sysc 1 sysd 1 syse 1 sysf 1
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-232
Controlling Failover
Behavior with Resource
Type Attributes
RestartLimit
• Affects how the agent responds to a resource
fault
• Default: 0
ConfInterval
• Determines the amount of time that a tolerance
or restart counter can be incremented
• Default: 600 seconds
ToleranceLimit
• Enables the monitor entry point to return
OFFLINE several times before the resource is
declared FAULTED
• Default: 0
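These are resource type attributes, set with hatype;
for example (values are illustrative, matching the
restart example that follows):
hatype -modify Process RestartLimit 1
hatype -modify Process ConfInterval 180
hatype -modify Process ToleranceLimit 2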
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-233
Restart Example
RestartLimit=1
The resource can be restarted one time within
the ConfInterval time frame.
ConfInterval=180
Resource can be restarted once within a three
minute interval.
MonitorInterval=60 seconds (default value)
Resource is monitored every 60 seconds.
Online Online Offline Online Offline

ConfInterval

MonitorInterval
Restart Faulted
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-234
Adjusting Monitoring
MonitorInterval
• Default value is 60 seconds for most resource
types.
• Consider reducing to 10 or 20 seconds for
testing.
• Use caution when changing this value:
• Load is increased on cluster systems.
• Resources can fault if they cannot respond in the
interval specified.

OfflineMonitorInterval
• Default is 300 seconds for most resource types.
• Consider reducing to 60 seconds for testing.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-235


Modifying Resource Type
Attributes
Can be used to optimize agents
Applied to all resources of the specified
type
Command line example:
hatype -modify FileOnOff MonitorInterval 5

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-236


Preventing Failover
A frozen service group does not fail over when a
critical resource faults.
The service group must be unfrozen to enable
failover.
To freeze a service group:
hagrp -freeze service_group [-persistent]
To unfreeze a service group:
hagrp -unfreeze service_group [-persistent]
A persistent freeze:
• Requires the cluster configuration to be open
• Remains in effect even if VCS is stopped and
restarted throughout the cluster
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-237
Clearing Faults
Verify that the faulted resource is offline.
Fix the problem that caused the fault and clean
up any residual effects.
To clear a fault, type:
hares -clear resource_name [-sys system_name]
To clear all faults in a service group, type:
hagrp -clear group_name [-sys system_name]
Persistent resources are cleared by probing:
hares -probe resource_name [-sys system_name]

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-238


Probing Resources
Causes VCS to immediately monitor the
resource
To probe a resource, type:
hares -probe resource_name -sys system_name

You can clear a persistent resource by probing


it after the underlying problem has been fixed.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-239


Flushing Service Groups
All online/offline agent processes are stopped.
All resources in transitional states waiting to
go online are taken offline.
Propagation of the offline operation is stopped,
but resources waiting to go offline remain in
the transitional state.
You must verify that the physical or software
resources are stopped at the operating system
level after flushing to avoid creating a
concurrency violation.
To flush a service group, type:
hagrp -flush group_name -sys system_name

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-240


Testing Failover
Use test resources, such as FileOnOff,
when applicable.
Set lower values for MonitorInterval,
OfflineMonitorInterval, and ConfInterval to
detect faults more quickly.
Manually online, offline, and switch the
service group among all systems.
Simulate failure of each resource in the
service group.
Simulate failover of the entire system.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-241


Testing Examples
Force a resource to fault.
Reboot a system.
Halt and reboot a system.
Remove power from a system.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-242


Summary
You should now be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-243
Lab 10:
Faults and Failovers
Student Red Student Blue

RedNFSSG BlueNFSSG

resfault resfault
nofailover Triggers nofailover
sysoffline sysoffline

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-244


VERITAS Cluster Server
for Solaris

Lesson 11
Installing and Upgrading
Applications in the Cluster
Overview

Troubleshooting

Using Volume Cluster


Manager Communication

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-246
Objectives
After completing this lesson, you will be able to:
Describe the benefits of keeping applications
available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application
upgrades.
Apply guidelines for installing new applications
in the cluster.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-247


Maintenance and Downtime

(Chart: causes of downtime. Software 40%; Planned
Downtime 30%; People 15%; Hardware 10%;
Environment 5%; Client <1%; LAN/WAN Equip. <1%.)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-248


Operating System Update

Frozen Web Server

Web Requests
Operating System Update
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-249
Application Upgrade

DatabaseSG
WebSG

Frozen

Update Web Application

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-250


Freezing a System

Freezing a system prevents service groups from
failing over to it.
Failover can still occur from a frozen system.
Freeze a system while maintenance is being
performed.
Persistent freeze remains in effect through VCS
restarts.
Evacuate moves service groups off the frozen
system.
Syntax:
hasys -freeze [-persistent] [-evacuate] systemA
hasys -unfreeze [-persistent] systemA
Use hasys to determine if a system is frozen:
hasys -display systemA -attribute Frozen
hasys -display systemA -attribute TFrozen
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-251
Freezing a Service Group

Freezing a service group prevents it from being


taken offline, brought online, or failed over, even if
a concurrency violation occurs.
Example update scenario:
1. Freeze the service group.
2. Update the application on the system(s) that are not
currently running the application.
3. Unfreeze the service group.
4. Move the service group to an updated system and apply
the application update on the original system.
Persistent freeze remains in effect, even if VCS is
stopped and restarted throughout the cluster.
Syntax:
hagrp -freeze service_group [-persistent]
Use hagrp to determine if a group is frozen:
hagrp -display service_group -attribute Frozen
hagrp -display service_group -attribute TFrozen
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-252
Upgrading a System—Reboot
Required
Start Yes

Open the configuration:


More systems
haconf -makerw
To upgrade?
Freeze and evacuate system:
hasys -freeze -persistent
-evacuate systemA
Move service groups
Stop VCS on system: to appropriate systems:
hastop -sys systemA hagrp -switch mySG
-to systemA
Perform upgrade.
Reboot system. Close the configuration:
haconf -dump -makero
Unfreeze the system:
hasys -unfreeze
-persistent systemA Done
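As a linear command sequence, the flow chart reads
(using the example names mySG and systemA):
haconf -makerw
hasys -freeze -persistent -evacuate systemA
hastop -sys systemA
(perform the upgrade, then reboot systemA)
hasys -unfreeze -persistent systemA
hagrp -switch mySG -to systemA
haconf -dump -makero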
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-253
Differences in
Application Upgrades
Rolling upgrades
No simple reversion from upgrade
Multiple installation directories
Upgrading without rebooting

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-254


Installing Applications: Program
Files on Shared Storage
Advantages:
• Simplifies application setup and
maintenance
• Application service group is
self-contained—all program and data files
are located on file systems within the
service group.
Disadvantages:
• Rolling upgrades cannot be performed.
• Downtime increased during maintenance

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-255


Binaries on Local Storage

Advantages:
• Minimizes downtime during application
maintenance
• May be able to perform rolling upgrades
(depending on the application)
Disadvantages:
• Must maintain multiple copies of the
application
• Not scalable due to maintenance overhead
in clusters with large numbers of service
groups and systems

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-256


Application Installation
Guidelines
Determine where to install program files
(locally or shared disk) based on your
cluster environment.
Install application data files on a shared
storage partition that is accessible to each
system that can run the application.
Specify identical installation options.
Use the same mount point when installing
the application on each system.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-257


Summary

You should now be able to:


Describe the benefits of keeping applications
available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application
upgrades.
Apply guidelines for installing new applications
in the cluster.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-258


Lab 11: Installing Applications
in the Cluster
Student Red Student Blue

RedNFSSG

BlueNFSSG

Install Volume Manager


© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-259
VERITAS Cluster Server
for Solaris

Lesson 12
Volume Manager and Process Resources
Overview

Troubleshooting

Using Volume Cluster


Manager Communication

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-261
Objectives
After completing this lesson, you will be able to:
Describe how Volume Manager enhances high
availability.
Describe Volume Manager storage objects.
Configure shared storage using Volume
Manager.
Create a service group with Volume Manager
resources.
Configure Process resources.
Configure Application resources.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-262


Volume Management
Physical Disks

Virtual Volumes

System1 System2
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-263
Volume Manager Objects
(Diagram: physical disks are presented as VxVM disks;
subdisks on the VxVM disks are grouped into plexes,
plexes make up volumes, and all of these objects are
contained in a disk group.)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-264


Disk Groups
Physical disks Disk1, Disk2, and Disk3 become VxVM
disks in Disk Group: testDG
• VxVM objects cannot span disk groups.
• Disk groups represent management and
configuration boundaries.
• Disk groups enable high availability.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-265


VxVM Volume
Physical
Disks VxVM Disks
Disk1
VxVM Volume

Disk2

Volume1

Disk3

Disk Group: testDG


© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-266
Volume Manager Configuration

Initialize disk(s).
vxdisksetup -i device
Create a disk group.
vxdg init disk_group disk_name=device
Create a volume.
vxassist -g disk_group make vol_name size
Make a file system.
mkfs -F vxfs volume_device
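For example, to build the testDG disk group and a
volume named Volume1 as in the preceding diagrams
(the device name c1t1d0 and the 1g size are
illustrative):
vxdisksetup -i c1t1d0
vxdg init testDG disk1=c1t1d0
vxassist -g testDG make Volume1 1g
mkfs -F vxfs /dev/vx/rdsk/testDG/Volume1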

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-267


Testing Volume Manager
Configuration
On the first system:
1. Create a mount point directory.
2. Mount the VMVol file system on the first
system.
3. Verify that the file system is accessible.
4. Unmount the file system.
5. Deport the disk group.
On the next system(s):
1. Create a mount point directory with the
same name.
2. Import the disk group.
3. Start the volume.
4. Mount and verify the file system.
5. Unmount the file system.
6. Deport the disk group.
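As commands, the procedure might look like this (a
sketch using the testDG/Volume1 example and an
illustrative /test mount point):
On the first system:
mkdir /test
mount -F vxfs /dev/vx/dsk/testDG/Volume1 /test
ls /test
umount /test
vxdg deport testDG
On the next system:
mkdir /test
vxdg import testDG
vxvol -g testDG start Volume1
mount -F vxfs /dev/vx/dsk/testDG/Volume1 /test
umount /test
vxdg deport testDG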
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-268
Volume Manager Resources

Proc

Mount
VMSG

VMVol

VMDG

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-269


DiskGroup Resource and Agent
Functions:
Online Imports a Volume Manager disk group
Offline Deports a disk group
Monitor Determines the state of the disk
group using vxdg
Required attributes:
DiskGroup Name of the disk group
Optional attributes:
StartVolumes, StopVolumes

Configuration Prerequisites:
Disk group and volume must be configured.
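Sample configuration (a sketch using the testDG
example name):
DiskGroup mySGDG (
DiskGroup = testDG
)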

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-270


Volume Resource and Agent
Functions:
Online Starts a volume
Offline Stops a volume
Monitor Reads a byte of data from the raw device
interface for the volume
Required attributes:
DiskGroup Name of the disk group
Volume Name of the volume
Optional attributes: None
Configuration Prerequisites:
Disk group and volume must be configured.
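Sample configuration (a sketch using the
testDG/Volume1 example names):
Volume mySGVol (
Volume = Volume1
DiskGroup = testDG
)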

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-271


Configuring a Service Group

Add Service Group Test Failover Done

Set SystemList Set Critical Res

Y
Set Opt Attributes
N
Success? Check Logs/Fix
Add/Test Resource

Resource Flow Chart


Test Switching

Y
More? Link Resources
N

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-272


Configuring a Resource
Add Resource

Set Non-Critical

Modify Attributes Check Log

Enable Resource Disable Resource Flush Group

Bring Online Clear Resource

Y
N Waiting to Online
Online? Faulted?

Y Done
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-273
Process Resource and Agent
Functions:
Online Starts a daemon process
Offline Stops a process
Monitor Determines whether the process is running
using procfs
Required attributes:
PathName Full path of the executable file
Optional attributes:
• Arguments
• Use % to escape dashed arguments:
hares -modify myProc Arguments "%-db -q1h"
Sample Configuration:
Process sendmail (
PathName = "/usr/lib/sendmail"
Arguments = "-db -q1h"
)
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-274
The Application Resource and
Agent
Functions:
Online Brings an application online using StartProgram
Offline Takes an application offline using StopProgram
Monitor Monitors the status of the application in a number
of ways
Clean Takes the application offline using CleanProgram
or kills all the processes specified for the
application
Required Attributes:
StartProgram Name of executable to start application
StopProgram Name of executable to stop application
One or more of the following:
MonitorProgram Name of executable to monitor application
MonitorProcesses List of processes to be monitored
PidFiles List of pid files that contain the process ID
of the processes to be monitored
Optional Attributes:
CleanProgram, User
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-275
Application Resource
Configuration
Configuration prerequisites:
• The application should have its own start and stop
programs.
• It should be possible to monitor the application by
either running a program that returns 0 for failure and
1 for success or by checking a list of processes.
Sample configuration:
Application samba_app (
StartProgram = "/usr/sbin/samba start"
StopProgram = "/usr/sbin/samba stop"
PidFiles = { "/var/lock/samba/smbd.pid" }
MonitorProcesses = { "smbd" }
)

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-276


Summary

You should now be able to:


Describe how Volume Manager enhances high
availability.
Describe Volume Manager Storage Objects.
Configure shared storage using Volume
Manager.
Create a service group with Volume Manager
resources.
Configure Process resources.
Configure Application resources.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-277


Lab 12: Volume Manager
and Process Resources
Student Red Student Blue

ProdSG Prod Test TestSG


Loopy Loopy

Prod Test
Mount Mount

ProdVol TestVol

ProdDG TestDG
RedNFSSG BlueNFSSG

ProdDG TestDG
ProdVol TestVol

/prod /test

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-278


VERITAS Cluster Server
for Solaris

Lesson 13
Cluster Communication
Overview

Troubleshooting

Cluster Using Volume


Communication Manager

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-280
Objectives
After completing this lesson, you will be able to:
Describe how systems communicate in a
cluster.
Describe the LLT and GAB configuration files
and commands.
Reconfigure LLT and GAB.
Describe the effects of cluster communication
failures.
Recover from communication failures.
Configure the InJeopardy trigger.
Troubleshoot LLT and GAB.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-281


Cluster Communication
agent agent
agent agent agent agent

Agent Framework Agent Framework

had had

GAB GAB

LLT LLT

System A System B

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-282


GAB Membership Status
Determines cluster membership using
heartbeat signals
Heartbeats transmitted by LLT
Membership determined by cluster ID number

GAB GAB GAB GAB

LLT LLT LLT LLT

System A System B System C System D

Cluster 1
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-283
Cluster State
GAB tracks all changes in configuration and
resource status.
Sends atomic broadcast to immediately
transmit new configuration and status

Add Resource

1 1 1
2 2 2
1 3 3 3 4 3
2 4 6 4 5 4
5 5 5
6 6 6

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-284


Low Latency Transport (LLT)

Provides traffic distribution across all


private links
Sends and receives heartbeats
Transmits cluster configuration data
Determines whether connections are
reliable (more than one exists) or unreliable
Runs in kernel for best performance
Connection-oriented
Uses DLPI over Ethernet
Nonroutable
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-285
Configuring LLT

Required configuration files:


• /etc/llttab
• /etc/llthosts
Optional configuration file:
/etc/VRTSvcs/conf/sysname

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-286


The llttab File

set-node train1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
start

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-287


Setting Node Number and Name
# /etc/llttab (cluster number range: 0 - 255)
set-cluster 10
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start

# /etc/llthosts (node number range: 0 - 31)
3 sysa
7 sysb

# /etc/VRTSvcs/conf/sysname
sysb
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-288
The link Directive

Tag Name Range (all) SAP

# /etc/llttab
set-node 1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start

Device:Unit Link Type MTU

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-289


Low Priority Link
Public network link as redundant private
network link
LLT sends only heartbeats on low priority link if
other private network links are functional.
Rate of heartbeats slower to reduce traffic
Low priority link is used for all cluster
communication if all private links fail.
Public network can be saturated with cluster
traffic.
Risk of system panics if the same system
ID/cluster ID is present on network
Configured with link-lowpri directive
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-290
Other LLT Directives
# for verbose messages from
# lltconfig, add this line first
# in llttab
set-verbose 1
# the following will cause only
# nodes 0-7 to be valid for
# cluster participation
exclude 8-31
# peerinact specifies how long the link is
# down before marked inactive
set-timer peerinact: 1600
# regulates heartbeat interval
set-timer heartbeat:50
set-timer heartbeatlo:100
start
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-291
The llthosts File
Format:
node_number name
Example entries:
1 systema
2 systemb
3 systemc
No spaces before number
Have same entries on all systems
Unique node numbers required
System names match llttab, main.cf
System names match sysname, if used

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-292


The sysname File

Enables llttab and llthosts to be


identical on all systems
Must be different on each system
Contains unique system name
Removes dependency on UNIX node name
System name must be in llthosts
System name must match main.cf

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-293


GAB Configuration

GAB configuration file:


/etc/gabtab
GAB configuration command entry:
/sbin/gabconfig -c -n seed_number
Seed number is set to number of systems
in the cluster.
Starts GAB under normal conditions
Other options discussed later
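For example, /etc/gabtab for a two-system cluster
contains:
/sbin/gabconfig -c -n 2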

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-294


Changing Communication
Configuration

Stop VCS Start VCS

Stop GAB Start GAB

Stop LLT Edit Files Start LLT

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-295


Stopping GAB and LLT

Stop VCS engine first.


Stop GAB on each system:
/sbin/gabconfig -U

Stop LLT:
/sbin/lltconfig -U

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-296


Starting LLT

Edit configuration files on each system


before starting LLT on any system.
Start LLT on each system in the cluster:
/sbin/lltconfig -c
LLT starts if configuration files are correct.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-297


Starting GAB

Start LLT before starting GAB.


Start GAB on each system, specifying a
value for -n equal to the number of
systems in the cluster:
/sbin/gabconfig -c -n #

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-298


Starting LLT and GAB
Automatically
Startup files added when VCS is installed:
/etc/rc2.d/S70llt
/etc/rc2.d/S92gab

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-299


The LinkHbStatus Attribute

Internal VCS system attribute that provides


link status information
Use hasys command to view status:
hasys -display system -attribute LinkHbStatus
hme:0 UP qfe:0 UP

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-300


The lltstat Command
train12# lltstat -nvv |pg
LLT node information:
Node State Link Status Address
* 0 train12 OPEN
link1 UP 08:00:20:AD:BC:78
link2 UP 08:00:20:AD:BC:79
link3 UP 08:00:20:B7:08:5C
1 train11 OPEN
link1 UP 08:00:20:B4:0C:3B
link2 UP 08:00:20:B4:0C:3B
link3 UP 08:00:20:B4:0C:3B
Shows which system runs the command

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-301


Other lltstat Options
train12# lltstat -c
LLT configuration information:
node: 20
name: train3
cluster: 10
version: 1.1
nodes: 20 - 21
max nodes: 32
max ports: 3
(…)

train12# lltstat -l
LLT link information:
Link Tag  State Type  Pri    SAP    MTU  Addrlen Xmit Recv …
0    hme0 on    ether hipri  0xCAFE 1500 6       3732 3678 0
1    qfe0 on    ether hipri  0xCAFE 1500 6       3731 3674 0
2    qfe1 on    ether lowpri 0xCAFE 1500 6       1584 6719 0

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-302


The lltconfig Command
train12# lltconfig -a list
Link 0 (qfe0):
Node 0 : 08:00:20:AD:BC:78 permanent
Node 1 : 08:00:20:AC:BE:76 permanent
Node 2 : 08:00:20:AD:BB:89 permanent

Link 1 (hme0):
Node 0 : 08:00:20:AD:BC:79 permanent
Node 1 : 08:00:20:AC:BE:77 permanent
Node 2 : 08:00:20:AD:BB:80 permanent

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-303


GAB Membership Notation
# /sbin/gabconfig -a
GAB Port Memberships
===============================================
Port a gen a36e003 membership 01 ; ;12
Port h gen fd57002 membership 01 ; ;12

Port a indicates GAB is communicating; port h indicates
had is communicating. In the membership string, "01"
means nodes 0 and 1 are members. The space between the
semicolons is a placeholder for the 10s digit (a 0 would
be displayed there if node 10 were a member of the
cluster), and "12" after the second semicolon means
nodes 21 and 22 are members.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-304


Communication Failures
Network partition:
Failure of all Ethernet heartbeat links between
one or more systems:
• Occurs when one or more systems fail
• Also occurs when all Ethernet heartbeat
links fail
Split brain:
• Failure of Ethernet heartbeat links is
misinterpreted as failure of one or more
systems.
• Multiple systems start running the same
failover application.
• Leads to data corruption if the applications
use shared storage
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-305
Split-Brain Condition

Changing Changing
Block Block
20460 20460

Block
20460
INVALID

Shared Storage
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-306
Preventing Split-Brain Condition

Redundant heartbeat channels:


• Multiple private network heartbeats
• Public network heartbeat
• Disk heartbeats
• Service group heartbeat
SCSI disk reservation
Jeopardy
Autodisabling
Seeding
PreOnline trigger
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-307
Jeopardy Condition

A special type of cluster membership called


jeopardy is formed when one or more systems
have only a single Ethernet heartbeat link.
Service groups continue to run, and the cluster
functions normally.
Failover and switching at operator request are
unaffected.
The service groups running on a system in
jeopardy are not taken over by another system
if a system failure is detected by VCS.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-308


Jeopardy Example

SG_1 SG_2 SG_3

A B C

Regular Membership: A, B
Jeopardy Membership: C
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-309
Network Partition Example
Autodisabled for C Autodisabled for A,B

SG_1 SG_2 SG_3

A B C

Regular Membership: A, B New Regular Membership


No Jeopardy Membership No Jeopardy Membership
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-310
Split Brain Example
Service Groups Not Autodisabled

SG_1 SG_2 SG_3 SG_1 SG_2 SG_3

A B C

Regular Membership: A, B New Regular Membership


No Jeopardy Membership No Jeopardy Membership
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-311
Recovery Behavior

When a private network is reconnected after a


network partition, VCS and GAB are stopped
and restarted as follows:
Two-system cluster:
• System with the lowest LLT node number
continues to run VCS.
• VCS is stopped on higher-numbered system.
Multi-system cluster:
• Mini-cluster with the most systems running
continues to run VCS. VCS is stopped on the
systems in the smaller mini-cluster(s).
• If split into two equal size mini-clusters, the
cluster containing the lowest node number
continues to run VCS.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-312
Configuring Recovery Behavior
Modify /etc/gabtab. For example:
/sbin/gabconfig -c -n 2 -j
Causes high numbered node to panic if
GAB tries to start after all Ethernet
connections simultaneously stop and then
restart
Split brain avoidance mechanism

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-313


Preexisting Network Partitions

This condition is caused by failure in


private network communication channels
while systems are down.
A preexisting network partition can lead to
split brain when systems are started.
VCS uses seeding to prevent split brain
condition in the case of a network partition.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-314


Seeding

Prevents split brain


Only seeded systems can run VCS.
Systems are seeded only if GAB can
communicate with other systems.
Seeding determines the number of systems
that must be communicating to allow VCS
to start.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-315


Manually Seeding the Cluster
To start GAB and seed the system on which
the command runs:
gabconfig -c -x
Warning: Do not use these options in gabtab.
Overrides -n; allows GAB to immediately
seed the cluster so VCS can build a running
configuration
Use when the number of systems available
is less than the number specified by -n in
/etc/gabtab.
Only use on one system in the cluster;
others then seed from first system.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-316


The InJeopardy Trigger
To configure, add an injeopardy script to
/opt/VRTSvcs/bin/triggers.
The trigger is called when a system transitions
from regular cluster membership to jeopardy.
Arguments are the name of the system in jeopardy
and the system state.
The trigger is invoked on all systems that are part
of jeopardy membership.
The InJeopardy trigger is not run when:
• A system loses its last network link.
• A system loses both private network links at once.
• A system transitions from any other state (such as
down state) to jeopardy state.
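A minimal injeopardy script might log the transition
(an illustrative sketch):
#!/bin/sh
# /opt/VRTSvcs/bin/triggers/injeopardy
# $1 = name of the system in jeopardy
# $2 = system state
echo "`date` system $1 ($2) entered jeopardy" >> /var/VRTSvcs/log/injeopardy.log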

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-317


The lltdump Command
train12# lltdump -f /dev/qfe:0 -V -A -R
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq
000000b9 len 0132 ack 0000007c 01 01 64 05 00
00 00 01 00 07 89 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq
000000bb len 0166 01 01 64 05 00 00 00 01 00
07 88 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq
000000bc len 0166 ack 00000080 01 01 64 05 00
00 00 01 00 07 89 00
DAT C 100 S 01 D 00 P 007 rdy 80000081 seq
000000bf len 0176 ack 00000083 01 01 64 05 00
00 00 01 00 07 89 00

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-318


The lltshow Command
train12# lltshow -n 0 |pg
=== LLT node 0:
nid= 0 state= 4 OPEN
my_gen= 3a89ec14 peer_gen= 0 flags= 0 links= 3
opens= ffffffff readyports= 0 rexmitcnt= 0 nxtlink= 0
lastacked= 0 nextseq= 0 recv_seq= 0
xmit_head= 0 xmit_tail= 0 xmit_next= 0
xmit_count= 0 recv_reseq= 0 oos= 0
retrans= 0 retrans2= 0
link [0]: hb= 0 hb2= 0 peerinact= 0 lasthb= 0
valid= 1 perm= 1 flags= 0 stat= 1
arpmode= 0
addr= 08 00 20 AD BC 78 00 00 00 00
dlpi_hdr= 00 00 00 07 00 00 00 08 00 00 00 14 00 00 00
64 00 00 00 00 08 00 20 AD BC 78 CA FE 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Identifies LLT Packets on Public Network

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-319


Common LLT Problems

Node or cluster number out of range:


Node number must be between 0 and 31.
Cluster number must be between 0 and 255.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-320


Incorrect LLT Specification

Incorrectly specified Ethernet link device:


qf3 should be qfe
LLT not started:
Check /etc/llttab for the start
directive.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-321


Common GAB Problems

No GAB membership
• gabconfig -a
• gabconfig -c -nN
GAB starts then shuts down
• Check cabling

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-322


Problems with main.cf

VCS does not start:


Check main.cf for incorrect entries.
hacf -verify aborts:
Check system names in main.cf to verify that
they match llthosts and llttab.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-323


Summary

You should now be able to:


Describe how systems communicate in a
cluster.
Configure the Low Latency Transport (LLT).
Configure the Group Membership and
Atomic Broadcast (GAB) mechanism.
Start and stop LLT and GAB.
Configure the InJeopardy trigger.
Troubleshoot LLT and GAB.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-324


Lab 13: Cluster Communication
Student Red Student Blue

Test TestSG
Prod Loopy
Loopy
Test
Prod Mount
Mount

TestVol
ProdVol

TestDG
ProdDG

RedNFSSG BlueNFSSG

injeopardy

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-325


VERITAS Cluster Server
for Solaris

Lesson 14
Troubleshooting
Overview

Troubleshooting

Using Volume Cluster


Manager Communication

Event Faults and Installing


Notification Failovers Applications

Service Group Preparing Resources NFS


Basics Resources and Agents Resources

Terms Managing Using


Introduction and Installing Cluster Cluster
Concepts VCS Service Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-327
Objectives
After completing this lesson, you will be able to:
Monitor system and cluster status.
Apply troubleshooting techniques in a VCS
environment.
Detect and solve VCS communication
problems.
Identify and solve VCS engine problems.
Correct service group problems.
Solve problems with agents.
Resolve problems with resources.
Plan for disaster recovery.

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-328


Monitoring VCS

VCS log files


System log files
The hastatus utility
SNMP traps
Event notification triggers
Cluster Manager

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-329


VCS Log Entries
Engine log: /var/VRTSvcs/log/engine_A.log

TAG_D 2001/04/03 12:17:44 VCS:11022:VCS engine (had)


started
TAG_D 2001/04/03 12:17:44 VCS:10114:opening GAB
library
TAG_C 2001/04/03 12:17:45 VCS:10526:IpmHandle::recv
peer exited errno 10054
TAG_E 2001/04/03 12:17:52 VCS:10077:received new
cluster membership
TAG_E 2001/04/03 12:17:52 VCS:10080:Membership: 0x3,
Jeopardy: 0x0
TAG_D 2001/04/03 12:17:52 VCS:10322:Node '1' changed
state from 'UNKNOWN' to 'INITING'
TAG_B 2001/04/03 12:17:52 VCS:10455:Operation
'haclus -modify(0xc13)' rejected. Most Recent
Sysstate=CURRENT_DISCOVER_WAIT,Channel=BCAST,Flags=0x40000
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-330
Agent Log Entries
Agent logs kept in /var/VRTSvcs/log
Log files named AgentName_A.log
LogLevel attribute settings:
• none
• error (default setting)
• info
• debug
• all
To change log level:
hatype -modify res_type LogLevel debug

© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-331


Troubleshooting Guide

Primary types of problems:


• Cluster communication
• VCS engine startup
• Service groups and resources
Determine path based on hastatus output:
• Cluster communication problem indicated by
message:
Cannot connect to server -- Retry Later

• VCS engine startup problem indicated by


systems with WAIT status
• Service group and resource problems indicated
when VCS engine in RUNNING state
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-332
Cluster Communication
Problems
Run gabconfig -a.
No port a membership indicates a communication
problem.
No port h membership indicates a VCS engine (had)
startup problem.
Communication Problem: GAB Not Seeded
# gabconfig -a
GAB Port Memberships
===================================

VCS Engine Not Running; GAB and LLT Functioning:
# gabconfig -a
GAB Port Memberships
===================================
Port a gen 24110002 membership 01
Port h gen 65510002 membership
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-333
Problems with GAB and LLT
If GAB is not seeded (no port memberships):
• Run lltconfig to determine if LLT is running.
• Run lltstat -n to determine if systems can see each
other on the LLT link.
• Check the physical network connection(s) if LLT cannot see
each node.
• Check gabtab for correct seed value (-n) if LLT links are
functional.
Manually seed the cluster, if necessary.
lltconfig
LLT is running

lltstat -n
LLT node information:
Node State Links
* 0 train11 OPEN 2
1 train12 OPEN 2
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-334
VCS Engine Startup Problems
Start the VCS engine using hastart.
Check hastatus to determine system
state.
If not running:
• If ADMIN_WAIT or STALE_ADMIN_WAIT,
see next sections.
• Check logs.
• Verify that the llthosts file exists and
system entries match cluster configuration
(main.cf).
• Check gabconfig.
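For example, the /etc/llthosts file pairs node IDs with system names, and these names must match the system names used in main.cf (train11 and train12 follow the naming used in this course):

cat /etc/llthosts
0 train11
1 train12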
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-335
STALE_ADMIN_WAIT
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified:
hacf -verify config_dir
4. Start VCS on the system with the valid main.cf file:
hasys -force system_name
5. All other systems perform a remote build from the system now running.
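Put together, a recovery might look like this, assuming train11 is the system with the valid configuration (a sketch using the default configuration directory):

hacf -verify /etc/VRTSvcs/conf/config
hasys -force train11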
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-336
ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
• A .stale flag exists and the main.cf file has a syntax problem.
• A disk error affects main.cf during a local build.
• The system is performing a remote build and the last running system fails.
Restore main.cf and use the procedure for STALE_ADMIN_WAIT.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-337
Service Group Not Configured to AutoStart or Run
The service group is not brought online automatically when VCS starts:
• Check the AutoStart and AutoStartList attributes:
hagrp -display service_group
The service group is not configured to run on the system:
• Check the SystemList attribute.
• Verify that the system name is included.
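For example, for a hypothetical service group named websg, the relevant attributes can be checked in one pass:

hagrp -display websg -attribute AutoStart AutoStartList SystemList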
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-338
Service Group AutoDisabled
Autodisabling occurs when:
• GAB sees a system, but had is not running on that system.
• The service group's resources are not fully probed on all systems in the SystemList.
• A particular system is visible through disk heartbeat only.
Make sure that the service group is offline on all systems in its SystemList attribute.
Clear the AutoDisabled attribute:
hagrp -autoenable service_group -sys system
Bring the service group online.
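For example, to re-enable and restart a hypothetical group websg on system train11:

hagrp -autoenable websg -sys train11
hagrp -online websg -sys train11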
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-339
Service Group Waiting for
Dependencies
Check service group dependencies:
hagrp -dep service_group
Check resource dependencies:
hares -dep resource
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-340
Service Group Not Fully Probed
Usually a result of misconfigured resource attributes.
Check the ProbesPending attribute:
hagrp -display service_group
Check which resources are not probed:
hastatus -sum
Check the Probes attribute for resources:
hares -display
To probe resources:
hares -probe resource -sys system
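For example, for a hypothetical group websg whose resource webmnt has not been probed on train11:

hagrp -display websg -attribute ProbesPending
hares -probe webmnt -sys train11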
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-341
Service Group Frozen
Verify the values of the Frozen and TFrozen attributes:
hagrp -display service_group
Unfreeze the service group:
hagrp -unfreeze group [-persistent]
If you freeze persistently, you must unfreeze persistently.
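Because a persistent freeze is recorded in the cluster configuration, the configuration must be writable before a persistent unfreeze. A sketch for a hypothetical group websg:

haconf -makerw
hagrp -unfreeze websg -persistent
haconf -dump -makero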
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-342
Service Group Is Not Offline Elsewhere
Determine which resources are online/offline:
hastatus -sum
Verify the State attribute:
hagrp -display service_group
Take the group offline on the other system:
hagrp -offline service_group -sys system
Flush the service group:
hagrp -flush service_group -sys system
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-343
Service Group Waiting for Resource
Review the IState attribute of all resources to determine which resource is waiting to go online.
Use hastatus to identify the resource.
Make sure the resource is offline at the operating system level.
Clear the internal state of the service group:
hagrp -flush service_group -sys system
Take the other resources in the service group offline and try to bring them online on another system.
Verify that the resource works properly outside VCS.
Check for errors in attribute values.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-344
Incorrect Local Name
1. Create /etc/VRTSvcs/conf/sysname with the correct system name shown in main.cf.
2. Stop VCS on the local system.
3. Start VCS.
4. List all system names.
5. Open the configuration.
6. Delete any systems with incorrect names.
7. Save the configuration.
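As a sketch, assuming the correct name is train11 and a stale entry named train11-old must be removed from the configuration:

echo train11 > /etc/VRTSvcs/conf/sysname
hastop -local
hastart
hasys -list
haconf -makerw
hasys -delete train11-old
haconf -dump -makero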
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-345
Concurrency Violations
A concurrency violation occurs when a failover service group is online or partially online on more than one system.
Notification is provided by the violation trigger, which:
• Is invoked on the system that caused the concurrency violation
• Notifies the administrator and takes the service group offline on the system causing the violation
• Is configured by default with the violation script in /opt/VRTSvcs/bin/triggers
• Can be customized to:
– Send a message to the system log.
– Display a warning on all cluster systems.
– Send e-mail messages.
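A minimal sketch of a customized violation trigger; VCS invokes the script with the offending system and service group as arguments, and the mail recipient is only a placeholder:

#!/bin/sh
# /opt/VRTSvcs/bin/triggers/violation -- sketch of a customized trigger
SYSTEM=$1
GROUP=$2
# Record the event in the system log.
logger -p daemon.err "VCS concurrency violation: $GROUP on $SYSTEM"
# Notify the administrator (address is a placeholder).
echo "Concurrency violation: $GROUP on $SYSTEM" | mailx -s "VCS violation" root
# Take the group offline on the offending system, as the default script does.
/opt/VRTSvcs/bin/hagrp -offline $GROUP -sys $SYSTEM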
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-346
Service Group Waiting for Resource to Go Offline
Identify which resource is not offline:
hastatus -summary
Check the logs.
Manually take the resource offline, if necessary.
Configure the ResNotOff trigger for notification or action.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-347
Agent Not Running
Determine whether the agent for that resource is FAULTED:
hastatus -summary
Use the ps command to verify that the agent process is not running.
Verify values of the ArgList and ArgListValues type attributes:
hatype -display res_type
Restart the agent:
haagent -start res_type -sys system
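For example, using the Mount agent (the type is illustrative; agent processes follow the TypeAgent naming convention):

hastatus -summary
ps -ef | grep MountAgent
haagent -start Mount -sys train11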
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-348
Problems Bringing Resources
Online
Possible causes of failure while bringing
resources online:
Waiting for child resources
Stuck in a WAIT state
Agent not running
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-349
Problems Bringing Resources
Offline
Waiting for parent resources to come offline
Waiting for a resource to respond
Agent not running
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-350
Critical Resource Faults
Determine which critical resource has faulted:
hastatus -summary
Make sure that the resource is offline.
Examine the engine log.
Fix the problem.
Verify that the resource works properly outside of VCS.
Clear the fault in VCS.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-351
Clearing Faults
After external problems are fixed:
1. Clear any faults on nonpersistent resources:
hares -clear resource -sys system
2. Check attribute fields for incorrect or missing data.
If the service group is partially online:
1. Flush wait states:
hagrp -flush service_group -sys system
2. Take resources offline first before bringing them online.
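For example, for a hypothetical faulted resource webmnt in group websg on train11:

hares -clear webmnt -sys train11
hagrp -flush websg -sys train11
hagrp -online websg -sys train11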
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-352
Planning for Disaster Recovery
Back up key VCS files:
• types.cf and customized types files
• main.cf
• main.cmd
• sysname
• LLT and GAB configuration files
• Customized trigger scripts
• Customized agents
Use hagetcf to create an archive.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-353
The hagetcf Utility
# hagetcf
Saving 0.13 MB
Enter path where configuration can be saved (default is /tmp):
Collecting package info
Checking VCS package integrity
Collecting VCS information
Collecting system configuration
…..
Compressing /tmp/vcsconf.train12.tar to /tmp/vcsconf.train12.tar.gz
Done. Please e-mail /tmp/vcsconf.train12.tar.gz to your support provider.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-354
Summary
You should now be able to:
Monitor system and cluster status.
Apply troubleshooting techniques in a VCS environment.
Detect and solve VCS communication problems.
Identify and solve VCS engine problems.
Correct service group problems.
Solve problems with agents.
Resolve problems with resources.
Plan for disaster recovery.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-355
Lab Exercise
Lesson 14
Troubleshooting

VERITAS Cluster Server for Solaris
Appendix D
Special Situations
Overview
This lesson provides a guide for managing
certain situations in a cluster environment:
VCS upgrades
VCS patches
System changes: Adding, removing, and
replacing cluster systems
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-358
Objectives
After completing this lesson, you will be able to:
Upgrade VCS software to version 2.0 from any earlier version.
Install a VCS patch.
Add systems to a running VCS cluster.
Remove systems from a running VCS cluster.
Replace systems in a running VCS cluster.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-359
Preparations for VCS Upgrade
Acquire the new VCS software.
Contact VERITAS Technical Support.
Read the release notes.
Write scripts to automate as much of the process as possible.
If available, deploy on a test cluster first.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-360
VCS Upgrade Process
Start
I. Complete initial preparation.
II. Stop the existing VCS software.
III. Remove the existing VCS software and add the new VCS version.
IV. Verify the configuration and make changes as needed.
V. Start VCS on one system and propagate the configuration to others.
Done
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-361
Step I - Initial Preparation
1. Open the cluster configuration and freeze all service groups persistently:
haconf -makerw
hagrp -list
hagrp -freeze group_name -persistent
2. Save and close the VCS configuration:
haconf -dump -makero
3. Make a backup of the full configuration, including:
• All configuration files
• Any custom-developed agents
• Any modified VCS scripts
4. Rename the existing types.cf file:
mv /etc/VRTSvcs/conf/config/types.cf \
/etc/VRTSvcs/conf/config/types.save
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-362
Step II - Stopping VCS Software
1. Stop the VCS engine on all systems, leaving the application services running:
hastop -all -force
2. Remove heartbeat disk configurations:
gabdiskhb -l
gabdiskx -l
gabdiskhb -d disk_name
gabdiskx -d device_name
3. Stop and unload GAB:
gabconfig -U
modinfo | grep gab
modunload -i modid
4. Stop and unload LLT:
lltconfig -U
modinfo | grep llt
modunload -i modid
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-363
Step III - Removing Old and Adding New VCS Software
1. Remove the existing VCS (pre-2.0) software packages:
pkgrm VRTScscm VRTSvcs VRTSgab VRTSllt \
VRTSperl
2. Add the new VCS software packages:
pkgadd -d /package_directory
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-364
Step IV - Verifying and Changing the Configuration
1. Determine differences between the existing and new types.cf files:
diff /etc/VRTSvcs/conf/config/types.save \
/etc/VRTSvcs/conf/config/types.cf
2. Merge the new and old versions of the types.cf files:
a. Check changes in attribute names.
b. Check modified resource type attributes.
3. Compare and merge any necessary changes to VCS scripts.
4. Verify the configuration files:
hacf -verify /etc/VRTSvcs/conf/config
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-365
Step V - Starting the VCS Cluster
1. On all systems in the cluster, start LLT and GAB:
lltconfig -c
gabconfig -c -n #
2. Start the VCS engine on the system where the changes were made:
hastart
3. Start the VCS engine on all other systems in the cluster in a stale state:
hastart -stale
4. Open the configuration, unfreeze the service groups, and save and close the configuration:
haconf -makerw
hagrp -unfreeze group_name -persistent
haconf -dump -makero
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-366
Installing a VCS Patch
Start
I. Carry out the initial preparation (same as in the VCS upgrade).
II. Stop the old VCS software (same as in the VCS upgrade).
III. Install and verify the new patch.
IV. Start the VCS software.
Done
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-367
Step III - Installing and Verifying the New Patch
1. Verify that the VRTS* packages are all version 2.0:
pkginfo -l VRTSgab VRTSllt VRTSvcs \
VRTSperl | grep VERSION
2. Add the new VCS patch on each system using the provided utility:
./vcs_install_patch
3. Verify that the new patch has been installed:
showrev -p | grep VRTS
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-368
Step IV - Starting the VCS Cluster
1. Start LLT, GAB, and VCS on all systems in the cluster:
lltconfig -c
gabconfig -c -n #
hastart
2. Open the configuration, unfreeze the service groups, and save and close the configuration:
haconf -makerw
hagrp -unfreeze group_name -persistent
haconf -dump -makero
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-369
Adding Systems to a Running VCS Cluster
1. Configure LLT with the same cluster number and a unique node ID on the new system (see the sketch after this list).
2. Configure GAB.
3. Connect the new system to the private network.
4. Edit the /etc/llthosts files on all systems in the cluster to add the system name and node ID of the new system.
5. Start LLT, GAB, and VCS on the new system.
6. Change the SystemList attribute for each service group that can run on the new system.
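As a sketch, the communication configuration files on a new third system might look like this (the node name train13, cluster number 10, and qfe interfaces are all assumptions):

/etc/llttab:
set-node train13
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -

/etc/gabtab:
/sbin/gabconfig -c -n 3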
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-370
Removing Systems from a Running VCS Cluster
1. Switch all running service groups to other systems and freeze the system.
2. Stop VCS on the system using hastop -local.
3. Stop and unload GAB on the system:
gabconfig -U
modinfo | grep gab
modunload -i modid
4. Stop and unload LLT on the system:
lltconfig -U
modinfo | grep llt
modunload -i modid
5. Remove the system from the cluster configuration:
hasys -delete system_name
6. Edit /etc/llthosts on all systems to delete the entry for the system to be removed.
7. Remove the llttab and gabtab files on that system.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-371
Replacing Systems in a Running VCS Cluster
1. Evacuate any service groups running on the system to be replaced.
2. Make the VCS configuration read/write, freeze the system persistently, and save and close the configuration:
haconf -makerw
hasys -freeze system_name -persistent
haconf -dump -makero
3. Physically replace the system with a new one that uses the same VCS configuration (same cluster number, node ID, and system name).
4. Connect the new system to the private network.
5. Start LLT, GAB, and VCS on the new system.
6. Make the VCS configuration read/write, unfreeze the system, and save and close the configuration:
haconf -makerw
hasys -unfreeze system_name -persistent
haconf -dump -makero
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-372
Summary
You should now be able to:
Upgrade VCS software to version 2.0.
Install a VCS patch.
Add systems to a running VCS cluster.
Remove systems from a running VCS cluster.
Replace systems in a running VCS cluster.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-373
Lab: Installing VCS Patches
Diagram: the Student Red system (running RedSG) and the Student Blue system (running BlueSG); each student installs the patch.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-374
VERITAS Cluster Server for Solaris

Introduction

VERITAS Cluster Server
Diagram: clients connect over the public network to applications/services (NFS, WWW, FTP, DB) running on cluster systems; the systems run VCS, communicate over a private network, and attach to shared storage.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-376
VCS Features
Availability
• Monitor and restart applications
• Set failover policies
Scalability
• Distribute services
• Add systems and storage to running clusters
Manageability
• Use Java or Web graphical interfaces
• Manage multiple clusters
Diagram: a network of clustered databases and clustered Web servers.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-377
High Availability Design
HA-aware applications
• Restart capability
• Crash-tolerance
HA management software
• Site replication
• Fault detection, notification, and failover
• Storage management
• Backup and recovery
Redundant hardware
• Power supplies
• Network interface cards, hubs, switches
• Storage
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-378
VERITAS Clustering and Replication Products
Cluster Management: VERITAS Global Cluster Manager
Application Availability Agents: Informix, Oracle, Sybase, Apache
High Availability Clustering: VERITAS Cluster Server
Data Replication: VERITAS VVR and support for array-based replication
Parallel Extensions: VERITAS Cluster Volume Manager and File System
Foundation Products: VERITAS Volume Manager and File System
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-379
VERITAS High Availability Solutions
Diagram: Global Cluster Manager manages VCS clusters running WWW and database services in London and Tokyo across a WAN; each site runs VxVM and VxFS, and the Volume Replicator replicates data between the sites.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-380
References for High Availability
Blueprints for High Availability:
Designing Resilient Distributed Systems
by Evan Marcus and Hal Stern
High Availability Design, Techniques, and
Processes
by Floyd Piedad and Michael Hawkins
Designing Storage Area Networks
by Tom Clark
Storage Area Network Essentials: A Complete
Guide to Understanding and Implementing SANs
by Richard Barker and Paul Massiglia
VERITAS High Availability Fundamentals
Web-based training
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-381
Course Overview
Diagram (course map, top row first):
Troubleshooting
Using Volume Manager, Cluster Communication
Event Notification, Faults and Failovers, Installing Applications
Service Group Basics, Preparing Resources, Resources and Agents, NFS Resources
Introduction, Terms and Concepts, Installing VCS, Managing Cluster Services, Using Cluster Manager
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-382
Lab Overview
Diagram: Student Red on train1 (odd/low-numbered system) and Student Blue on train2 (even/high-numbered system), connected by a private network and a shared SCSI JBOD, with both systems on the public network.
© Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-383