
How to Optimally Configure a Quorum Disk in Red Hat Enterprise Linux Clustering and High-Availability Environments

Authors: John Ruemker and Lon Hohberger
Editor: Allison Pranger
05/18/2011

OVERVIEW
There are a number of factors to consider when determining whether or not to include a quorum disk in your
cluster. In most cases, QDisk is unnecessary and can lead to additional configuration complexity, increasing
the likelihood that an incorrect setting might cause unexpected behavior. Red Hat recommends that you only
deploy QDisk if absolutely necessary.
This document describes the most common use cases for QDisk and how to optimally configure QDisk
settings, including heuristics and settings for multipath. For more information, see the following:
Official Product Documentation (for both Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6)
Cluster Administration
Cluster Suite Overview
man qdisk
man cman
man cluster.conf
man openais.conf

Environment
Red Hat Enterprise Linux 5+ Advanced Platform (Clustering and GFS/GFS2)
Red Hat Enterprise Linux 6+ with the High Availability Add-On or Resilient Storage Add-On

QDISK USE CASES


The following are examples of typical use cases for QDisk. In some cases, alternative methods for achieving
the desired results are provided.

Two-Node Clusters with Separate Networks for Cluster Interconnect (Heartbeat) and Fencing
Two-node clusters are inherently susceptible to split-brain scenarios, where each node considers itself the
only remaining member of the cluster when communication between the two is severed. Because the nodes
cannot communicate in order to agree on which node should be removed from the cluster, both will try to
remove the other via fencing. The phenomenon where both hosts attempt to fence each other
simultaneously is referred to as a fence race. In a typical environment where fence devices are accessed
on the same network that is used for cluster communication, this is not a problem because the node that has
lost its network connection will be unable to fence the other and will thus lose the race. Some shared fencing devices serialize access, meaning that only one host can succeed. However, if there is one fencing device
per node and the devices are accessed over a network that is not used for cluster communication, then the
potential exists for both hosts to send the fencing request simultaneously. This results in what is called
fence death, where both nodes of the cluster are powered off.
QDisk can deal with fence races in these situations by predetermining which node should remain alive using
either a set of heuristics or an automatic master-wins mechanism. However, in Red Hat Enterprise Linux 5.6
and later and Red Hat Enterprise Linux 6.1 and later, Red Hat recommends using the delay option for
fencing agents instead. Using the delay option with a given fencing agent defines a configuration-based
winner to the fence race and is simpler to configure and use than a quorum disk.
In this configuration, QDisk can also prevent a problem known as a fencing loop, which only occurs in two-node clusters. A fencing loop occurs when a cluster node reboots after being fenced but cannot rejoin the cluster because the cluster interconnect is still unavailable. It then fences the surviving node, which then reboots, and the process repeats indefinitely. An alternative method for preventing fencing loops is to disable the cluster software at boot using the chkconfig utility.

Clusters Requiring "Last Man Standing" Functionality


In order to remain operational (quorate), Red Hat Enterprise Linux high-availability clusters require a certain
number of members to be present. Greater than half of the total number of votes must be present for a cluster to have quorum. With half or fewer, the cluster will cease to function, and all clustered operations will halt until quorum is regained.
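The majority rule described above can be sketched numerically. The function and vote counts below are made up for illustration:

```python
# Quorum requires strictly more than half of the total votes to be present.
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A cluster is quorate when the present votes exceed half the total."""
    return votes_present > total_votes / 2

# Five nodes with one vote each: three members keep the cluster quorate.
print(has_quorum(3, 5))  # True
# Two of five votes are not enough.
print(has_quorum(2, 5))  # False
```

Note that exactly half is not enough, which is why a two-node cluster (one vote present out of two) loses quorum without special handling.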
In some mission-critical environments where an outage is unacceptable, you might want the cluster to
continue to function in a degraded state even when most of the nodes are no longer members. QDisk allows
what is often called a "last man standing" configuration, where a cluster can remain quorate with fewer nodes than are usually required for quorum.
The primary drawback to employing this configuration is that in many cases, the sole surviving node cannot
adequately handle the load, causing degraded performance for an extended period of time.
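The vote arithmetic behind a "last man standing" configuration can be sketched as follows. This assumes each of N nodes carries one vote and the quorum disk carries N-1 votes, matching the recommendations later in this document; the function name is made up for illustration:

```python
# With N nodes (1 vote each) and a quorum disk worth N-1 votes,
# expected_votes is 2N-1, so a single node plus the quorum disk
# holds 1 + (N-1) = N votes, which is a majority.
def is_quorate(nodes_alive: int, total_nodes: int, qdisk_reachable: bool) -> bool:
    qdisk_votes = total_nodes - 1
    expected_votes = total_nodes + qdisk_votes  # 2N - 1
    present = nodes_alive + (qdisk_votes if qdisk_reachable else 0)
    return present > expected_votes / 2

# In a four-node cluster, one surviving node plus the quorum disk
# still holds 1 + 3 = 4 of 7 votes, a majority.
print(is_quorate(1, 4, True))   # True
# Without the quorum disk, a single node is inquorate.
print(is_quorate(1, 4, False))  # False
```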

Clusters Requiring the Ability to Give Preference to the Node Owning a Service
When communications are severed between nodes in a cluster, you might want membership to be
determined by certain factors, such as which node is running a service. QDisk offers this capability through heuristics that use scripts to determine a node's score.

Clusters Requiring the Ability for Nodes to Join and Participate


Some environments might require nodes to meet special requirements before they are able to join a cluster.
These requirements can be enforced through the use of heuristics that run scripts to check whether the
criteria are met. For example, if the cluster is primarily used for highly available services, checking the
network used for service traffic may be a prerequisite for joining the cluster.


SETTING UP A QUORUM DISK


Because QDisk can introduce additional complexity into your configuration, you should take the following
requirements into consideration:
Use of a quorum disk requires concurrent, synchronous, real-time access to shared storage. Most
SAN storage arrays are acceptable. Be aware that your QDisk timings must allow for multipath failure
recovery.
Use of QDisk requires I/O fencing, as QDisk is not a substitute for fencing. The QDisk daemon uses the cluster's fencing mechanism to evict nodes from the cluster when they cease to update the QDisk in a timely manner. Power fencing is recommended.
Red Hat recommends using the deadline scheduler on the LUN being used as a quorum disk.
Use of QDisk on top of distributed or replicated storage of any type is not supported.
Use of multipath requires longer cluster timeouts when used in conjunction with QDisk (see the
Configuration Settings for Multipath section for more information).

Configuration Settings for QDisk and CMAN


The basic structure of a cluster configuration (/etc/cluster/cluster.conf) using QDisk is shown
below, followed by setting recommendations.
NOTE: In Red Hat Enterprise Linux 6.1 and later, many settings are auto-calculated based on values in cluster.conf. If you are deploying a cluster on Red Hat Enterprise Linux 6.1 or later, Red Hat recommends focusing on tuning the totem token timeout rather than altering the auto-calculated settings, which include the italicized parameters below. Also, when qdiskd runs in a two-node cluster with no heuristics, the master_wins property is automatically enabled.

<cluster alias="mycluster" config_version="1" name="mycluster">
    <clusternodes>
        <!-- In Red Hat Enterprise Linux 6.1, votes is assumed to be
             1 if not present. -->
        <clusternode name="node1.example.com" votes="1" nodeid="1">
            <fence>
                <method name="1">
                    <device name="node1-fence"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="node2.example.com" votes="1" nodeid="2">
            <fence>
                <method name="1">
                    <device name="node2-fence"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <!-- In Red Hat Enterprise Linux 6.0 and later, the
         quorum_dev_poll parameter is automatically set to the token
         timeout and does not need to be set in cluster.conf. -->
    <cman expected_votes="3" quorum_dev_poll="21000"/>
    <!-- In Red Hat Enterprise Linux 6.0 and later, master_wins is
         automatically enabled in two-node clusters when no heuristics
         are present. -->
    <!-- In Red Hat Enterprise Linux 6.1 and later, interval, tko,
         and votes are automatically calculated based on information
         already in cluster.conf. -->
    <quorumd label="myQDisk" interval="1" tko="10" min_score="1" votes="1">
        <!-- In Red Hat Enterprise Linux 5.6 and 6.0 (and later),
             the heuristic interval and tko values are automatically
             calculated based on the quorumd interval and tko values. -->
        <heuristic program="ping -c1 -w1 192.168.2.1" score="1"
                   interval="2" tko="4"/>
    </quorumd>
    <totem token="21000"/>
    <fencedevices>
        [...]
    </fencedevices>
</cluster>
When configuring these settings, take the following recommendations into consideration:
The totem token value should be greater than two times the value of the quorumd interval*tko. In the above example, quorumd has an interval*tko of 10 seconds, and thus totem token is set to 21000ms (21s). Be advised that the totem token value is in milliseconds, while other values, such as the quorumd interval, are in whole seconds.
CMAN's quorum_dev_poll should be equal to totem token.
Any heuristics in use should have a tko*interval less than or equal to the quorumd interval*(tko-1). In the above example, quorumd has an interval*(tko-1) of 1*(10-1)=9, so the heuristic is given an interval of 2 and a tko of 4, for a tko*interval of 8.
The expected_votes value for the cman tag should be equal to the total number of cluster nodes times two, minus one.
The two_node value for the cman tag should be 0 or not present.
The votes value for the quorumd tag should be equal to the total number of cluster nodes minus one.
Each cluster node must have one vote.
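The recommendations above can be checked mechanically. The sketch below encodes them and verifies the sample configuration; the function name is made up for illustration:

```python
def check_qdisk_settings(nodes, quorumd_interval, quorumd_tko,
                         heuristic_interval, heuristic_tko,
                         totem_token_ms, quorum_dev_poll_ms,
                         expected_votes, quorumd_votes):
    """Return a list of violated recommendations (empty if all hold)."""
    problems = []
    # totem token (ms) must exceed 2 * quorumd interval * tko (seconds).
    if totem_token_ms <= 2 * quorumd_interval * quorumd_tko * 1000:
        problems.append("totem token too small")
    if quorum_dev_poll_ms != totem_token_ms:
        problems.append("quorum_dev_poll should equal totem token")
    # A heuristic must complete within quorumd interval * (tko - 1).
    if heuristic_interval * heuristic_tko > quorumd_interval * (quorumd_tko - 1):
        problems.append("heuristic cannot complete before quorumd tko")
    if expected_votes != 2 * nodes - 1:
        problems.append("expected_votes should be 2*nodes - 1")
    if quorumd_votes != nodes - 1:
        problems.append("quorumd votes should be nodes - 1")
    return problems

# The sample cluster.conf above passes every check.
print(check_qdisk_settings(nodes=2, quorumd_interval=1, quorumd_tko=10,
                           heuristic_interval=2, heuristic_tko=4,
                           totem_token_ms=21000, quorum_dev_poll_ms=21000,
                           expected_votes=3, quorumd_votes=1))  # []
```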

Configuration Settings for Multipath


When using QDisk to bolster quorum in a cluster, it is often useful to have device-mapper-multipath backing the device for redundancy and to avoid single points of failure. However, you might need to adjust some settings so that QDisk and CMAN give the multipath device a sufficient amount of time to fail over to a different path before evicting a node.
In general, there are two different strategies for configuring a cluster to use multipath devices:
Minimize Failover Time: For clusters where there are multiple nodes equally capable of running the configured services, and where resources will not incur significant delays or penalties when a node failure occurs, you might not want to wait for a multipath device to fail over. Instead, you can configure
cluster settings so that the cluster fails a node as quickly as possible, regardless of whether it can
recover by waiting longer. For these types of clusters, you do not typically need to give consideration
to how long it takes for a multipath path to fail over; the QDisk and cluster settings can be configured
normally (see the Configuration Settings for QDisk and CMAN section). If a node's storage misbehaves, the node will simply be evicted after waiting quorumd interval * tko seconds, and the cluster will resume operations.
Allow Time for Recovery: In some clusters, a node failure is very costly and should be avoided unless it is clear that automatic recovery will not occur quickly. In these environments, you can adjust the configuration so that the cluster will wait a sufficient amount of time for a multipath device to fail over to another active path before evicting a node. You must first determine the maximum amount of time that storage devices might take to fail, and thus how long a multipath failover might take.
Multipath failover time varies from environment to environment, and thus it cannot be easily defined without testing different failure scenarios. In many situations, path failures are handled quickly, avoiding any sort of I/O delay for QDisk. However, in other circumstances, such as when the storage target or switch becomes unresponsive, it might take several minutes to fail over to another path, resulting in QDisk timing out while waiting on I/O. The References and Related Documentation section lists documents that will help you determine and/or tune how long a multipath failover might take in your environment.
Once an accurate understanding of the time it takes storage devices to fail is obtained, the settings for
QDisk can be adjusted accordingly:
If the cluster should always wait long enough for the multipath device to switch to the next active path,
then the product of quorumd settings interval * tko should be a value greater than the multipath
failover time.
<quorumd label="myqdisk" min_score="1" interval="2" tko="10">
<heuristic [...] />
</quorumd>
The totem token and cman quorum_dev_poll settings should be adjusted using the guidelines
from the Configuration Settings for QDisk and CMAN section based on these new interval and tko
settings.
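The sizing rules above can be sketched as a small helper. The helper name and the default 2-second interval are assumptions for illustration, not prescribed values:

```python
import math

def qdisk_settings_for_failover(failover_s, interval_s=2):
    """Choose a quorumd tko so that interval * tko exceeds the measured
    multipath failover time, then derive a totem token (and matching
    quorum_dev_poll) greater than twice interval * tko, in milliseconds."""
    tko = math.floor(failover_s / interval_s) + 1
    token_ms = (2 * interval_s * tko + 1) * 1000
    return {"interval": interval_s, "tko": tko,
            "token_ms": token_ms, "quorum_dev_poll_ms": token_ms}

# A measured 18-second failover with a 2-second interval yields tko=10,
# matching the <quorumd interval="2" tko="10"> example above.
print(qdisk_settings_for_failover(18))
```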
NOTE: In Red Hat Enterprise Linux 5.5, the cluster failover time can increase to about 3 * token because of a bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=544482. If you set totem token to x * 2.7 (where x = the multipath failover time), the cluster failover time will be around x * 2.7 * 3; that is, the cluster will fence the node and start the service roughly x * 2.7 * 3 after a node failure. If x is 30s, the cluster failover time is 30 x 2.7 x 3 = 243s, or just over 4 minutes. For more information, see the following article: Why does it take so long (4+ minutes) before the other node is fenced in my Red Hat Enterprise Linux 5.5 cluster?
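The arithmetic in the note above is simple enough to verify directly; the function name is made up for illustration:

```python
# On Red Hat Enterprise Linux 5.5, total failover is roughly 3 * token,
# and token is sized at 2.7 * the multipath failover time x.
def rhel55_failover_estimate(x_s, token_factor=2.7, fence_factor=3):
    return x_s * token_factor * fence_factor

# With x = 30s: about 243 seconds, just over 4 minutes.
print(rhel55_failover_estimate(30))
```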

Heuristics
Heuristics are effectively fitness checks for nodes. The total score values of all of the heuristics are added
together, and if they meet the minimum required score, the node continues to participate in the cluster. If the
total of all scores drops below the minimum score, the node removes itself from the cluster (usually by
rebooting). Below are some common heuristics and usage examples.
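The scoring decision described above can be sketched as follows; the function name and the pass/fail input format are made up for illustration:

```python
# Each heuristic contributes its score while it passes; a node stays
# in the cluster only while the sum meets the minimum required score.
def node_participates(heuristic_results, min_score):
    """heuristic_results: list of (passed, score) pairs."""
    total = sum(score for passed, score in heuristic_results if passed)
    return total >= min_score

# Two heuristics worth 1 point each, min_score of 1: failing one
# check still leaves the node in the cluster.
print(node_participates([(True, 1), (False, 1)], min_score=1))  # True
# Failing both drops the node out (it would typically reboot).
print(node_participates([(False, 1), (False, 1)], min_score=1))  # False
```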


Ping
Ping is typically used to remove a node from the cluster when it can no longer access a critical network resource. For example, if client requests come in through a router with the address 192.168.2.1, you could use a ping heuristic to check whether or not the router is pingable. If the router is not pingable, any services running on the node are presumably unavailable to clients, and you might choose to remove the node from the cluster.
NOTE: You should never base a cluster member's participation on the status of a single ping packet (use a
heuristic tko > 1 on Red Hat Enterprise Linux 5).

SAN Connectivity
QDisk provides some implicit storage monitoring. For example, if a host suddenly cannot access the quorum disk, it will be removed from the cluster by another member. However, this implicit monitoring covers only the path(s) to the quorum disk, so in some cases it might be useful to have a heuristic that monitors other storage devices as well.
Network Connectivity
In some environments, ping might not provide enough insight into network health. Instead, you might want to use a heuristic that monitors network connectivity at a lower level (for example, Ethernet or InfiniBand link states) by using tools shipped with Red Hat Enterprise Linux.

REFERENCES AND RELATED DOCUMENTATION


Architecture Review Process for Red Hat Enterprise Linux High Availability, Cluster, and GFS
Can I Change the I/O Scheduler for a Particular Disk Without the System Rebooting?
Delaying Fencing in a Two-Node Cluster
Is an EMC PowerPath-Managed LUN Supported for Use as a Quorum Device on Red Hat Enterprise
Linux Cluster and High Availability?
Red Hat Enterprise Linux Cluster, High Availability, and GFS Deployment Best Practices
Unresponsive Storage Device Leads to Excessive SCSI Recovery and device-mapper-multipath
Failover Times in Red Hat Enterprise Linux
device-mapper-multipath on Red Hat Enterprise Linux 5 Experiences Excessive Delay in Detecting a
Lost Path from a Storage Failure that Produces No RSCN or Loop/Link Error
How Can I Improve the Failover Time of a Faulty Path when Using device-mapper-multipath Over
iSCSI?

Copyright 2011 Red Hat, Inc. Red Hat, Red Hat Linux, the Red Hat Shadowman logo, and the products
listed are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Linux is the registered
trademark of Linus Torvalds in the U.S. and other countries.

www.redhat.com
