How to Optimally Configure a Quorum Disk in Red Hat Enterprise Linux
John Ruemker and Lon Hohberger
OVERVIEW
There are a number of factors to consider when determining whether to include a quorum disk (QDisk) in your
cluster. In most cases, QDisk is unnecessary and can lead to additional configuration complexity, increasing
the likelihood that an incorrect setting might cause unexpected behavior. Red Hat recommends that you only
deploy QDisk if absolutely necessary.
This document describes the most common use cases for QDisk and how to optimally configure QDisk
settings, including heuristics and settings for multipath. For more information, see the following:
Official Product Documentation (for both Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6)
Cluster Administration
Cluster Suite Overview
man qdisk
man cman
man cluster.conf
man openais.conf
Environment
Red Hat Enterprise Linux 5+ Advanced Platform (Clustering and GFS/GFS2)
Red Hat Enterprise Linux 6+ with the High Availability Add-On or Resilient Storage Add-On
Two-Node Clusters with Separate Networks for Cluster Interconnect (Heartbeat) and Fencing
Two-node clusters are inherently susceptible to split-brain scenarios, where each node considers itself the
only remaining member of the cluster when communication between the two is severed. Because the nodes
cannot communicate in order to agree on which node should be removed from the cluster, both will try to
remove the other via fencing. The phenomenon where both hosts attempt to fence each other
simultaneously is referred to as a fence race. In a typical environment where fence devices are accessed
on the same network that is used for cluster communication, this is not a problem because the node that has
lost its network connection will be unable to fence the other and will thus lose the race. Some shared fencing
devices serialize access, meaning that only one host can succeed. However, if there is one fencing device
per node and the devices are accessed over a network that is not used for cluster communication, then the
potential exists for both hosts to send the fencing request simultaneously. This results in what is called
fence death, where both nodes of the cluster are powered off.
QDisk can deal with fence races in these situations by predetermining which node should remain alive using
either a set of heuristics or an automatic master-wins mechanism. However, in Red Hat Enterprise Linux 5.6
and later and Red Hat Enterprise Linux 6.1 and later, Red Hat recommends using the delay option for
fencing agents instead. Using the delay option on one node's fencing agent defines a configured
winner of the fence race and is simpler to configure and use than a quorum disk.
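As an illustration of the delay approach, a delay can be placed on the fence device used to fence one of the nodes; the node names, device name, and delay value below are hypothetical, not values taken from this document:

```xml
<!-- Fencing of node1 is delayed by 5 seconds, so in a fence race
     node1 survives long enough to fence node2 first and wins. -->
<clusternode name="node1.example.com" nodeid="1">
    <fence>
        <method name="1">
            <device name="apc-pdu" port="1" delay="5"/>
        </method>
    </fence>
</clusternode>
<clusternode name="node2.example.com" nodeid="2">
    <fence>
        <method name="1">
            <device name="apc-pdu" port="2"/>
        </method>
    </fence>
</clusternode>
```

Because only the fencing of node1 is delayed, a simultaneous race resolves deterministically in node1's favor.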
In this configuration, QDisk can also prevent a problem known as a fencing loop, which occurs only in two-node
clusters. A fencing loop occurs when a cluster node reboots after being fenced but cannot rejoin the
cluster because the cluster interconnect is still unavailable. It then fences the surviving node, which then
reboots, and the process repeats indefinitely. An alternative method for preventing fencing loops is to disable
the cluster software at boot using the chkconfig utility.
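For example, after a fenced node reboots, the cluster stack can be kept from joining automatically and then started by hand once the interconnect is confirmed healthy. The service names below are the ones shipped with Red Hat Enterprise Linux 5/6; adjust them for the components you have installed:

```shell
# Prevent the cluster stack from starting automatically at boot
chkconfig cman off
chkconfig rgmanager off

# Later, once the interconnect is verified healthy, start it manually
service cman start
service rgmanager start
```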
Clusters Requiring the Ability to Give Preference to the Node Owning a Service
When communications are severed between nodes in a cluster, you might want membership to be
determined by certain factors, such as which node is running a service. QDisk offers this through the use of
heuristics that use scripts to determine a node's score.
cluster settings so that the cluster fails a node as quickly as possible, regardless of whether it can
recover by waiting longer. For these types of clusters, you do not typically need to give consideration
to how long it takes for a multipath path to fail over; the QDisk and cluster settings can be configured
normally (see the Configuration Settings for QDisk and CMAN section). If a node's storage
misbehaves, the node will simply be evicted after waiting quorumd interval * tko seconds, and
the cluster will resume operations.
Allow Time for Recovery: In some clusters, a node failure is very costly and should be avoided if
waiting longer would allow the node to recover on its own. In these environments, you can adjust the
configuration so that the cluster will wait a sufficient amount of time for a multipath device to fail over
to another active path before evicting a node. You must first determine the maximum amount of time
that storage devices might take to fail and thus how long a multipath failover might take.
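One knob to study here is multipath's queueing behavior: roughly, after all paths fail, I/O is queued for about no_path_retry * polling_interval seconds before being failed back to callers such as QDisk. A sketch of the relevant multipath.conf settings (values illustrative only):

```
defaults {
    # Path checkers run every 5 seconds
    polling_interval 5
    # Queue I/O for 12 checker intervals (roughly 12 * 5 = 60 seconds)
    # after all paths fail, then fail it back to the caller
    no_path_retry    12
}
```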
Multipath failover time varies from one environment to another, and thus it cannot be easily defined without
testing different failure scenarios. In many situations, these path failures should happen quickly, avoiding any sort of
I/O delay for QDisk. However, in other circumstances where the storage target or switch becomes
unresponsive, it might take several minutes to fail over to another path, resulting in QDisk timing out while
waiting on I/O. The References and Related Documentation section lists documents that provide more
information that will help determine and/or tune how long a multipath failover might take in your environment.
Once an accurate understanding of the time it takes storage devices to fail is obtained, the settings for
QDisk can be adjusted accordingly:
If the cluster should always wait long enough for the multipath device to switch to the next active path,
then the product of quorumd settings interval * tko should be a value greater than the multipath
failover time.
<quorumd label="myqdisk" min_score="1" interval="2" tko="10">
<heuristic [...] />
</quorumd>
The totem token and cman quorum_dev_poll settings should be adjusted using the guidelines
from the Configuration Settings for QDisk and CMAN section based on these new interval and tko
settings.
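Continuing the example above (interval="2", tko="10", a 20-second QDisk window), both values would need to exceed 20 000 ms. The figures below are illustrative only, not the formal guidance from that section:

```xml
<!-- quorumd interval * tko = 20s, so allow a slightly larger window
     (both values are in milliseconds; illustrative only) -->
<totem token="22000"/>
<cman quorum_dev_poll="22000"/>
```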
NOTE: In Red Hat Enterprise Linux 5.5, the cluster failover time increased to about 3 * token because of a
bug fix: https://bugzilla.redhat.com/show_bug.cgi?id=544482. If you set totem token to x * 2.7 (where x =
multipath failover time), the cluster failover time will be around x * 2.7 * 3; that is, the cluster will fence the
node and start the service roughly x * 2.7 * 3 after a node failure. If x is 30 seconds, the cluster failover time is
30 * 2.7 * 3 = 243 seconds, or just over 4 minutes. For more information, see the following article: Why does it
take so long (4+ minutes) before the other node is fenced in my Red Hat Enterprise Linux 5.5 cluster?
Heuristics
Heuristics are effectively fitness checks for nodes. The total score values of all of the heuristics are added
together, and if they meet the minimum required score, the node continues to participate in the cluster. If the
total of all scores drops below the minimum score, the node removes itself from the cluster (usually by
rebooting). Below are some common heuristics and usage examples.
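Before looking at individual heuristics, here is how scores combine in cluster.conf: each heuristic contributes its score while its program exits 0, and the node keeps its place while the total is at least min_score. The router address and script path below are hypothetical:

```xml
<quorumd label="myqdisk" interval="2" tko="10" min_score="2">
    <!-- min_score="2" with two score="1" heuristics means both
         checks must be passing for this node to stay in the cluster -->
    <heuristic program="ping -c1 -w1 192.168.2.1" score="1" interval="2" tko="3"/>
    <heuristic program="/usr/local/bin/check_link.sh" score="1" interval="2" tko="3"/>
</quorumd>
```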
Ping
A ping heuristic is typically used to remove a node from the cluster when it can no longer access a critical
network resource. For example, if client requests come in through a router with the address 192.168.2.1, you
could use a ping heuristic to check whether the router is reachable. If the router is not pingable, any services
running locally are presumably unavailable to clients, and you might choose to remove the node from the cluster.
NOTE: You should never base a cluster member's participation on the status of a single ping packet (use a
heuristic tko > 1 on Red Hat Enterprise Linux 5).
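For example, combining the router address above with the advice in the note, the heuristic below only deducts its score after three consecutive failed checks (interval and tko values illustrative):

```xml
<heuristic program="ping -c1 -w1 192.168.2.1" score="1" interval="2" tko="3"/>
```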
SAN Connectivity
QDisk provides some implicit storage monitoring. For example, if a host suddenly cannot access the quorum
disk, it will be removed from the cluster by another member of the cluster. However, this implicit monitoring
covers only the path(s) to the quorum disk. In some cases, it might be useful to have a heuristic
that monitors other devices as well.
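A sketch of such a heuristic (device path hypothetical): a single direct read forces the I/O out to the array rather than letting it be satisfied from the page cache, so the check fails if the paths to that device are down:

```xml
<heuristic program="dd if=/dev/mapper/datalun of=/dev/null bs=512 count=1 iflag=direct"
           score="1" interval="2" tko="3"/>
```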
Network Connectivity
In some environments, it might not make sense to monitor network connectivity at as high a level as ping.
Instead, you might want to use a heuristic that monitors network connectivity at a lower level (for example,
Ethernet or InfiniBand link states) by using tools shipped with Red Hat Enterprise Linux.
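For example (interface name hypothetical), the kernel's carrier flag can be checked directly, failing the heuristic as soon as the Ethernet link drops:

```xml
<!-- /sys/class/net/eth0/carrier reads 1 while the link is up -->
<heuristic program="grep -qx 1 /sys/class/net/eth0/carrier"
           score="1" interval="2" tko="3"/>
```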
Copyright 2011 Red Hat, Inc. Red Hat, Red Hat Linux, the Red Hat Shadowman logo, and the products
listed are trademarks of Red Hat, Inc., registered in the U.S. and other countries. Linux is the registered
trademark of Linus Torvalds in the U.S. and other countries.
www.redhat.com