Documente Academic
Documente Profesional
Documente Cultură
Advanced Technical
Reference Guide
9 February 2016
Classification: [Protected]
Latest Documentation
The latest version of this document is at:
http://supportcontent.checkpoint.com/solutions?id=sk93306
http://supportcontent.checkpoint.com/documentation_download?id=25321
For additional technical information, visit the Check Point Support Center.
Revision History
Date
03 Feb 2016
14 Jan 2016
27 Dec 2015
24 Nov 2015
15 Nov 2015
14 Nov 2015
07 Sep 2015
17 Aug 2015
06 Aug 2015
28 July 2015
23 July 2015
12 June 2015
30 Mar 2015
02 Mar 2015
10 Feb 2015
05 Feb 2015
02 Feb 2015
21 Dec 2014
28 Oct 2014
26 Oct 2014
01 Oct 2014
03 Sep 2014
26 Aug 2014
05 Aug 2014
Description
30 July 2014
29 July 2014
01 July 2014
30 June 2014
17 Mar 2014
09 Feb 2014
21 Jan 2014
19 Dec 2013
15 Dec 2013
08 Oct 2013
29 Sep 2013
22 Sep 2013
09 Sep 2013
15 Aug 2013
29 July 2013
18 July 2013
16 July 2013
10 July 2013
03 July 2013
Table of Contents:
Introduction to ClusterXL ...................................................................................................... 6
The need for gateway clusters ........................................................................................................................6
Check Point cluster solution ............................................................................................................................6
ClusterXL definitions and terms .......................................................................................................................7
ClusterXL requirements for hardware and software ......................................................................................18
Introduction to ClusterXL
The need for gateway clusters
Gateways and VPN connections are business critical devices. The failure of a Security
Gateway or VPN connection can result in the loss of active connections and access to
critical data. The gateway between the organization and the world must remain open under
all circumstances.
(e.g., pre-requisite for configuring Bond interface). This interface state appears in the output
of 'cphaprob -a if' command.
Full Sync - Complete synchronization of relevant kernel tables by a cluster member that
tries to join the cluster against the working cluster member(s). This process is meant to
fetchasnapshotoftherelevant kernel tables of already Active cluster member(s).
Full Sync is performed during initialization of Check Point software (during boot process,
the first time the member runs policy installation, during 'cpstart'). Until the Full Sync
process is complete successfully, this member remains in 'Down' state because until it is
fully synchronized with other cluster members, it can not function as a cluster member.
Meanwhile the Delta Sync packets continue to arrive and are stored in kernel memory
until Full Sync completes.
The whole Full Sync process is performed by FWD daemons on TCP port 256 and is
always done over.
The information is sent by FWD daemons in chunks while making sure they confirm
getting the information before sending the next chunk.
Delta Sync - Synchronization of kernel tables between all working cluster members exchange of CCP packets that carry pieces of information about different connections and
operations that should be performed on these connections in relevant kernel tables.
This Delta Sync process is performed directly by Check Point kernel.
While performing Full Sync, the Delta Sync updates are not processed and saved in
kernel memory. After Full Sync is complete, the Delta Sync packets stored during the Full
Sync phase are applied by order of arrival.
Delta Sync retransmission - It is possible that Delta Sync packets will be lost or corrupted
during the Delta Sync operations. In such cases, it is required to make sure the Delta Sync
packet is re-sent.The cluster member request the sending member to retransmit the
lost/corrupted Delta Sync packet.
Each Delta Sync packet has a sequence number.
The sending member has a queue of sent Delta Sync packets.
Each cluster member has a queue of packets sent from each of the peer cluster
members.
If, for any reason, a Delta Sync packet was not received by a cluster member, it can ask
for a retransmission of this packet from the sending member.
The Delta Sync retransmission mechanism is somewhat similar to a TCP Window and
TCP retransmission mechanism.
When a member requests retransmission of Delta Sync packet, which no longer exists
on the sending member, the member prints a console messages that the sync is not
complete.
Cluster Control Protocol (CCP) - Proprietary Check Point protocol that runs between
cluster members on UDP port 8116, and has the following roles (refer to 'Cluster Control
Protocol (CCP) section):
State Synchronization (Delta Sync)
Health checks (state of cluster members and of cluster interfaces):
o Health-status Reports
o Cluster-member Probing
o State-change Commands
o Querying for Cluster Membership
Note: CCP is located between the Check Point kernel and the network interface
(therefore, only TCPdump should be used for capturing this traffic).
Preconfigured mode - Cluster Mode, where cluster membership is enabled on all
members to be, however no policy had been yet installed on any of the members - none of
them is actually configured to be primary, secondary, etc. The cluster cannot function if one
machinefails.Inthisscenario,thepreconfigured mode takes place.
The preconfigured mode also comes into effect when no policy is yet installed, right
after the machines came up after boot, or when running 'cphaconf init' command.
Blocking mode - Cluster Mode, where cluster member does not forward any traffic (e.g.,
caused by a failure).
Non-blocking mode - Cluster Mode, where cluster member keeps forwarding all traffic.
High Availability (a.k.a. Active/Standby) mode - Cluster Mode, where only one cluster
member ('Active' member) processes all the traffic, while other cluster members ('Standby'
members) are ready to be promoted to 'Active' state if 'Active' member fails.
In High Availability New Mode, the cluster Virtual IP address (that represents the cluster
on that network) is associated:
with physical MAC Address of 'Active' member
with virtual MAC Address (refer to sk50840 (How to enable ClusterXL Virtual MAC
(VMAC) mode))
In High Availability Legacy (Traditional) Mode, there are no Virtual IP addresses - the
cluster members share identical IP and MAC addresses, so that the Active cluster member
receives from a hub or switch all the packets that were sent to the cluster IP address.
Load Sharing (a.k.a. Active/Active, Load Balancing) mode - Cluster Mode, where all
traffic is processed by all cluster members in parallel.
Load Sharing Multicast mode - Load Sharing Cluster Mode, where all traffic is processed
by all cluster members in parallel - each member is assigned the equal load of [ 100% /
number_of_members ].
The cluster Virtual IP address (that represents the cluster on that network) is associated
with Multicast MAC Address 01:00:5E:X:Y:Z (which is generated based on last 3 bytes
of cluster Virtual IP address on that network).
A ClusterXL decision algorithm (Decision Function) on all cluster members decides
which cluster member should process the given packet.
Load Sharing Unicast mode - Load Sharing Cluster Mode, where all traffic is accepted by
one member (called Pivot), and then the traffic is either processed by this member (Pivot),
or forwarded to one of the peer members (called non-Pivot).
The traffic load is assigned to cluster members based on the hard-coded formula per
the value of 'Pivot_overhead' attribute (refer to sk34668 (How to modify the assigned
load between the members of ClusterXL in Load Sharing Unicast mode)).
The cluster Virtual IP address (that represents the cluster on that network) is associated
with:
Physical MAC Address of 'Pivot' member
Virtual MAC Address (refer to sk50840 (How to enable ClusterXL Virtual MAC
(VMAC) mode))
Full High Availability (a.k.a. Full HA) mode - Special Cluster Mode (supported only on
Check Point appliances running Gaia OS or SecurePlatform OS) where each cluster
member also runs as a Security Management Server. This provides redundancy both
between Security Gateways (only High Availability is supported) and between Security
Management Servers (only High Availability is supported). Refer to sk101539 (ClusterXL
Load Sharing mode limitations and important notes) and sk39345 (Management High
Availability restrictions).
Decision Function - Special cluster algorithm applied by each cluster member on the
incoming traffic in order to decide, which member should process the given packet - each
cluster members maintains a table of hash values generated based on connections tuple
(source and destination IP addresses/Ports, and Protocol number).
In order to see the decision process, run kernel debug of 'cluster' module with flag
'df' (also recommended to enable the flag 'select').
Sticky Decision Function (SDF) - Special cluster algorithm in Load Sharing mode that
allows the user to control based on which parameters should the Decision Function be
applied to the incoming connections:
IPs, Ports, SPIs
IPs, Ports
IPs
Selection - The packet selection mechanism is one of the central and most important
components in the ClusterXL product and State Synchronization infrastructure for 3rd party
clustering solutions. Its main purpose is to correctly decide (select) what has to be done to
the incoming and outgoing traffic on the cluster machine.
In order to see the selection process, run kernel debug of 'cluster' module with flag
'select' (also recommended to enable the flag 'df').
In ClusterXL - the packet is selected by cluster member(s) depending on the cluster
mode:
o In HA modes - by Active member
o In LS Unicast mode - by Pivot member
o In LS Multicast mode - by all members.
Then the member applies the Decision Function (and SDF).
In 3rd party / OPSec cluster - the 3rd party software selects the packet, and Check
Point code just inspects it (and performs State Synchronization).
HA not started - Output of 'cphaprob flag' command on the given cluster member means that Check Point clustering software is not started on this Security Gateway (e.g.,
this machine is not a part of a cluster, or 'cphastop' command was run, or some failure
occurred that prevented the ClusterXL product from starting correctly).
Initializing - State of a cluster member during initialization of Check Point software (this
state can be seen only in cluster debug). An initial and transient state of the cluster member
- the ClusterXL product is already running, but not all ClusterXL Critical Devices are
initialized yet and FireWall product is not ready yet.
Ready - State of a cluster member during after initialization and before promotion to the
next required state - Active/Standby/Master/Backup (depending on Cluster Mode).
A member in this state does not process any traffic passing through cluster. A member
might be stuck in this state due to several reasons - refer to sk42096 (Cluster member is
stuck in 'Ready' state).
Active - State of a cluster member that is fully operational:
In ClusterXL - state of the Security Gateway component
In 3rd party / OPSec cluster - state of the State Synchronization mechanism
Active attention - In ClusterXL - state of the 'Active' cluster member that suffers from a
failure (and failover is not possible because there are no other available members, e.g.,
while Standby member of an HA cluster reboots).
Standby - State of a cluster member that is ready to be promoted to 'Active' state (if Active
member fails) in ClusterXL configured in High Availability mode.
Master - State of a cluster member that processes all traffic in ClusterXL configured in
VRRP mode.
Backup - State of a cluster member that is ready to be promoted to 'Master' state (if Master
member fails) in ClusterXL configured in VRRP mode.
Active Up - ClusterXL in High Availability mode that was configured as 'Maintain
current active Cluster Member'.
This means the following:
If the current Active member fails for some reason, or is rebooted (e.g., Member_A),
then failover occurs between cluster members - another Standby member will be
promoted to be Active (e.g., Member_B).
When former Active member (Member_A) recovers from a failure, or boots, the
former Standby member (Member_B) will remain to be in Active state (and
Member_A will assume the Standby state).
Primary Up - ClusterXL in High Availability mode that was configured as 'Switch to
higher priority Cluster Member'.
This means the following:
Each cluster member is given a priority (SmartDashboard - cluster object - 'Cluster
Members' pane) - member with highest priority appears at the top of the table, and
member with lowest priority appears at the bottom of the table.
The member with highest priority will assume the Active state.
If the current Active member with highest priority (e.g., Member_A), fails for some
reason, or is rebooted, then failover occurs between cluster members - the member
next highest priority will be promoted to be Active (e.g., Member_B).
When the member with highest priority (Member_A) recovers from a failure, or
boots, then additional failover occurs between cluster members - the member with
highest priority (Member_A) will be promoted to Active state (and Member_B will
return to Standby state).
Down - State of a cluster member during a failure:
In ClusterXL - state of the Security Gateway component
In 3rd party / OPSec cluster - state of the State Synchronization mechanism
Dead - State reported by a cluster member when it goes out of the cluster (due to
'cphastop' command (which is a part of 'cpstop'), or reboot).
Dying - State of a cluster member as assumed by peer members if it did not report its state
for 0.7 sec.
ClusterXL is inactive, or the machine is down - Such state is reported by the given
member regarding the peer member after the peer member notifies (via CCP) that it goes
out of the cluster (due to 'cphastop' command (which is a part of 'cpstop'), or reboot).
Critical Device (a.k.a. Problem Notification, Pnote) - Special software device on each
cluster member through which the critical aspects for cluster operation are monitored.
When the critical monitored component on a cluster member fails to report its state on
time, or when its state is reported as problematic, the state of that member is immediately
changed to 'Down'
The complete list of the configured critical devices (pnotes) is printed by the 'cphaprob
-ia list' command.
Restrictions:
Total number of critical devices (pnotes) on cluster member is limited to 16.
Name of any critical device (pnote) on cluster member is limited to 16 characters.
There are several predefined built-in critical devices (pnotes):
Additional critical devices (pnotes) can be registered by using Check Point shell scripts:
'$FWDIR/bin/clusterXL_admin' shell script registers the admin_down device
(sk55081)
'$FWDIR/bin/clusterXL_monitor_ips' shell script registers the
host_monitor device (sk35780)
'$FWDIR/bin/clusterXL_monitor_process' shell script registers devices with
the names of processes that are specified in the
$FWDIR/conf/cpha_proc_list file (sk92904)
Additional critical devices (pnotes) can be registered by using the following syntax:
cphaprob -d Device_Name -t TimeOut_in_Sec -s State [-p]
register
Important Note: For R76 and above, refer to sk92878 (User Space process monitoring
mechanism in R76 ClusterXL).
Note: On Security Gateway in VSX mode, global pnotes can be registered only from the
context of VS0.
Any critical device (pnote) can be unregistered by using the following syntax:
cphaprob -d Device_Name [-p] unregister
Note: On Security Gateway in VSX mode, global pnotes can be unregistered only
from the context of VS0.
Subscribers - User Space processes that are made aware of the current state of the
ClusterXL state machine and other clustering configuration parameters. List of such
subscribers can be obtained by running the cphaconf debug_data command.
Sticky connection - A connection is called 'sticky' if all packets are handled by a single
cluster member (in High Availability mode, all packets reach the 'Active' machine, so all
connections are sticky).
Non-sticky connection - A connection is called 'non-sticky' if the reply packet returns via a
different cluster member than the original packet (e.g., if network administrator has
configured asymmetric routing; in Load Sharing mode, all cluster members are 'Active', and
in Static NAT and encrypted connections, the Source and Destination IP addresses
change, therefore, Static NAT and encrypted connections through a Load Sharing cluster
may be non-sticky).
Flush and ACK (a.k.a. FnA, F&A) - Cluster member forces the Delta Sync packet about
the incoming packet and waiting for acknowledgements from all other Active members and
only then allows the incoming packet to pass through.
In some scenarios, it is required that some information, written into the kernel tables, will
be Sync-ed promptly, or else a race condition can occur. The race condition may occur if a
packet that caused a certain change in kernel tables left cluster Member_A toward its
destination and then the return packet tries to go through cluster Member_B.
In general, this kind of situation is called asymmetric routing. What may happen in this
scenario is that the return packet arrives at cluster Member_B before the changes induced
by this packet were Sync-ed to this Member_B.
Example of such a case is when a SYN packet goes through cluster Member_A,
causing multiple changes in the kernel tables and then leaves to a server. The SYN-ACK
packet from a server arrives at cluster Member_B, but the connection itself was not Synced yet. In this condition, the cluster Member_B will drop the packet as an Out-of-State
packet ("First packet isn't SYN"). In order to prevent such conditions, it is possible
tousetheFlushandAck(F&A)mechanism.
This mechanism can send the Delta Sync packets with all the changes accumulated so
far in the Sync buffer to the other cluster members, hold the original packet that induced
these changes and wait for acknowledgement from all other (Active) cluster members that
they received the information in the Delta Sync packet. When all acknowledgements
arrived, the mechanism will release the held original packet.
This ensures that by the time the return packet arrived from a server at the cluster, all
the cluster members are aware of the connection.
F&A is being operated at the end of the Inbound chain and at the end of the Outbound
chain (it is more common at the Outbound).
Forwarding - Process of transferring of an incoming traffic from one cluster member to
another cluster member for processing.
There are two types of forwarding the incoming traffic between cluster members:
Packet forwarding
Chain forwarding
Refer to Forwarding section.
Packet Selection - Distinguishing between different kinds of packets coming from the
network, and selecting, which member should handle a specific packet (Decision Function
mechanism):
CCP packet from another member of this cluster
CCP packet from another cluster or from a cluster member with another version
(usually older version of CCP)
Packet is destined directly to this member
Packet is destined to another member of this cluster
Packet is intended to pass through this cluster member
ARP packets
CPHA - General term that stands for Check Point High Availability (historic fact: the first
release of ClusterXL supported only High Availability) that is used only for internal
references (e.g., inside kernel debug) to designate ClusterXL infrastructure.
Probing - If a cluster member fails to receive status for another member (does not receive
CCP packets from that member) on a given segment, cluster member will probe that
segment in an attempt to illicit a response.
The purpose of such probes is to detect the nature of possible interface failures, and to
determine which module has the problem.
The outcome of this probe will determine what action is taken next (change the state of
an interface, or of a cluster member).
Refer to Cluster Control Protocol (CCP) section.
IP tracking - Collecting and saving of Source IP addresses and Source MAC addresses
from incoming IP packets during the probing.
This information is saved in IP tracking tables according to IP tracking policy:
host_ip_addrs_all, id 8125
host_ip_addrs, id 8177
IP tracking is a useful for members within a cluster to determine whether the network
connectivity of the member is acceptable.
IP tracking policy - Setting that controls, which IP addresses should be tracked during IP
tracking:
Only IP addresses from the subnet of cluster VIP, or from subnet of physical cluster
interface (fwha_track_ip_policy=1; default value)
All IP addresses, also outside the cluster subnet (fwha_track_ip_policy=0)
Pingable host - Some host (i.e., some IP address) that cluster members can ping during
probing mechanism. Pinging hosts in an interface's subnet is one of the health checks that
ClusterXL mechanism performs. This pingable host will allow the cluster members to
determine with more precision what has failed (which interface on which member).
On Sync network, usually, there are no hosts. In such case, if switch supports this, an
IP address should be assigned on the switch (e.g., in the relevant VLAN).
The IP address of such pingable host should be assigned per this formula:
IP_of_pingable_host = IP_of_physical_interface_on_member + ~10
Assigning the IP address to pingable host that is higher than the IP addresses of
physical interfaces on the cluster members will give some time to cluster members to
perform the default health checks.
Example:
IP address of physical interface on a given subnet on Member_A is 10.20.30.41
IP address of physical interface on a given subnet on Member_B is 10.20.30.42
IP address of pingable host should be at least 10.20.30.50
Flapping - Consequent changes in the state of either cluster interfaces (cluster interface
flapping), or cluster members (cluster member flapping). Such consequent changes in the
state are seen in SmartView Tracker (if in SmartDashboard in cluster object, the cluster
administrator set 'Track changes in the status of cluster members' to 'Log').
VMAC - Virtual MAC address (available since R71). When this feature is enabled on cluster
members, all cluster members in High Availability New mode / Load Sharing Unicast mode
(Note: any VSX cluster works in High Availability mode) associate the same Virtual MAC
address with Virtual IP address.
This allows avoiding issues when Gratuitous ARP packets sent by cluster during failover
are not integrated into ARP cache table on switches surrounding the cluster.
Refer to sk50840 (How to enable ClusterXL Virtual MAC (VMAC) mode).
HTU - Stands for "HA Time Unit". All internal time in ClusterXL is measured in HTUs (the
times in cluster debug also appear in HTUs).
Formula in the code:
1 HTU = 10 x fwha_timer_base_res = 10 x 10 milliseconds = 100 ms
Router Setting
Unicast MAC
Explanation
By default, ClusterXL does not support IGMP registration (also
known as IGMP Snooping).
Either disable IGMP registration in switches that rely on IGMP
packets to configure their ports, or enable IGMP registration on
ClusterXL members per sk33221.
In situations, where disabling IGMP registration in switches is
not acceptable, it is necessary to configure static CAMs in order
to allow multicast traffic on specific ports.
Certain switches have an upper limit on the number of
broadcasts and multicasts that they can pass, in order to
prevent broadcast storms. This limit is usually a percentage of
the total interface bandwidth.
It is possible either to turn off broadcast storm control, or to
allow a higher level of broadcasts or multicasts through the
switch.
If the connecting switch is incapable of having any of these
settings configured, it is possible, though less efficient, for the
switch to use broadcast to forward traffic, and to configure the
cluster members to run CCP in broadcast mode per sk20576.
Explanation
When working in High Availability New mode (without VMAC) /
Load Sharing Unicast mode, the Cluster Virtual IP address is
mapped to a physical MAC address of the 'Active' / 'Pivot'
member.
In case of fail-over, another member will be promoted to 'Active'
/ 'Pivot'. As a result, the Cluster Virtual IP address will be
mapped to new physical MAC address.
In order to update the surrounding networking devices, 'Active' /
'Pivot' member sends Gratuitous ARP packets.
The router needs to be able to learn this MAC through these
ARP packets (otherwise, it will route the traffic to "old" MAC
address, which will cause traffic outage on the network).
Router Setting
Static MAC
Disabling forwarding
multicast traffic to the
router
Explanation
Multicast mode is the default Cluster Control Protocol mode in
Load Sharing Multicast.
ClusterXL does not support the use of unicast MAC addresses
with Port Mirroring for Multicast Load Sharing solutions.
Explanation
Most routers can map the following ARP entries automatically
using the ARP mechanism:
unicast Layer 3 IP address
multicast Layer 2 MAC address
If you have a router that is not able to learn this type of
mapping dynamically, you will have to configure these
mappings as static MAC entries.
Some routers require disabling of IGMP snooping or
configuration of static CAMs in order to support sending
packets with unicast Layer 3 IP address and multicast Layer 2
MAC address.
Certain routers have an upper limit on the number of
broadcasts and multicasts that they can pass, in order to
prevent broadcast storms. This limit is usually a percentage of
the total interface bandwidth.
It is possible either to turn off broadcast storm control, or to
allow a higher level of broadcasts or multicasts through the
router.
Some routers will send multicast traffic to the router itself. This
may cause a packet storm through the network, and should be
disabled.
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Example Configuration of a Cisco Catalyst Routing Switch.
Refer to these solutions related to IGMP snooping:
sk31934 (ClusterXL IGMP Membership)
sk33221 (Using ClusterXL with IGMP Snooping-enabled switches)
sk22495 (Interface flapping (down/up) in a ClusterXL environment)
sk93327 (IGMP groups are not learned on cluster member)
ClusterXL licenses
To use ClusterXL, each Security Gateway in the cluster configuration must have a
regular Security Gateway license and the Security Management Server must have a
license for each cluster defined. There are separate licenses for cluster High Availability
mode and for cluster Load Sharing mode.
It does not matter how many Security Gateways are included in the cluster. If the proper
licenses are not installed, the policy installation operation will fail.
Refer to these solutions:
sk11054 (Check Point License Guide)
sk10200 ('too many internal hosts' error in /var/log/messages on Security Gateway)
For assistance with licenses, contact Check Point Customer Account Services
(http://www.checkpoint.com/form/contact_account.html, AccountServices@checkpoint.com,
+1-972-444-6600 ext 5).
ClusterXL supplies an infrastructure that ensures that no data is lost in case of a failure,
by making sure each cluster member is aware of the connections going through the other
members. Passing information about connections (stored in various Check Point kernel
tabled) and other Security Gateway states between the cluster members is called State
Synchronization.
Every IP-based service (including ICMP, TCP and UDP) recognized by the Security
Gateway is synchronized (unless configured otherwise in SmartDashboard).
State Synchronization is used both by ClusterXL and by 3rd party OPSEC-certified
clustering products.
ClusterXL modes and state synchronization:
ClusterXL High Availability configuration does not require state synchronization,
though if it is not enabled, connections will be lost upon failover.
ClusterXL Load Sharing configuration requires state synchronization (it is enabled
automatically and can not be disabled).
Full Sync is performed during initialization of Check Point software (during boot
process, the first time the member runs policy installation, during 'cpstart'). Until the
Full Sync process is complete successfully, this member remains in 'Down' state
because until it is fully synchronized with other cluster members, it can not function as a
cluster member.
Meanwhile the Delta Sync packets continue to arrive and are stored in kernel
memory until Full Sync completes.
The whole Full Sync process is performed by FWD daemons on TCP port 256 and is
always done over SIC (the information is written into relevant kernel tables via IOCTL):
o The member that tries to join the cluster starts to serve as Full Sync Client.
$FWDIR/log/fwd.elg log file shows:
fwsync: Connected to Sync Server
Decimal_IP_Address_of_Peer_Member. Starting full sync
fwsync: Full sync connection finished successfully
fwsync: End Sync Connection successfully
o A member chosen for Full Sync starts to serve as Full Sync Server.
$FWDIR/log/fwd.elg log file shows:
fwd_syncn_handler: got new full sync connection request from peer
Hex_IP_Address_of_Peer_Member
The information is sent by FWD daemons in chunks while making sure they confirm
getting the information before sending the next chunk.
Delta Sync - Synchronization of kernel tables between all working cluster members
- exchange of CCP packets that carry pieces of information about different
connections and operations that should be performed on these connections in
relevant kernel tables.
This Delta Sync process is performed directly by Check Point kernel.
While performing Full Sync, the Delta Sync updates are not processed and saved in
kernel memory. After Full Sync is complete, the Delta Sync packets stored during the
Full Sync phase are applied by order of arrival.
Whenever an operation is performed on a kernel table, which is marked as "sync"-ed
(in $FWDIR/conf/table.def file on Security Management Server), the Delta Sync
mechanism duplicates this action into a buffer of its own.
Once this Delta Sync buffer is full, and every Sync timer interval, the Delta Sync
buffer is sent to all cluster members over the Synchronization Network. The receiving
member will duplicate those actions into its kernel tables.
Restrictions
1. Refer to ClusterXL Requirements for Hardware and Software section above.
2. State synchronization is supported only between cluster members that meet the
following requirements:
identical operating systems
identical Check Point software components
latency on synchronization network is less than ~30 milliseconds and packet loss
is less than ~2-3%
Note: There is no requirement for throughput of Sync interface to be identical to
/ larger than throughput of traffic interfaces.
In addition, some connections can not be synchronized by design:
Connections that use User Authentication can not be synchronized (because
user authentication state is maintained on Security Servers, which are User
Space processes, and thus cannot be synchronized on different machines in the
way that kernel data can be synchronized).
Connections that use Resources can not be synchronized (because state of
such connections is maintained on Security Servers, which are User Space
processes, and thus cannot be synchronized on different machines in the way
that kernel data can be synchronized).
Accounting information can not be synchronized (because it is accumulated in
each cluster member and reported separately to the Security Management
Server, where the information is aggregated).
Broadcasts and multicasts can not be synchronized by design.
When DHCP Server is enabled on cluster members, the DHCP Server lease
database is not synchronized by design.
In R6x versions, Web Intelligence features on a ClusterXL cluster do not survive
failover. This means that if ClusterXL is providing Web Intelligence protections
and a cluster member fails, HTTP connections passing through the failed
member are lost.
Refer to sk92909 (How to debug ClusterXL to understand why a connection is not
synchronized).
Synchronization network
A set of interfaces on cluster members that were configured as interfaces, over which
State Synchronization information will be passed (as Delta Sync packets) comprise the
Synchronization Network.
All Synchronization Networks work in parallel - i.e., the same information is passed in
parallel over all configured Synchronization Networks.
Up to three Synchronization Networks can be configured per cluster (SmartDashboard cluster object - 'Topology' pane - 'Network Objective').
Important Notes:
1. The use of more than one Synchronization Network for redundancy is not supported
because the CPU load will increase significantly due to duplicate tasks performed by
all configured Synchronization Networks.
If a redundancy of Synchronization Networks is required, Check Point recommends
using Link Aggregation - configure several physical interfaces as a Bond interface,
and then configure single dedicated Synchronization Network over this single Bond
interface.
Refer to Link Aggregation (Bonding) section.
Refer to sk92804 (Sync Redundancy in ClusterXL).
2. State Synchronization information (payload of Delta Sync packets) is not encrypted.
It is up to cluster administrator to make sure that the Sync network is secured and
isolated.
ClusterXL Modes
Note:
Up to 8 cluster members are supported in ClusterXL (in Load Sharing mode,
configuring more than 4 members significantly decreases cluster performance due to
amount of Delta Sync).
Up to 5 cluster members are supported 3rd party cluster (in Crossbeam chassis,
configuring more than 4 members (APMs) significantly decreases cluster
performance due to amount of Delta Sync).
Note: In Crossbeam DBHA configuration, the above requirement applies to a single
chassis (Check Point code is not aware of DBHA).
This limitation exists due to the high load on CPU that is caused by the amount of Delta
Sync packets, which increases significantly with the number of cluster members (the whole
cluster might suffocate, depending on the production traffic, of course).
Mode
Feature
High Availability
Load Sharing
Assigned Traffic
Load per Member
Performance
HW Support
SecureXL
Support
State
Synchronization is
Mandatory
VLAN Tagging
Support
Number of
members that
deal with network
traffic
Number of
members that
receive packets
from router
How cluster
answers ARP
requests for a VIP
address 4
CCP mode (also
refer to sk36644)
High
High
Availability Availability
New
Legacy
Mode
Mode 1
Yes
Yes
No
No
VRRP
Yes
No
Load
Sharing
Multicast
Mode 2
No
Yes
Load
Sharing
Unicast
Mode 2
No
Yes
100%
100%
100%
100% / N
sk34668
Good
Good
Good
Very Good
All
All
All
Excellent
Not all
routers 3
Yes
Yes
Yes
Yes 2
Yes 2
No
No
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Unicast
MAC
address of
Active
Multicast /
Broadcast
Unicast
shared
MAC
address
Broadcast
only
Virtual
VRRP MAC
address
Multicast
MAC
address 5
Multicast /
Broadcast
Multicast /
Broadcast
All
Unicast
MAC
address of
Pivot
Multicast /
Broadcast
Notes:
1.
2.
3.
4.
Open SmartDashboard
Open cluster object
Go to 'Topology' pane - click on 'Edit...'
Select the relevant VIP interface - click on 'Edit...'
On 'General' tab, click on 'Advanced...'
If you need to change this address, then
select 'User defined:' and enter new Multicast MAC address
G. Click 'OK' in all windows to apply the changes
H. Save the changes: go to 'File' menu - click on 'Save'
I. Install policy
Automatic algorithm for generating a Multicast MAC address to be associated with
cluster Virtual IP address of a format "A"."B"."C"."D":
o If 2nd octet "B" < 127, then
Final MAC = 01:00:5E:("B"hex):("C"hex):("D"hex)
Example:
VIP = 192.50.204.20
Final MAC = 01:00:5E:("50"hex):("204"hex):("20"hex) =
= 01:00:5E:32:CC:14
o If 2nd octet "B" > 127, then
Final MAC = 01:00:5E:("B-128"hex):("C"hex):("D"hex)
Example:
VIP = 192.168.204.20
Final MAC = 01:00:5E:("168-128"hex):("204"hex):("20"hex) =
= 01:00:5E:28:CC:14
Refer to sk25977 (Connecting multiple clusters to the same network segment (same
VLAN, same switch) - section about the "Destination MAC address" of the Cluster
Control Protocol.
Whenever the cluster detects a problem in the Active member that is severe enough to
cause a failover event, it passes the role of the Active member to one of the Standby
machines (the member with the next highest priority).
If State Synchronization is applied, any open connections are recognized by the new
Active machine, and are handled according to their last known state.
Upon the recovery of a member with a higher priority, the role of the Active machine
may or may not be switched back to that member, depending on the cluster configuration.
The members of Full HA cluster can be configured either together (both Check Point
appliances are linked before the First Time Configuration Wizard is opened), or
separately (the user chooses to configure a cluster consisting of a single, Primary
member, and configure the Secondary member at a later time).
Even if you decide not to install a Secondary cluster member during the initial
configuration, it is still worth your while to configure a cluster composed of a single
Primary member. A Full HA cluster is visible to the external network through its
Virtual IP addresses, not the actual physical addresses of its members. If at some
point you do decide to add a Secondary member, you will not have to alter the Layer
3 topology of your networks.
2. In SmartDashboard
A. Create a cluster object and select High Availability Legacy mode
B. Add members' objects - assign IP address from dedicated synchronization
network of the cluster, or from a dedicated management network
C. Initialize SIC
D. Define Topology:
No Virtual IP addresses for shared interfaces
Network Objective for shared interfaces is 'Monitored Private'
Sync interfaces, Network Objective for dedicated management interfaces
is 'Monitored Private' or 'Non-Monitored Private'
E. Install policy
F. Reboot the machines - MAC Address configuration will take place
Example:
If any individual Check Point Security Gateway in the cluster becomes unreachable,
transparent failover occurs to the other machines, thus providing High Availability. All
connections are shared between the remaining Security Gateways without interruption.
Load Sharing Multicast mode uses unique, real IP addresses for the cluster members
interfaces. The cluster Virtual IP addresses are associated with the Multicast MAC address
(created based on the Virtual IP addresses).
The cluster members physical IP addresses do not have to be routable on the Internet.
Only the cluster Virtual IP addresses must be routable.
ClusterXL uses the Ethernet Multicast mechanism to associate the cluster Virtual IP
addresses with all cluster members. By binding these Virtual IP addresses to Multicast
MAC addresses, it ensures that all packets sent to the cluster, acting as a gateway, will
reach all members in the cluster.
Distribution of the traffic between cluster members is performed by applying a Decision
Function to each packet - each member decides whether it should or should not process
the packets.
This decision is the core of the Load Sharing mechanism: it has to assure that at least
one member will process each packet (so that traffic is not blocked), and that no two
members will handle the same packets (so that traffic is not duplicated).
If it is required that specific connections are always processed by particular member,
then additional decision algorithm can be enabled - Sticky Decision Function.
Refer to sk101539 (ClusterXL Load Sharing mode limitations and important notes).
The cluster members physical IP addresses do not have to be routable on the Internet.
Only the cluster Virtual IP addresses must be routable.
Distribution of the traffic by Pivot member is performed by applying a Decision Function
to each packet, the same way it is done in Load Sharing Multicast mode. The difference is
that only one member (Pivot) performs this selection: any non-Pivot member that receives
a forwarded packet will handle it, without applying the Decision Function.
If it is required that specific connections are always processed by particular member,
then additional decision algorithm can be enabled - Sticky Decision Function.
Note that non-PivotmembersarestillconsideredasActive,sincetheyperformrouting
and Firewall tasks on their share of the traffic (although they do not perform decisions).
Default traffic load assignment:
Cluster size
% of traffic handled
(including Pivot)
by the Pivot
1
100
2
33
3
20
4
10
5
0
6 and more
0
% of traffic handled by
each of the other members
0
67
40
30
20
100 / cluster size
VRRP
Refer to Gaia Administration Guide (R75.40, R75.40VS, R76, R77) - Chapter 'High
Availability'.
Virtual Router Redundancy Protocol (VRRP, RFC 3768) provides dynamic failover of IP
addresses from one router (Master) to another router (one of the Backup routers) in the
event of failure. VRRP allows you to provide alternate router paths for end hosts.
The Check Point VRRP implementation on Gaia OS includes functionality called
Monitored Circuit VRRP. Monitored Circuit VRRP prevents connection issues caused by
asymmetric routes created when only one interface on Master router fails (as opposed to
the Master itself).
Each VRRP cluster, known as a Virtual Router, has a unique identifier, known as the
VRID (Virtual Router Identifier). A Virtual Router can have one or more virtual IP addresses
(VIP) to which other network nodes connect as a final destination or the next hop in a route.
By assigning a Virtual IP address (VIP), you can define alternate paths for nodes
configured with static default routes. Only the Master router is assigned a VIP. The Backup
router is assigned a VIP upon failover when it becomes the new Master. Nodes can have
alternate paths with static default routes in the event of a failure.
Static default routes minimize configuration and processing overhead on host
computers.
Important Note: You cannot have a standalone deployment (Security Gateway and
Security Management server on the same computer) in a Gaia VRRP cluster.
Refer to these solutions:
sk70380 (Gaia FAQ - Frequently Asked Questions)
sk69684 (Using VRRP with Check Point 2012 Security Appliances)
sk92061 (How to configure VRRP on Gaia)
sk66569 (IPSO-to-Gaia Upgrade Scripts and VRRP Cluster Upgrade Instructions)
sk86881 (Changing the High Availability configuration from ClusterXL and VRRP (or
from VRRP to ClusterXL) requires reboot)
sk40278 (VRRP configuration is not updated when the logical interface information
(IP address) is changed)
sk92880 (It is not possible to configure preempt in Simplified VRRP on IPSO and
Gaia)
sk89980 (Sub-interfaces / Alias IP address / Secondary IP address on Gaia)
Bridge
ClusterXL in Bridge Mode is supported only in R75.40VS / R76 / R77 and above.
Refer to sk101371 (Bridge Mode on Gaia OS and SecurePlatform OS).
IPs, Ports, SPIs (default) - provides the best sharing distribution, and is
recommended for use.
It is the least "sticky" sharing configuration.
Clarification:
A connection will stick to a cluster member based on IP addresses and based on
Ports.
Example:
Connection from IP_1:Port_1 to IP_2:Port_2 will stick to Member_A. Connection
from IP_1:Port_2 to IP_2:Port_2 might stick to Member_B.
IPs, Ports - should be used only if problems arise when distributing IPSec packets
to a few machines although they have the same source and destination IP
addresses.
IPs - should be used only if problems arise when distributing IPSec packets or
different port packets to a few machines although they have the same source and
destination IP addresses.
It is the most "sticky" sharing configuration.
In other words, it increases the probability that a certain connection will pass through
a single cluster member on both inbound and outbound directions.
Clarification:
A connection will "stick" to a cluster member based only on IP addresses.
Example:
All connections from IP_1 (from any port) to IP_2 (to any port) will stick to the same
Member_A.
Warning:
Since all connections between the given IP addresses will stick to the same
member, the CPU load on that member might increase significantly, which in turn
will negate the whole purpose of Load Sharing cluster mode.
Note:
Sticky Decision Function is enabled automatically, if Mobile Access Software Blade is
enabled on the cluster.
For more details, refer to ClusterXL Administration Guide (R70, R70.1, R71, R75,
R75.20, R75.40, R75.40VS, R76, R77) - Chapter 'Sticky Connections'.
Refer to sk101539 (ClusterXL Load Sharing mode limitations and important notes).
Forwarding
Forwarding is a process of transferring of an incoming traffic from one cluster member
to another cluster member for processing
There are two methods of forwarding the incoming traffic:
Packet forwarding
o the packet is forwarded to the target member
o the packet will skip the inbound chain and get directly into the IP stack of
target machine
Chain forwarding
o the chain is forwarded to the target member
o the chain will start the chain process from the chain module that has asked
the chain forwarding (or the one after it)
Packet forwarding
Example:
A connection was initiated on the 'Standby' member in High Availability cluster. The
reply packets to such connection will be accepted by 'Active' member, and must be
forwarded to 'Standby' member.
Description:
The sending cluster member forwards the packet at the end of the Inbound processing.
On the target cluster member, the processing of the forwarded packet will continue from
the chain at which it has stopped on the source cluster member, or the packet will be
entered directly into the TCP/IP stack (if the packet has already passed through all
Inbound chains).
Debugging:
In order to see how a packet is forwarded between cluster members, debug the
'cluster' module with 'forward' flag (in addition, these flags are recommended:
'select', 'if', 'mac'):
[Expert@GW_HostName]# fw ctl debug -m cluster + forward select if mac
Technical details:
Packet Forwarding is performed in the following way (so that the target cluster member
can understand that this packet is intended to him):
o In order to see the arrival of forwarded packet, run the debug of 'cluster'
module with 'select' flag on Standby member:
FW-1: FORWARDED Packet : fwha_select_ip_packet: (IF if_name (if_number)
at N sec) using magic ether header (0xZZZZZZZZ)
Example:
IP address
192.168.204.1
192.168.204.10
10.10.10.10
192.168.204.12
10.10.10.12
In Load Sharing Multicast mode, the connection arrives to all cluster members,
and each member decides whether it should process the packet or not.
When the Sticky Decision Function (SDF) is used, refer to sk95150 (When the
Synchronization interfaces of three and more ClusterXL members are connected to
the same switch, port flapping occurs on the switch).
In order to see the arrival of forwarded packet, run the debug of 'cluster' module
with 'select' flag on receiving member:
o If the local member should process this packet, the following is printed:
FW-1: fwha_select_ip_packet: Packet IN SourceIP_in_Hex->DestIP_in_Hex
FW-1: fwha_local_member_should_procces_mc: local member should process
packet
FW-1: fwha_select_ip_packet: Packet was filtered by member Member_ID
o If the local member should not process this packet, the following is printed:
FW-1: fwha_select_ip_packet: Packet IN SourceIP_in_Hex->DestIP_in_Hex
FW-1: fwha_local_member_should_procces_mc: local member should not process
packet
FW-1: fwha_select_ip_packet: Packet was dropped by member Member_ID
In Load Sharing Unicast mode, the connection is forwarded over the same
interface, on which it was received - not over Synchronization Network.
o Layer 2 Source MAC address of the packet is inverted and combined in a special
way with values of these kernel parameters: fwha_mac_magic and
fwha_mac_forward_magic.
Notes:
fwha_mac_magic - controls the value of 5th byte in Source MAC address of
CCP packets (default values is 0xFE hex / 254 dec)
fwha_mac_forward_magic - controls the value of 5th byte in Source MAC
address of forwarded packets (default values is 0xFD hex / 253 dec)
Refer to sk25977 (Connecting multiple clusters to the same network segment
(same VLAN, same switch)
o Layer 2 Destination MAC address of the packet is changed to the MAC address
of the non-Pivot cluster member on the same subnet.
o Layer 3 Source IP address is the IP address of the host that sent the original
packet.
o Layer 3 Destination IP address is the physical IP address of the cluster member
on that subnet.
o The packet is dropped on the member that forwarded the packet (log is
generated only if forwarding fails).
Refer to sk41898 (Connecting multiple clusters running in Load Sharing Unicast
mode results in MAC Address flapping on switches).
Debug:
o In order to see the forwarding process, run the debug of 'cluster' module with
flags 'pivot' and flag 'select' on Pivot member:
FW-1: fwha_pivot_selection_from_packet: packet forwarded ok to machine
Target_Member_ID
fwhamultik_handle_ip_packet: Dropping packet since it is not my packet,
packet was forwarded (LS pivot)
o In order to see the arrival of forwarded packet, run the debug of 'cluster'
module with 'select' flag on non-Pivot member:
fwha_select_ip_packet: The inverted back source MAC address will be XXXX-XX-XX-XX-XX
Example:
IP address
192.168.204.1
192.168.204.10
192.168.204.12
Traffic flow: Pivot cluster member receives a TCP connection from Host and
forwards it to the non-Pivot cluster member.
Packet flow:
1. Pivot cluster member performs bit-wise 'NOT' on the 4 last octets (from the left)
of the Source MAC address of the packet.
Hence, in our example:
00:50:56:c0:00:01 becomes 00:50:A9:3F:FF:FE.
2. Pivot cluster member performs bit-wise 'AND' between:
o the value of fwha_mac_magic kernel parameter
o the value of fwha_mac_forward_magic kernel parameter
Let us take the default values:
o fwha_mac_magic=0xFE
o fwha_mac_forward_magic=0xFD
Hence, in our example:
[fwha_mac_magic AND fwha_mac_forward_magic] =
[(0xFE) AND (0xFD)] = 0xFC.
Chain forwarding
Example:
A connection was initiated that requires inspection by Check Point Active Streaming
(CPAS) - e.g., SMTP Security Server.
Description:
Chain forwarding enables one cluster member to pass a chain (a packet filtered by a
FireWall module, along with data attached to the packet by the different
handling routines) to another cluster member.
Thus, the second member can resume the handling process at the same point the first
member has ceased.
Starting in NGX R60, chain forwarding is also used for Dynamic Routing.
Debugging:
In order to see how a chain is forwarded between cluster members, debug the 'fw'
module with 'chainfwd' flag (in addition, these flags are recommended: chain',
'conn', 'packet'):
[Expert@GW_HostName]# fw ctl debug -m fw + chainfwd chain conn packet
Technical details:
In CPAS case, packet forwarding cannot be used because in order to use
packet forwarding, the chain must finish passing through all the chain modules. But
since all the information that CPAS holds on this connection is located only on the other
member, the chain cannot be processed by CPAS, and therefore should be forwarded
to the member that handled this connection originally.
CPAS information is not forwarded between members because of the size of
information that will need to be synchronized and will cause performance issues.
The Forwarding Layer will receive a packed chain on the source cluster member, and
will transmit it to the target cluster member. Any table updates, which are the result of a
transmitted chain, will be applied to the target member before the chain is delivered for
processing on that machine.
Packet Forwarding is performed in the following way (so that the target cluster member
can understand that this packet is intended to him):
In case, the target member is down, but its Sync interface is still up, the chain will be
forwarded to it and handled by it.
In case, the Sync interface is down, the chain will be dropped by the source
member.
Note:
Why not forwarding the packet and starting it at the beginning of the chain? Because in
that case, the original packet needs to be kept (before the changes that made to it by
the chain modules) and the entire table changes that were made need to be undone,
because they will be on the target member again. It appears that such implementation is
more complicated.
ClusterXL Configuration
Refer to this solution:
sk66527 (Recommended configuration for ClusterXL)
sk42096 (Cluster member is stuck in 'Ready' state)
Clock synchronization
In order to improve cluster stability, the clocks on all cluster members must be
synchronized. Although cluster members are able to deal with difference within 1 hour
(VPN has much stricter limit of several minutes), it is strongly recommended to use NTP on
cluster members.
Refer to these solutions:
sk25894 (Configuring NTP on SecurePlatform OS)
sk76600 (How to confirm NTP settings on SecurePlatform OS)
sk83820 (How to configure Advanced NTP features on Gaia OS)
sk92379 (How to configure NTP authentication on Gaia OS)
sk38957 (NTP FAQ for IP appliances)
sk41502 (How to adjust the polling interval in NTP on IP appliances)
sk62845 (How to enable or disable NTP on IP appliances)
sk62861 (How to verify that NTP is working on IP Appliances)
5.
6.
7.
8.
Notes:
Unused interfaces must be configured as 'Disconnected' (refer to Defining
Disconnected Interfaces section).
Alias IP addresses are not supported by ClusterXL. Refer to sk31821 (Traffic that
is sent to Secondary IP addresses / Alias IP addresses that were defined on
interfaces of ClusterXL members is not processed).
Configure identical number of CoreXL FW instances on cluster member machines
(using the 'cpconfig' command).
Configure SecureXL in identical way on cluster member machines (using the
'cpconfig' command and the 'sim affinity -s' command).
Connect the cluster member machines via the switches.
For the Synchronization interface(s), due to security reasons, a crossover cable or a
dedicated switch is recommended.
Proceed to the next section - configuration in SmartDashboard.
CCP mode
The ClusterXL Control Protocol (CCP) uses multicast by default, because it is more
efficient than broadcast.
If the connecting switch cannot forward multicast traffic, it is possible, though less
efficient, for the switch to use broadcast to forward traffic.
Refer to CCP modes section and to Requirements for switches and routers section.
Note: ClusterXL failover event detection is based on IPv4 probing (refer to the definition
of 'probing' and of 'pingable host' in Clustering Definitions and Terms section).
During state transition, the IPv4 driver instructs the IPv6 driver to reestablish IPv6
network connectivity to the cluster.
To enable IPv6 functionality for an interface, define an IPv6 address for the applicable
interface on the cluster and on each member. All interfaces configured with an IPv6
address must also have a corresponding IPv4 address. If an interface does not require
IPv6, only the IPv4 definition address is necessary.
Note: You must configure synchronization interfaces with an IPv4 address only. This is
because the synchronization mechanism works using IPv4 only. All IPv6 information
and states are synchronized using this interface.
In an IPv6 environment, the 'cphaprob -a if' command shows both the cluster
Virtual IPv4 addresses and cluster Virtual IPv6 addresses.
Refer to these solutions:
sk35178 (How to set up IPv6 in ClusterXL)
sk34552 (How to set up IPv6 on SecurePlatform)
sk39374 (IPv6 Support FAQ)
sk78220 ("fw ctl pstat" command shows "Sync: off" on cluster members when IPv6 is
enabled in R75.40 and above)
sk91905 (Configuring Proxy NDP for IPv6 Manual NAT)
sk92368 (ATRG: IPv6)
Procedure:
Interfaces that have IP addresses configured, can be defined as 'Disconnected' via
special configuration file / registry key (see below), or in SmartDashboard:
In SmartDashboard - open cluster object - go to 'Topology' pane - click on
'Edit...':
o If the unused interfaces are present, then set their Network Objective to 'NonMonitored Private' and install policy.
o If the unused interfaces do not appear in the topology yet, then click on 'Get...'
- select 'All Members' Interfaces...', then set their Network Objective to 'NonMonitored Private' and install policy.
SecureXL
Refer to Requirements for software section.
Refer to Performance Pack Administration Guide (R70, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77).
Refer to R70 Performance Optimization Guide.
Refer to these solutions:
sk25972 (About SecureXL Performance Pack)
sk32578 (SecureXL Mechanism)
sk98348 (Best Practices - Security Gateway Performance)
sk71200 (SecureXL NAT Templates)
sk67861 (Accelerated Drop Rules Feature in R75.40 and above)
sk66402 (SecureXL Drop Templates are not supported in versions lower than R76)
CoreXL
Refer to Requirements for software section.
Refer to Firewall Administration Guide (R70, R71, R75, R75.20, R75.40, R75.40VS Chapter 'CoreXL Administration'; R76, R77 - Chapter 'Maximizing Network Performance' CoreXL).
Refer to Performance Tuning Administration Guide (R76, R77) - Chapter 2 'CoreXL
Administration'
Refer to these solutions:
sk61701 (CoreXL Known Limitations)
sk98348 (Best Practices - Security Gateway Performance)
sk35990 (How Connections Table limit capacity behaves in CoreXL)
sk36151 (Maximum Concurrent Connections in CoreXL)
sk62620 (What is the fw_worker_X process?)
VPN
For more information, refer to ClusterXL Administration Guide (R70, R70.1, R71, R75,
R75.20, R75.40, R75.40VS, R76, R77) - Chapter 'ClusterXL Advanced Configuration' Working with VPNs and Clusters.
For 3rd party VPN products, refer to vendor's documentation.
NAT
Network Address Translation (NAT) is a fundamental aspect of the way ClusterXL
works.
When a packet leaves cluster member, the source IP address in the outgoing packet, is
the physical IP address of the cluster member interface.
The source IP address is changed using NAT to that of the Virtual IP address of the
cluster on that subnet.
This address translation is called "Cluster Hide".
The packet sent to the cluster Virtual IP address is accepted by one of the cluster
members. The destination IP address in the incoming packet is changed using NAT to that
of the physical IP address of the cluster member interface on that subnet.
This address translation is called "Cluster Fold".
For OPSec certified clustering products, this corresponds to the default setting (in
SmartDashboard) in the 3rd Party Configuration page of the cluster object, of Forward
Cluster's incoming traffic to Cluster Members' IP addresses being checked.
For more information, refer to ClusterXL Administration Guide (R70, R70.1, R71, R75,
R75.20, R75.40, R75.40VS, R76, R77) - Chapter 'ClusterXL Advanced Configuration' Working with NAT and Clusters.
For OPSec certified clustering products, refer to vendor's documentation.
Refer to these solutions:
sk31832 (How to prevent ClusterXL / VRRP / IPSO IP Clustering from hiding its own
traffic behind Virtual IP address)
sk32224 (NAT Table 'fwx_alloc')
sk30197 (Configuring Proxy ARP for Manual NAT)
VLAN
When defining VLAN tags on an interface, cluster IP addresses can be defined only on
the VLAN interfaces (the tagged interfaces).
Defining a cluster IP address on a physical interface that has VLANs is not supported.
This physical interface has to be defined with the Network Objective Monitored Private.
Note: Refer to CCP and VLAN interfaces section.
For more information, refer to ClusterXL Administration Guide (R70, R70.1, R71, R75,
R75.20, R75.40, R75.40VS, R76, R77) - Chapter 'ClusterXL Advanced Configuration' Working with VLANs and Clusters.
In addition, refer to the Release Notes of the given version.
Refer to these solutions:
sk92826 (ClusterXL VLAN monitoring)
sk61323 (Monitoring of VLAN interfaces in ClusterXL)
sk92784 (Configuring VLAN Monitoring on ClusterXL for specific VLAN interface)
Introduction
When dealing with mission-critical applications, an enterprise requires its network to be
highly available.
Clustering provides redundancy at the gateway level.
However, without Link Aggregation - redundancy of Network Interface Cards (NICs), or
redundancy of the switches on either side of the gateway are only possible in a cluster, and
only by failover of the gateway to another cluster member.
Configuration
Refer to these User Guides:
How to Configure ClusterXL for L2 Link Aggregation on SecurePlatform and Gaia
OS
How to Configure Link Aggregation Groups on IPSO OS
Start with these ClusterXL Administration Guides (because the Link Aggregation
support was added for the first time in these versions):
R65 ClusterXL Administration Guide - Chapter 'ClusterXL Advanced Configuration' Working with Link Aggregation and Clusters - Configuring Interface Bonds
R70 ClusterXL Administration Guide - Chapter 'ClusterXL Advanced Configuration' Working with Link Aggregation and Clusters - Configuring Interface Bonds
R70.1 ClusterXL Administration Guide - Chapter 'ClusterXL Advanced Configuration'
- Link Aggregation and Clusters
In addition, refer to ClusterXL Administration Guide (R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'ClusterXL Advanced Configuration' - Link Aggregation and
Clusters.
Important Note: It is mandatory to define the physical slave interfaces (that will
comprise the bond interface) as 'Disconnected'. Refer to Defining Disconnected Interfaces
section.
Link Aggregation can be configured in one of these two modes:
High Availability (Active/Backup) mode (supported since R65) - only one interface
at a time is active.
Upon interface failure, the bond fails over to another interface.
Different slave interfaces of the bond can be connected to different switches, to
benefit from high availability of switches in addition to high availability of interfaces
(refer to Fully Meshed Redundancy via Interface Bonding section above).
Load Sharing (Active/Active) mode (supported since R70.1 / VSX R67) - all
interfaces are active, for different connections.
Connections are balanced between interfaces according to Layer 3 and Layer 4, and
follow either the IEEE 802.3ad standard, or XOR.
Load Sharing mode has the advantage of increasing throughput, but requires
connecting all the interfaces of the bond to one switch (which must support LACP).
For both Link Aggregation High Availability mode and for Link Aggregation Load
Sharing mode:
The number of bond interfaces that can be defined is limited by the maximal number
of interfaces supported by each platform (refer to Release Notes of each given
version).
Up to 8 slave NICs can be configured in a single High Availability bond or Load
Sharing bond.
In the case of switch or Security Gateway failure, a High Availability cluster solution
provides system redundancy.
Figure below depicts a redundant system without Link Aggregation (two synchronized
Security Gateways - cluster members) deployed in a simple redundant topology:
In this scenario:
GW-1 and GW-2 are cluster members
S-1 and S-2 are switches
C-1 and C-2 are interconnecting
networks
Cluster members GW-1 and GW-2 each have
one external NIC connected to an external
switch (S-1 and S-2, respectively).
In the event of a failure of either Active cluster
member GW-1, its NIC (on C-X), or switch S1, cluster member GW-2 becomes the only
Active gateway, connecting to switch S-2 over
C-2.
In any of the 3 cases (gateway failure, NIC failure or switch failure), the result of the
failover is that no further redundancy exists, and a further failure of any active component
will completely stop network traffic.
Link Aggregation provides high availability of NICs. If one fails, the other can function in
its place. This functionality is in Bond High Availability mode and in Bond Load Sharing
mode.
Fully Meshed Redundancy via Interface Bonding
The Link Aggregation High Availability mode, when deployed with ClusterXL, enables a
higher level of reliability by providing granular redundancy in the network. This granular
redundancy is achieved by using a fully meshed topology, which provides for independent
backups for both NICs and switches.
A fully meshed topology further enhances the redundancy in the system by providing a
backup to both the interface and the switch, essentially backing up the cable. Each cluster
member has two external interfaces, one connected to each switch.
Figure below depicts this implementation, where both cluster members are connected to
both external switches:
In this scenario:
GW-1 and GW-2 are Security Gateway
cluster members in New High
Availability
mode
S-1 and S-2 are switches
C-1, C-2, C-3 and C-4 are networks
After a switch failure, switch functionality and
gateway high availability are maintained.
Similarly, after a NIC failure, switch and
gateway high availability are maintained.
Note: The bond failover operation requires network interface cards that support the
Media-Independent Interface (MII) standard.
Failover can occur because of a failure in the physical link state, or a failure in the
receiving/sending of CCP packets. Either of these failures will trigger a failover: either
within the bond interface, or between cluster members (depending on the circumstances)
2. The second step relates to the configuration of the cluster topology. Here, the
cluster IP addresses are determined, and associated with the interfaces of the
cluster members (each member must have an interface responding to each cluster
IP address). Normally, cluster IP addresses are associated with an interface based
on a common subnet. In this case, these subnets are not the same. It must be
explicitly specified, which member subnet is associated with the cluster IP address.
Example:
Proxy ARP
Refer to this solution:
sk30197 (Configuring Proxy ARP for Manual NAT).
Let us consider the following scenario:
1. Two networks (Network_A and Network_B) are separated by a Security Gateway
(single Security Gateway or ClusterXL).
2. On each network, there is a host (Host_A on Network_A,
and Host_B on Network_B).
3. Let us assume, that Network_A represents the Internal network,
and Network_B represents the External network.
4. According to the existing standards, when Host_B needs to send data to Host_A,
an ARP Request for the MAC address of Host_A will be sent
by Host_B to Network_B.
Since Host_A is located on another network, and the Security Gateway acts as a
router, this ARP Request (sent to Broadcast address on Layer2) will not be
forwarded by the Security Gateway from Network_B to Network_A.
As a result, Host_B will not discover the MAC address of Host_A, and will not be
able to send the data to Host_A.
A standard solution, in such cases, is to configure the Security Gateway to act
as Proxy ARP.
The Security Gateway will pretend to be the Host in question. The Security Gateway
will accept the ARP Requests and the Security Gateway will send its own MAC
Address in ARP Reply. Then, when the data is received from the External
network, the Security Gateway will forward the data to the relevant host on the
Internal network.
Configuration on the Security Gateway is two-fold:
1. Layer2-to-Layer3 matching - matching IP addresses of the relevant hosts on the
Internal network to the MAC Address of the Security Gateway on the External
network (performed via special configuration file
$FWDIR/conf/local.arp
).
2. NAT rules
ISP Redundancy
If you have a ClusterXL Gateway cluster, connect each cluster member to each ISP
using two physical interfaces.
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'ClusterXL Advanced Configuration' - Configuring ISP
Redundancy on a Cluster.
Note: ISP Redundancy is not supported in ClusterXL where physical interfaces of
cluster members and cluster VIP are defined on different subnets. Refer to sk66521 (ISP
Redundancy in ClusterXL when interfaces of cluster members and cluster VIP are defined
on different subnets per sk32073).
Refer to FireWall Administration Guide (R70, R71, R75, R75.20, R75.40, R75.40VS) Chapter 'ISP Redundancy'.
Refer to Security Gateway Technical Administration Guide (R76, R77) - Chapter 3 'ISP
Redundancy'.
Refer to these User Guides:
How To Configure ISP Redundancy
How To Configure ISP Redundancy in SecurePlatform
Refer to these solutions:
sk25129 (Supported platforms for ISP Redundancy)
sk42636 (Controlling connections configured with ISP Redundancy in Load Sharing
mode)
sk66521 (ISP Redundancy in ClusterXL when interfaces of cluster members and
cluster VIP are defined on different subnets per sk32073)
sk23630 (Advanced configuration options for ISP Redundancy)
sk32225 (Configuring ISP Redundancy so that certain traffic uses specific ISP)
sk40958 (How to verify the status of ISP Redundancy links on command line)
Dynamic Routing
ClusterXL supports Dynamic Routing (Unicast and Multicast) protocols as an integral
part of Check Point operating systems. As the network infrastructure views the clustered
gateways as a single logical entity, failure of a cluster member will be transparent to the
network infrastructure and will not result in a ripple effect.
When configuring the routing protocols on each cluster member, each member is
defined identically, and uses the cluster VIP addresses (not the members' physical IP
addresses). Meaning, that Router ID should be set to cluster Virtual IP on each member.
Note: When configuring OSPF restart, you must define the restart type as signaled or
graceful. For Cisco devices, use type signaled.
Note: If cluster running on SecurePlatform OS does not participate in Dynamic Routing
protocols, then disable Advanced Dynamic Routing on each cluster member in order to
prevent unexpected cluster failovers due to FIB pnote.
SNMP
Refer to Requirements for software section.
Refer to these solutions:
sk90860 (How to configure SNMP on Gaia OS)
sk79280 (How to add SNMP user defined settings in Gaia)
sk92999 (How to create custom SNMP traps in Gaia)
sk34511 (How to enable SNMP on SecurePlatform OS)
sk68560 (How to configure SNMP on SecurePlatform OS)
sk65923 (How to configure the cluster to send SNMP Trap upon fail-over)
sk93455 (Send SNMP Trap in the event of a ClusterXL failover to multiple Trap
Servers)
sk40266 (SNMP, MIBs, and how SNMP traps work)
sk71980 (Output of a 'snmpwalk' command with 'exec' extension or 'extend'
extension is limited)
sk78360 (How to Extend SNMP)
sk65173 (Check Point SNMP sysObjectID .1.3.6.1.2.1.1.2)
sk40622 (SNMPv3 USM (User-based Security Model) User)
sk42426 (Hardware Monitoring with SNMP on Power-1 / UTM-1 / Smart-1 / 2012
appliances)
When all Critical Devices (Pnotes) report their states as 'ok', the machine will try to
change its state to 'Active', depending on the cluster configuration (HA mode / LS
mode) and states of the peer members.
Among several properly functioning cluster members working in HA mode, the
machine will become an 'Active' depending on the configuration:
o In 'Active Up' configuration ('Maintain current active Cluster Member') - a first
cluster member (on a time base), which reaches the 'Ready' state, will become
'Active'.
o In 'Primary Up' configuration ('Switch to higher priority Cluster Member') machine with highest priority will become 'Active'.
When on all cluster members, some Critical Device report their state as 'problem',
one of the member will become 'Active' and will get into derived state 'Active
attention', symbolizing that it has a failure. The choice regarding what machine will
become an 'Active' is a random and does not depend on the machines priorities /
numbers and type of Critical Devices that report their state as 'problem'.
Policy installation
When the policy is installed onto a cluster member, the fwd daemon calls the
"cphastart" command in order to start the clustering mechanism.
The "cphastart" command is responsible to read the $FWDIR/conf/objects.C file
in order to get all required information from the cluster object, and cluster members'
objects.
Once done, the "cphastart" command calls the "cphaconf" command with all the
relevant parameters.
The "cphaconf" command performs 2 main actions:
Moves the configuration parameters to the Check Point kernel (in the kernel, the
parameters are not enforced right away - instead, the new configuration parameters
are buffered, and a process called "policy negotiation" starts)
Notifies the cphastart daemon about the new loaded policy
The "cphaconf" command sends a signal to the cphamcset daemon to reload the
information from the objects. If the cphamcset daemon is not yet started, it will be started.
The cphamcset daemon is responsible for opening sockets on the NICs in order to
allow them to pass multicast traffic (CCP) to the machine (run 'ip maddr show'
command).
Check Point kernel has a mechanism, which ensures that all cluster members enforce
the same security policy and the same ClusterXL parameters at any given time.
Since the policy installation does not take place simultaneously on all cluster members
(actually the policy commit is sequential), there may be some time difference between the
installations on all the members.
In order to overcome this problem, the policy negotiation is divided into two phases:
All members must acknowledge the new policy arrival.
Then, all members must acknowledge moving into the new policy.
During Phase I, a machine that got new policy sends CCP packets declaring that it got a
new policy with a certain Policy ID.
The following line appears in cluster debug with 'conf' flag:
CPHA: Phase I: Looking for machines in policy update mode...
The other machines also send this CCP packet as soon as they get the new policy. All
the machines wait to receive the confirmation packet from all the other machines, signalling
that the new policy arrived to all cluster members.
Now, Phase II takes place, when the CPHA timer is stopped completely in order to
avoid sending packets with the old parameters, and the new policy parameters are
enforced.
The following line appears in cluster debug with 'conf' flag:
CPHA: Phase II: Looking for machines ready to update policy...
After having done that, each machine sends another packet indicating it completed the
policy change phase.
When all the machines completed the policy change p
hase, the HA timer is started and
all the machines are updated with the new configuration.
Each one of these steps is backed up with a HA timer, which reverts the process, if not
all the Active machines confirmed the new stage after a certain time (refer to
'fwha_policy_update_timeout_factor' kernel parameter).
In this case, the old parameters are restored.
Debugging:
In order to see the policy installation, debug the 'cluster' module with 'conf' flag (in
addition, these flags are recommended: 'stat', 'pnote', 'if', 'mac'):
[Expert@GW_HostName]# fw ctl debug -m cluster + conf stat pnote if mac
Example from R76 High Availability (Active Up) cluster - from Active member:
; 2Jul2013 13:51:53.832490;[cpu_0];[fw4_0];FW-1: SIM (SecureXL Implementation Module) SecureXL device detected.;
; 2Jul2013 13:51:53.958580;[cpu_2];[fw4_0];FW-1: SecureXL: Connection templates are not possible for the installed policy. Please
refer to the Performance Pack documentation for further details.;
; 2Jul2013 13:51:56.060111;[cpu_2];[fw4_1];FW-1: fwha_set_conf: entered with State=ACTIVE, Blocking State=ACTIVE;
; 2Jul2013 13:51:56.060115;[cpu_2];[fw4_1];FW-1: fwha_set_conf: need_to_set_trusted_ifs=0, need_to_delete_trusted_ifs=0;
; 2Jul2013 13:51:56.060116;[cpu_2];[fw4_1];FW-1: fwha_set_conf: confinfo->op: FWHAC_DEL_TRUSTED_IFS;
; 2Jul2013 13:51:56.060260;[cpu_2];[fw4_1];FW-1: fwha_set_conf: setting HA configuration (op = 0x80):;
; 2Jul2013 13:51:56.060273;[cpu_2];[fw4_1];FW-1: fwha_set_conf: SWITCH SUPPORT ;
; 2Jul2013 13:51:56.060275;[cpu_2];[fw4_1];FW-1: fwha_set_conf: Deleting all Trusted IFs;
; 2Jul2013 13:51:56.060277;[cpu_2];[fw4_1];FW-1: fwha_set_conf: buffering deletion of trusted interfaces;
; 2Jul2013 13:51:56.060279;[cpu_2];[fw4_1];FW-1: fwha_set_conf: setting need_to_delete_trusted_ifs=1 and returning;
; 2Jul2013 13:51:56.175754;[cpu_2];[fw4_1];FW-1: fwha_set_conf: entered with State=ACTIVE, Blocking State=ACTIVE;
; 2Jul2013 13:51:56.175754;[cpu_2];[fw4_1];FW-1: fwha_set_conf: need_to_set_trusted_ifs=0, need_to_delete_trusted_ifs=1;
; 2Jul2013 13:51:56.175754;[cpu_2];[fw4_1];FW-1: fwha_set_conf: confinfo->op: FWHAC_ADD_TRUSTED_IF;
; 2Jul2013 13:51:56.176228;[cpu_2];[fw4_1];FW-1: fwha_set_conf: setting HA configuration (op = 0x100):;
; 2Jul2013 13:51:56.176230;[cpu_2];[fw4_1];FW-1: fwha_set_conf: Trusted IF name = eth1;
; 2Jul2013 13:51:56.176232;[cpu_2];[fw4_1];FW-1: fwha_set_conf: SWITCH SUPPORT ;
; 2Jul2013 13:51:56.176234;[cpu_2];[fw4_1];FW-1: fwha_set_conf: Adding Trusted IF;
; 2Jul2013 13:51:56.176237;[cpu_2];[fw4_1];FW-1: fwha_set_conf: buffering trusted interface info (setting need_to_set_trusted_ifs=1);
; 2Jul2013 13:51:56.176240;[cpu_2];[fw4_1];FW-1: fwha_set_conf: copying confinfo->if_name=eth1 to fwha_trusted_ifs_buffered[0] and
returning;
.........................................
; 2Jul2013 13:51:56.255176;[cpu_3];[fw4_1];CPHA: the list of cluster IPs according to the interface:;
; 2Jul2013 13:51:56.255179;[cpu_3];[fw4_1];Interface: 1) eth0, cluster ip: 172.30.41.79;
; 2Jul2013 13:51:56.255180;[cpu_3];[fw4_1];Interface: 3) eth2, cluster ip: 20.20.20.79;
.........................................
; 2Jul2013 13:52:00.110480;[cpu_2];[fw4_1];CPHA: policy update packet local=NO, random=47647, status=2, policy=1862759333, first=YES,
entry=0;
; 2Jul2013 13:52:00.110489;[cpu_2];[fw4_1];Entry: 0
random_id: 47647
policy_id: 1862759333
update status: 2
time: 2013864;
; 2Jul2013 13:52:01.373818;[cpu_3];[fw4_1];FW-1: fwha_set_conf: entered with State=ACTIVE, Blocking State=ACTIVE;
; 2Jul2013 13:52:01.373820;[cpu_3];[fw4_1];FW-1: fwha_set_conf: need_to_set_trusted_ifs=1, need_to_delete_trusted_ifs=1;
; 2Jul2013 13:52:01.373822;[cpu_3];[fw4_1];FW-1: fwha_set_conf: confinfo->op: FWHAC_START;
; 2Jul2013 13:52:01.373826;[cpu_3];[fw4_1];FW-1: fwha_state_freeze: turning freeze type 0 ON (time=2013876, caller=fwha_set_conf);
; 2Jul2013 13:52:01.373828;[cpu_3];[fw4_1];FW-1: fwha_state_freeze: FREEZING state machine at ACTIVE (time=2013876,
caller=fwha_set_conf, freeze_type=0);
; 2Jul2013 13:52:01.373830;[cpu_3];[fw4_1];FW-1: fwha_set_conf: setting HA configuration (op = 0x30407e):;
; 2Jul2013 13:52:01.373831;[cpu_3];[fw4_1];FW-1: fwha_set_conf: mode = 4 (active up);
; 2Jul2013 13:52:01.373833;[cpu_3];[fw4_1];FW-1: fwha_set_conf: cluster ID = 4916;
; 2Jul2013 13:52:01.373834;[cpu_3];[fw4_1];FW-1: fwha_set_conf: cluster size = 2;
P. 71
P. 72
; 2Jul2013 13:52:01.375854;[cpu_2];[fw4_1];Entry: 1
random_id: 56134
policy_id: 1862759333
update status: 3
time: 2013876;
; 2Jul2013 13:52:01.375856;[cpu_2];[fw4_1];CPHA: Phase II: Looking for machines ready to update policy...found 2 machines.;
; 2Jul2013 13:52:01.375858;[cpu_2];[fw4_1];CPHA: All machines are ready to change their configuration.;
; 2Jul2013 13:52:01.375891;[cpu_2];[fw4_1];FW-1: Stopping ClusterXL.;
; 2Jul2013 13:52:01.375924;[cpu_2];[fw4_0];FW-1: stopping HA timer;
; 2Jul2013 13:52:01.375929;[cpu_2];[fw4_1];FW-1: stopping HA timer;
; 2Jul2013 13:52:01.375933;[cpu_2];[fw4_2];FW-1: stopping HA timer;
; 2Jul2013 13:52:01.376515;[cpu_2];[fw4_1];FW-1: fwha_bond_set_configuration: entering ...;
; 2Jul2013 13:52:01.376565;[cpu_2];[fw4_1];FW-1: fwha_conf_mode: fwha_installed=1, fwha_mode=4, mode=4, pivot_mode=0;
; 2Jul2013 13:52:01.376571;[cpu_2];[fw4_1];FW-1: Changing the machine ID to 0;
; 2Jul2013 13:52:01.376575;[cpu_2];[fw4_1];FW-1: set_use_sdf: Setting sdf mode to 0;
; 2Jul2013 13:52:01.376587;[cpu_2];[fw4_1];FW-1: fwha_reset_trusted_ifs: resetting required if number;
; 2Jul2013 13:52:01.376591;[cpu_2];[fw4_1];FW-1: add_trusted_if: added interface eth1 in position 0 in list;
; 2Jul2013 13:52:01.376596;[cpu_2];[fw4_1];fwha_set_vmac_state: fwha_vmac_global_param_enabled=0, ha_new_config.cluster_vmac_mode =
0, fwha_pivot_mode = 0, FWHA_USE_BACKUP_MODE() = 1, enable_vmac=0;
; 2Jul2013 13:52:01.376598;[cpu_2];[fw4_1];fwha_set_vmac_state: vmac mode should be disabled;
; 2Jul2013 13:52:01.376600;[cpu_2];[fw4_1];fwha_set_vmac_state: vmac state was not changed=0;
; 2Jul2013 13:52:01.385388;[cpu_2];[fw4_0];fwha_set_sync_tcp_handshake_mode: mode=MINIMAL. Disabling TCP handshake enforcement;
; 2Jul2013 13:52:01.385395;[cpu_2];[fw4_1];fwha_set_sync_tcp_handshake_mode: mode=MINIMAL. Disabling TCP handshake enforcement;
; 2Jul2013 13:52:01.385407;[cpu_2];[fw4_2];fwha_set_sync_tcp_handshake_mode: mode=MINIMAL. Disabling TCP handshake enforcement;
; 2Jul2013 13:52:01.385452;[cpu_2];[fw4_0];FW-1: starting HA timer;
; 2Jul2013 13:52:01.385454;[cpu_2];[fw4_1];FW-1: starting HA timer;
; 2Jul2013 13:52:01.385455;[cpu_2];[fw4_2];FW-1: starting HA timer;
; 2Jul2013 13:52:01.385464;[cpu_2];[fw4_1];fwha_df_set_force_df_ips_only_mode: is_nac_enabled = 0;
; 2Jul2013 13:52:01.385469;[cpu_2];[fw4_1];fwha_df_set_force_df_ips_only_mode: multi_portal_enabled = 0;
; 2Jul2013 13:52:01.385470;[cpu_2];[fw4_1];fwha_df_set_force_df_ips_only_mode: old force df ips only mode: 0, new force df ips only
mode: 0;
; 2Jul2013 13:52:01.385484;[cpu_2];[fw4_1];FW-1: Starting ClusterXL.;
; 2Jul2013 13:52:01.385500;[cpu_2];[fw4_1];FW-1: fwha_state_freeze: turning freeze type 0 OFF (time=2013876, caller=policy change finished changes (fwha_start));
; 2Jul2013 13:52:01.385503;[cpu_2];[fw4_1];FW-1: fwha_state_freeze: ENABLING state machine at ACTIVE (time=2013876,caller=policy
change - finished changes (fwha_start));
P. 73
In ClusterXL:
Initialization built-in Devices
report OK
Initializing
Ready
Periodic check
of Devices OK
HA mode, and
other machine
is Active
Interface
Active Check
reports OK
Down
LS mode, or
no other active
machines
heard
Standby
Active
All nonproblematic
machines
confirmed the
Active state
There are no
members that
send lower
version of CCP
Interface
Active Check
reports problem
Other
Critical Device
reports problem
No other Active
machines
in the cluster
Down
Active
State Sync
failure
High Availability
configuration
Down
Active
Down
Active
Active/Standby
Down
Critical Devices
OK
State Synchronization - cluster members exchange Delta Sync packets about the
processed connections to keep the relevant kernel tables synchronized on all cluster
members.
Note: Each Delta Packet contains many pieces of information about different
connections. The payload of these Delta Sync packets is not encrypted, but it is not
human-readable (i.e., sniffing this traffic will not allow anyone to understand the
contents of these packets). The only way to understand what was transferred in
these packets is to run the relevant cluster debug on all cluster members (fw ctl
debug -m fw + sync).
It is up to the cluster administrator to make sure the Sync network is secured and
isolated.
Health checks - cluster members exchange reports and query each other about
their own states and the states of their cluster interfaces:
o Health-status Reports
o Cluster-member Probing
o State-change Commands
o Querying for Cluster Membership
Notes:
o These CCP packets are not encrypted.
o This applies only to ClusterXL - Check Point cluster running on Gaia OS /
SecurePlatform OS / Crossbeam COS / Windows OS / Solaris OS.
o In 3rd Party clusters (e.g., Check Point cluster running on Crossbeam XOS /
IPSO OS), the 3rd Party software is responsible for health checks.
Explanations:
o Health-status Reports - These reports contain the state of the transmitting cluster
member, as well as the presumed state of the other cluster members.
o Cluster-member Probing - If a cluster member fails to receive status for another
member (does not receive CCP packets from that member) on a given segment,
cluster member will probe that segment in a best-effort attempt to illicit a
response.
The purpose of such probes is to detect (best-effort) the nature of possible
interface failures, and to determine which module has the problem.
The outcome of this probe will determine what action is taken next (change
the state of an interface, or of a cluster member).
Cluster member sends a CCP packet 'FWHA_IF_PROB_REQ'.
Cluster member sends series of ARP Requests in the loop for all IP
addresses on this subnet.
If hosts on this subnet send ARP Replies to cluster member, then cluster
member sends series of ICMP Requests (one such host is enough).
If hosts on this subnet send ICMP Replies to cluster member (one such host
is enough), then the local interface on this member is considered to work
correctly, and the missing CCP packets from peer member are considered as a
failure on peer member.
As a result, the peer member might be declared as failed ('Down'), which in
turn might cause a fail-over in the cluster.
Example:
Cluster member FW1 is not able to send/receive CCP packets to/from the
other member FW2 on the interface eth1, this member FW1 will need to
determine where the problem occurs - on the local interface eth1 or on the
other member - and perform a fail-over (if needed)
There are 2 possible reasons why this member FW1 will not able to
send/received CCP packets from the other member FW2:
o Cluster mechanism on the other member FW2 does not work anymore
- nobody can send CCP packets to this member FW1 and receive CCP
packets from this member FW1.
o Local interface eth1 on this member FW1 does not work anymore there is not traffic at all.
Computer administrator (human) can always determine where the problem
is - check cables, send pings, etc.
Cluster member is not that smart and has to rely on some simple tests that
are called "Probing".
When a member starts probing, cluster member starts sending ARP
Requests for the IP addresses in the subnet.
If there are hosts with such IP addresses on the subnet, they will send an
ARP Reply to the cluster member (one such host is enough).
Cluster member starts sending ICMP Requests to the IP addresses that
answered the ARP Requests.
If the hosts send an ICMP Reply to the cluster member (one such host is
enough), then the cluster member FW1 will know that it can send usual traffic
through this interface eth1 and the problem with CCP packets must be
happening on the other member FW2.
If this cluster member FW1 is not able to determine where the problem is,
this interface eth1 will be declared as Failed (and by design, a fail-over will
occur).
o State-change Commands - If a cluster member needs to change its state, the
command to do so takes place on the defined secured (sync) interface.
o Querying for Cluster Membership - When a cluster member comes online, it will
send a series of CCP query/response messages, to gain knowledge of cluster
membership (which members are located on these subnets).
Sync timer
Purpose:
Performs sync-related actions every fixed interval. By default, the sync timer interval is
100ms. The base time unit is 100ms (or 1 tick), which is therefore the minimal value.
This time interval is controlled via global kernel parameter.
Global kernel parameter:
fwha_timer_sync_res
Formula in the code:
Sync timer interval =
= 10 x fwha_timer_base_res x fwha_timer_sync_res =
= 10 x 10 ms x fwha_timer_sync_res
Parameter values:
Integers from 1 (default) to 232-1
Notes:
o Increasing this value increases the time interval between Delta Sync actions. For
example, if the timer is doubled to 200 ms (fwha_timer_sync_res=2), then the
time interval between Delta Sync actions also doubles to 200 ms.
o Refer to sk41471 (ClusterXL - State Synchronization time interval and
'fwha_timer_sync_res' kernel parameter).
CPHA timer
Purpose:
Performs cluster-related actions every fixed interval. By default, the CPHA timer interval
is 100ms. The base time unit is 100 ms (or 1 tick), which is also the minimum value.
This time interval is controlled via global kernel parameter.
Global kernel parameter:
fwha_timer_cpha_res
CCP modes
CCP can run in these modes:
Multicast (default since NG FP3 HF2) - the Layer 2 Destination MAC address of
CCP packets is 01:00:5E:X:X:X
Broadcast - the Layer 2 Destination MAC address of CCP packets is
FF:FF:FF:FF:FF:FF
Unicast - the Layer 2 Destination MAC address of CCP packets is the physical MAC
address of specific cluster member(s). This mode is used:
o On VSX cluster in VSLS configuration - when number of configured Virtual
Systems is less than the number of cluster members
o On 41000/61000 appliance (starting in R75.40VS for 61000) - refer to
'asg_sync_manager' utility (61000 Security System Administration Guide)
In VSX cluster:
VSX NGX / VSX NGX R65 / VSX NGX R67 / VSX NGX R68:
o The only possible mode of CCP is Broadcast.
R75.40VS / R76 and above:
o CCP mode over Sync Network is Broadcast for all Virtual Systems.
o CCP mode over non-Sync Networks is Multicast.
In VSLS configuration, when instances of Virtual Systems are not running on all
cluster members (e.g., only 2 VSs were configured on a VSX cluster that has 4
cluster members), the Delta Sync packets generated by a Virtual System, are sent in
Unicast only to those members that run the instance of same the Virtual System.
Refer to sk36644 (The Mode of Cluster Control Protocol (CCP) in VSX cluster).
Note: The CCP mode is not set on Virtual Switches because they do not send CCP
packets.
It is possible to change the CCP mode on-the-fly. Refer to sk20576 (How to set
ClusterXL Control Protocol (CCP) in Broadcast / Multicast mode in ClusterXL):
Notes:
This change must be done on all members of the cluster.
This change is applied immediately.
This change survives reboot:
o Unix OS: refer to $FW_BOOT_DIR/ha_boot.conf file
o Windows OS: refer to Windows Registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CPHA\C
CP_mode
Procedure:
Notes:
o The CCP mode will appear at the end of the line.
o In VSX R68 and lower, the mode is not displayed (only Broadcast is supported).
Example from ClusterXL:
Required interfaces: 4
Required secured interfaces: 1
eth0
eth1
eth2
eth3
UP
UP
UP
UP
VLAN interfaces
ClusterXL
all cluster interfaces are
monitored
only lowest VLAN tag is
monitored
only lowest and highest
VLAN tags are monitored
(since R75.47/R77)
VSX cluster
all cluster interfaces
are monitored
HA: only lowest and highest
VLAN tags are monitored
VSLS: all VLAN tags
are monitored
It is possible to customize the default monitoring of VLAN tags in the following way:
Monitor VLAN tag
Only lowest VLAN tag
Only lowest and highest
VLAN tag
All VLAN tags
Only specific VLAN tag
(since R71)
ClusterXL
default
VSX cluster
need to disable
the default behaviour *
default
default (HA)
(since R75.47/R77)
not supported
default (VSLS)
Refer to sk92784 (Configuring VLAN Monitoring on
ClusterXL for specific VLAN interface)
* Note: In VSX cluster, in order to disable the default monitoring behaviour, set the value of
the relevant kernel parameter to 0 (zero):
Pre-R75.40VS versions: fwha_monitor_all_vlans
R75.40VS / R76 and above: fwha_monitor_all_vlan
Refer to sk35462 (Abnormal behavior of cluster members during failover when 'Monitor all
VLAN' feature is enabled).
External Header
0
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
48
64
10
11
TTL
IP
Pro
to
(11)
Total
Length
12
13
Eth Type
(0x0800)
14
IP
Ver
(4)
IP
header
checksum
Layer 3
Source
IP address
UDP
checksum
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
Important Note: It is not possible to control the CCP packets by security policy rule
base (neither by security rules, nor by NAT rules) because CCP is located between the
Check Point kernel and the network interface.
Length of external headers:
o Ethernet Header = 14 bytes
o IP Header = 20 bytes
o UDP Header = 8 bytes
o CCP offset = 42 bytes from frame start
Example:
If VIP = 192.50.204.20, then Final MAC =
01:00:5E:("50"hex):("204"hex):("20"hex)
01:00:5E:32:CC:14
Example:
If VIP = 192.168.204.20, then Final MAC =
01:00:5E:("168-128"hex):("204"hex):("20"hex) =
01:00:5E:28:CC:14
01:00:5e:YY:ZZ:WW
Algorithm:
1. Calculate the interface's network address - perform
logical AND between the interface's IP address
and subnet mask
2. Add 250 to the calculated interface's network
address
3. Convert the 2nd (YY), 3rd (ZZ) and 4th (WW)
octets of the final calculated IP address from Dec
to Hex format
Example #1
A. The interface's IP address and subnet mask are:
192.168.40.100 / 24
Example #2
A. The interface's IP address and subnet mask are:
192.168.40.100 / 29
FF:FF:FF:FF:FF:FF
In VSX cluster:
Value
Notes
FF:FF:FF:FF:FF:FF Refer to sk36644.
o In VSX NGX / VSX R65 / VSX R67 / VSX R68:
The only possible mode of CCP is Broadcast.
o In R75.40VS / R76 and above in VSX mode:
o CCP mode over Sync Network is Broadcast for all
Virtual Systems
o CCP mode over non-Sync Networks is Multicast
o In VSLS configuration:
When instances of VSs are not running on all cluster
members (e.g., only 2 VSs were configured on a VSX
cluster that has 4 cluster members), the Delta Sync
packets generated by a VS, are sent in Unicast only to
those members that run the instance of same the VS.
o Source MAC address (Bytes 6 - 11)
Note: The same Source MAC address is used for all the VSs on the same member.
In ClusterXL (on Gaia R77.30 and above) and in VSX mode (R77.30 and above)
before installing the policy for the first time:
1st
00
2nd
00
3rd
00
4th
00
5th
Value derived from
Cluster_Global_ID
6th
21
Notes:
Cluster_Global_ID - controls the value of 5th byte in Source MAC address of
CCP packets.
Default values are:
o 0xFE hex / 254 dec - ClusterXL Gateway mode on Gaia OS R77.30 and above
o 0xFE hex / 254 dec - ClusterXL VSX mode on Gaia OS R77.30 and above
In ClusterXL (R77.20 and lower) and in VSX mode (R75.40VS / R76 / R77 / R77.10 /
R77.20) before installing the policy for the first time:
1st
00
2nd
00
3rd
00
4th
00
5th
fwha_mac_magic
6th
21
Notes:
fwha_mac_magic - name of the kernel parameter that controls the value of 5th
byte in Source MAC address of CCP packets.
Default values are:
o 0xFE hex / 254 dec - ClusterXL Gateway mode on R77.20 and lower
o 0xFE hex / 254 dec - ClusterXL VSX mode on R75.40VS / R76 and above
o 0xF6 hex / 246 dec - VSX Cluster from VSX NGX up to VSX R68
Refer to these solutions:
sk62432 (Source MAC Address of Cluster Control Protocol (CCP) frames in
ClusterXL before installing the policy for the first time)
sk25977 (Connecting multiple clusters to the same network segment (same
VLAN, same switch)
2nd
3rd
4th
00
00
00
00
5th
Value derived from
Cluster_Global_ID
6th
ID_of_Source_Member
2nd
3rd
4th
00
00
00
00
5th
Value of
fwha_mac_magic
6th
ID_of_Source_Member
2nd
3rd
4th
00
00
XXXXXXXX
00
5th
Value derived from
Cluster_Global_ID
6th
ID_of_Source_Member
2nd
3rd
4th
00
00
00
00
5th
Value of
fwha_mac_magic
6th
ID_of_Source_Member
Notes:
Cluster_Global_ID - controls the value of 5th byte in Source MAC address of
CCP packets.
Default values are:
o 0xFE hex / 254 dec - ClusterXL Gateway mode on Gaia OS R77.30 and above
o 0xFE hex / 254 dec - ClusterXL VSX mode on Gaia OS R77.30 and above=
fwha_mac_magic - controls the value of 5th byte in Source MAC address of
CCP packets.
Default values are:
o 0xFE hex / 254 dec - ClusterXL Gateway mode on R77.20 and lower
o 0xFE hex / 254 dec - ClusterXL VSX mode on R75.40VS / R76 and above
o 0xF6 hex / 246 dec - VSX Cluster from VSX NGX up to VSX R68
XXXXXXXX - is either 00000000, or 8 least significant (right-most) bits of VSID
Refer to this solution:
sk25977 (Connecting multiple clusters to the same network segment (same
VLAN, same switch)
Layer 3
Address
Value
0.0.0.0
Source IP address
Destination IP address broadcast
address for
this subnet
Notes
The IP address of the CCP packet on
the receiver side is ignored and is not
being checked.
Layer 4 (UDP)
Port
Source port
Destination port
Value
8116
8116
Notes
It is strongly recommended not to pass
any other traffic on UDP port 8116
through ClusterXL
CCP Header
0
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
10
11
TTL
IP
Pro
to
(11)
IP
header
checksum
12
13
Eth Type
(0x0800)
14
IP
Ver
(4)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
CCP Data
The CCP version on 32-bit system in Gateway mode equals the value of kernel
parameter fwha_version:
CCP 32-bit GW = fwha_version
The CCP version on 64-bit system in Gateway mode is greater by 1 than CCP
version on 32-bit system in Gateway mode:
CCP 64-bit GW = CCP 32-bit GW + 1 = fwha_version + 1
The CCP version on system in VSX mode is greater by 2 than CCP version on 32-bit
system in Gateway mode:
CCP VSX = CCP 32-bit GW + 2 = fwha_version + 2
Version
Code (Hex)
0x0001
0x0002
0x0003
0x0006
0x0212
0x0216
0x0219
0x0219
0x021C
0x021D
0x0226
0x0227
0x0228
0x0229
0x024F
0x0251
0x0259
602
646
650
665
667
690
691
700
705
710
800
801
802
803
804
0x025A
0x0286
0x028A
0x0299
0x029B
0x02B2
0x02B3
0x02BC
0x02C1
0x02C6
0x0320
0x0321
0x0322
0x0323
0x0324
805
810
811
813
0x0325
0x032A
0x032B
0x032D
Check Point
software version
4.1
NG (FP0)
NG FP1
NG FP2
NG FP3
VSX NG AI R2
VSX NGX EA
VSX NGX GA
NG AI R54 EA
NG AI R54 GA
NG AI R55 (up to HFA_16)
NG AI R55 HFA_17
NG AI R55W
NG AI R55 HFA_18
NG AI R55 LSV
NGX R60 EA
NGX R60 GA
NGX R60 HFA_01
NGX R60 HFA_02
NGX R60 Multicast acceleration
NGX R60 with Anti-Virus
NGX R61 EA2
NGX R61 GA
NGX R62 EA
NGX R62 GA
Connectra NGX R61 EA
Connectra NGX R61 GA
Connectra NGX R66 GA
NGX R65 EA
NGX R65 GA
NGX R65 HFA_01
NGX R65 HFA_02
NGX R65 HFA_02
Connectra NGX R66.1
NGX R65 HFA_03
NGX R65 HFA_03 GA
NGX R65 HFA_40
NGX R65 HFA_50
814
815
816
850
900
901
902
1000
1010
1001
1100
1500
1501
1502
0x032E
0x032F
0x0330
0x0352
0x0384
0x0385
0x0386
0x03E8
0x03F2
0x03E9
0x044C
0x05DC
0x05DD
0x05DE
1505
1506
1508
1516
1518
1520
0x05E1
0x05E2
0x05E4
0x05EC
0x05EE
0x05F0
1523
1505
1555
0x05F3
0x05E1
0x0613
1557
1559
1561
1562
1563
2000
2000
0x0615
0x0617
0x0619
0x061A
0x061B
0x07D0
0x07D0
2005
2010
2020
2210
2211
2500
2501
2502
2500
2501
2502
2220
2221
2225
2226
0x07D5
0x07DA
0x07E4
0x08A2
0x08A3
0x09C4
0x09C5
0x09C6
0x09C4
0x09C5
0x09C6
0x08AC
0x08AD
0x08B1
0x08B2
2230
2231
2235
2236
2700
2701
2702
2720
2721
2722
2900
2901
2902
2905
2906
2907
2910
2911
2912
2920
2921
2922
62700
62701
62702
62710
62711
62712
62700
62701
62702
62700
62701
62702
62700
62701
62702
0x08B6
0x08B7
0x08BB
0x08BC
0x0A8C
0x0A8D
0x0A8E
0x0AF0
0x0AA1
0x0AA2
0x0B54
0x0B55
0x0B56
0x0B59
0x0B5A
0x0B5B
0x0B5E
0x0B5F
0x0B60
0x0B68
0x0B69
0x0B6A
0xF4EC
0xF4ED
0xF4EE
0xF4F6
0xF4F7
0xF4F8
0xF4EC
0xF4ED
0xF4EE
0xF4EC
0xF4ED
0xF4EE
0xF4EC
0xF4ED
0xF4EE
R75.47 32-bit
R75.47 64-bit
R75.48 32-bit
R75.48 64-bit
R76 32-bit
R76 64-bit
R76 in VSX mode
R76.10 32-bit
R76.10 64-bit
R76.10 in VSX mode
R77 32-bit
R77 64-bit
R77 in VSX mode
R77.10 32-bit
R77.10 64-bit
R77.10 in VSX mode
R77.20 32-bit
R77.20 64-bit
R77.20 in VSX mode
R77.30 32-bit
R77.30 64-bit
R77.30 in VSX mode
R76SP for 41000/61000 32-bit
R76SP for 41000/61000 64-bit
R76SP for 41000/61000 in VSX mode
R76SP.10 for 41000/61000 32-bit
R76SP.10 for 41000/61000 64-bit
R76SP.10 for 41000/61000 in VSX mode
R76SP.10_VSLS for 41000/61000 32-bit
R76SP.10_VSLS for 41000/61000 64-bit
R76SP.10_VSLS for 41000/61000 in VSX mode
R76SP.20 for 41000/61000 32-bit
R76SP.20 for 41000/61000 64-bit
R76SP.20 for 41000/61000 in VSX mode
R76SP.30 for 41000/61000 32-bit
R76SP.30 for 41000/61000 64-bit
R76SP.30 for 41000/61000 in VSX mode
Cluster Number (Bytes 46 - 47) - This number identifies the cluster, on which this
datagram is communicated. The cluster number is set by Security Management Server.
CCP OpCode (Bytes 48 - 49) - This code identifies the type of CCP packet. Each CCP
OpCode implies a different structure of thepacketsData section (see below).
Refer to this document (the structure of CCP Data has not changed):
o NGX R60 Advanced Technical Reference Guide (ATRG) - Chapter 11 ClusterXL Debugging CPHA Issues - General Analysis Matrix for CPHA Packets
OpCode
1
2
3
4
5
6
Type
FWHA_MY_STATE
FWHA_QUERY_STATE
FWHA_IF_PROBE_REQ
FWHA_IF_PROBE_REPLY
FWHAP_IFCONF_REQ
FWHAP_IFCONF_RPLY
FWHAP_LB_CONF
FWHAP_LB_CONF_CONFIRM
9
10
FWHAP_POLICY_CHANGE
FWHAP_SYNC
11
FWHAP_CHASSIS_STATE
12
FWHAP_CHASSIS_FREEZE
13
FWHAP_SECURITY_GROUP
14
FWHAP_CHASSIS_SYNC_LOST
15
FWHAP_CHASSIS_LINK_STATE
16
FWHAP_CHASSIS_GENERAL_INFO
Description
Report source machine's state
Query other machine's state
Interface active check (probe) request
Interface active check (probe) reply
Interface configuration request
Interface configuration reply
Load Balancing (Load Sharing)
configuration report
Load Balancing (Load Sharing)
configuration report and a request for
its confirmation (a reply to
FWHAP_LB_CONF)
Policy ID change request/notification
Delta Sync packets ("New" version)
Only on 41000/61000 appliance:
Chassis protocol
Only on 41000/61000 appliance:
Chassis freeze mechanism
(freeze after failover)
Only on 41000/61000 appliance:
Security group advertising
Only on 41000/61000 appliance:
Chassis sync lost mechanism
(freeze when sync is lost)
Only on 41000/61000 appliance:
Chassis link state mechanism
(freeze when sync is lost)
Only on 41000/61000 appliance:
Additional Chassis info
Source IF Number (Bytes 50 - 51) - The ID of the network interface that originated this
CCP packet.
These IDs are assigned by Check Point kernel during attachment to the interfaces.
Refer to the output of the 'fw ctl iflist' command on each cluster member (Note:
these outputs show the local configuration on the cluster member, and therefore do not
have to be identical on all cluster members).
Random ID (Bytes 52 - 53) - Each cluster member is assigned a random ID upon boot.
This field states the random ID of the machine that originated this CCP packet.
Source Machine ID (Bytes 54 - 55) - The ID of the machine that originated the packet
based on the internal cluster numbering (starts from zero). Each cluster member is given a
number, which identifies it within the cluster - refer to the output of 'cphaprob state'
command.
These numbers are assigned based on the priority of cluster members as configured in
SmartDashboard - cluster object - 'ClusterXL Members' pane (the higher the member is
located in this list, the higher its priority and the lower its ID).
Destination Machine ID (Bytes 56 - 57) - The ID of the machine, for which this CCP
packet is intended based on the internal cluster numbering (starts from zero). Each cluster
member is given a number, which identifies it within the cluster - refer to the output of
'cphaprob state' command.
These numbers are assigned based on the priority of cluster members as configured in
SmartDashboard - cluster object - 'ClusterXL Members' pane (the higher the member is
located in this list, the higher its priority and the lower its ID).
Policy ID (Bytes 58 - 59) - Each policy installed on cluster member is identified by a unique
ID. This enables different cluster members to verify they are working under the same
policy. Policy ID can be seen only during cluster debug (fw ctl debug -m cluster +
conf).
Note: To handle a situation, where one member has already enforced the new policy ID,
and sends Delta Sync packets to member, who has not yet done so, we regard packets
that contain the previous policy ID as legal, for a short period after the end of the policy
negotiations.
Filler (Bytes 60 - 61) - Originally, this field was used to align the CCP header, and it was
always set to 0.
As of NG FP3, this field is also used to indicate the status of the source machine in
Service Mode only. Possible values for this field are 1 for 'Active' and 0 for 'Down'.
Starting in NG FP4, the Filler has 2 fields in Service Mode:
The first byte (nibble) contains the member status (as in NG FP3):
o If it contains 1, then in 'Sync only' mode, the member is ready to accept a Full
Sync from other cluster members. Otherwise, it can not act as a Full Sync server.
This can happen if Full Sync has failed, or if there is no policy yet.
The second byte (nibble) contains the pnote status.
o If it contains 0, then all pnotes report their status as 'OK'.
o Otherwise, it will contain 1.
Note: The 'Filler' field is relevant only in cluster running on IPSO OS, in which a member
state is updated also by the statuses of pnotes. In other 3rd party solutions, the pnote
status is passed on the network, but is being disregarded by Check Point code.
Total num. of CoreXL FW inst. (Bytes 62 - 63) - Total number of loaded CoreXL FW
instances. This field exists since R70.
CoreXL instance ID (Bytes 64 - 65) - The ID of the CoreXL FW instance, to which this
CCP packet belongs (sent from/to). This field exists since R70.
VSX VSID (Bytes 66 - 67) - ID of the Virtual System, to which this CCP packet belongs
(sent from/to). In non-VSX, always contains 0.
This field exists in R75.40VS, R76, R77 and above.
CCP Data (Bytes 68 - above) - Each CCP OpCode implies a different structure of the
packetsData section.
FWHA_MY_STATE Data
OpCode
1
Type
FWHA_MY_STATE
Description
Report source machine's state
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
Number
of
Reported
IDs
Report
Code
IFR
In
IF
As.
In
IF
Out
IF
As.
Out
IF
LPT
of
ID 0
LPT
of
ID 1
10
11
TTL
...
IP
header
checksum
HA Mode
12
13
Eth Type
(0x0800)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
Policy ID
Problem
14
IP
Ver
(4)
State
State
of
of
ID 0
ID 1
...
...
...
The byte IFR (Interface Report) is calculated using the following formula:
IFR = 70 + <Number of Reported IDs>
Number of Reported IDs (Bytes 68 - 69) - Specifies the number of machines, for which
the state is reported.
Report Code (Bytes 70 - 71) - Flags indicating whether this packet contains a machine
state report, an interface state report, or both. The flags specified below can be combined
together using bitwise OR to form the field value:
Value
0x1
0x2
Flag Name
Description
FWHAP_RP_MACHINE_STATE Report source machine's state
FWHAP_RP_IF_STATE
Queryanothermachinesstate
HA Mode (Bytes 72 - 73) - Contains the mode of the machine that sent this datagram.
Value
0
1
2
3
4
Mode Name
FWHA_UNDEF_MODE
FWHA_NOT_ACTIVE_MODE
FWHA_BALANCE_MODE
FWHA_PRIMARY_UP_MODE
FWHA_ONE_UP_MODE
Description
Report source machine's state
HA is not active
More than one machine is active
Backup mode: active machine is the one
with the lowest ID alive
Backup mode: active machine remains
active until it dies
FWHA_FW_STANDBY
FWHA_FW_READY
4
10
FWHA_FW_ACTIVE
FWHA_FW_TOTAL_DEAD
Description
Machine reports itself as dead
Machine is up and running, but is not
ready to receive packets yet
Machine is able to process packets, but
is currently set as a backup machine
Machine is ready to process packets,
but is currently waiting for other
machines to confirm their states
Machine is filtering packets
Timeout occurred waiting for this
machine to report (more than 1 sec)
In IF (Byte IFR) - Number of interfaces currently up, in the inbound direction on the source
machine.
As. In IF (Byte IFR+1) - Number of interfaces currently assumed to be up, in the inbound
direction on the source Machine. (Bytes 70+2x - 70+2x+1)
Out IF (Byte IFR+2) - Number of interfaces currently up, in the outbound direction on the
source machine.
As. Out IF (Byte IFR+3) - Number of interfaces currently assumed to be up, in the
outbound direction on the source machine.
LPT of ID x (Byte IFR+4+x) - Reports the time, in HA time units (10 HA time units ~ 1
second), elapsed since the last CCP packet was received from machine with ID x.
Note: HA time units are mostly used by Check Point RnD.
FWHA_QUERY_STATE Data
OpCode
2
Type
FWHA_QUERY_STATE
Description
Query other machine's state
These packets are used by a cluster member to ask another member for its status. This
is used when source member stopped receiving CCP packets from another member for
some time (0.2 seconds) and may want to inquire the other member to see if it is "alive".
This CCP packet does not have any CCP Data.
Note: These packets are not sent in 3rd party clusters.
FWHAP_IF_PROBE_REQ Data
OpCode
3
Type
FWHA_IF_PROBE_REQ
Description
Interface active check request
An interface probing is a mechanism, which allows a machine to verify that its interfaces
are up and are able to receive and transmit data.
These packets are used to verify the status of each interface.
This is done to detect connectivity problems of the interfaces.
Refer to Clustering Definitions and Terms section.
Note: These packets are not sent in 3rd party clusters.
0
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
Interface
Number
10
11
TTL
IP
Pro
to
(11)
IP
header
checksum
12
13
Eth Type
(0x0800)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
14
IP
Ver
(4)
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
Interface Number (Bytes 62 - 63) - FireWall-1 serial interface number of the queried
interface. Refer to the output of 'fw ctl iflist' command.
FWHAP_IF_PROBE_RPLY Data
OpCode
4
Type
FWHA_IF_PROBE_REPLY
Description
Interface active check reply
An interface probing is a mechanism, which allows a machine to verify that its interfaces
are up and are able to receive and transmit data.
This packet is a reply to FWHAP_IF_PROBE_REQ packet.
These packets are used to verify the status of each interface.
This is done to detect connectivity problems of the interfaces.
Note: The transmit state of an interface (as monitored by 'Interface Active Check' pnote)
is refreshed once a FWHAP_IF_PROBE_RPLY packet is received in acknowledge to
FWHA_IF_PROBE_REQ packet.
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
Interface
Number
10
11
TTL
IP
Pro
to
(11)
IP
header
checksum
12
13
Eth Type
(0x0800)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
14
IP
Ver
(4)
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
Interface Number (Bytes 62 - 63) - FireWall-1 serial interface number of the queried
interface. Refer to the output of 'fw ctl iflist' command.
FWHAP_IFCONF_REQ Data
OpCode
5
Type
FWHAP_IFCONF_REQ
Explanation
Interface configuration request
These packets are used in order to learn the following information about peer
cluster members:
o Interfaces
o IP addresses
o MAC addresses
These packets are sent occasionally to verify the IP addresses are still the same.
ClusterXL uses these packets in order to discover cluster misconfiguration as follows:
o whether one machine considers an interface secured, while the other does not
o whether the IP addresses reported by the sending machine belong to a different
interface on the receiving machine (which may indicate a cable connectivity
problems).
This CCP packet does not have any CCP Data.
Note: The 'FWHA_IFCONF_REQ' packet is always sent with Layer 2 Destination MAC
address of subnet Broadcast FF:FF:FF:FF:FF:FF. Refer to sk44410 (CCP packets are
sent in Broadcast although CCP mode is set to Multicast).
Note: These packets are sent in 3rd party clusters.
FWHAP_IFCONF_RPLY Data
OpCode
6
Type
FWHAP_IFCONF_RPLY
Explanation
Interface configuration reply
10
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
CoreXL
VSX
instance
VSID
ID
IP addr 1
64
80
11
TTL
IP
Pro
to
(11)
Number of
Reported
IPs
IP addr 2
IP
header
checksum
12
13
14
IP
Ver
(4)
Eth Type
(0x0800)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Total
num. of
CoreXL
FW inst.
Trusted
interface
?
Filler
Ethernet Address
IP addr 3
15
IP
Hdr
Len
...
Number of Reported IPs (Bytes 68 - 69) - Number of IP addresses associated with this
interface.
Ethernet Address (Bytes 70 - 75) - The real Ethernet address of the interface (as opposed
tothephonyaddress,seeExternal Header).
Trusted interface? (Bytes 76 - 77) - Boolean value: 1 if this interface is trusted (secured),
0 otherwise.
IP addr X (Bytes 78+4x - 78+4x+3) - IP address number X associated with the reporting
interface (ClusterXL uses only the first configured IP address).
FWHAP_POLICY_CHANGE Data
OpCode
9
Type
FWHAP_POLICY_CHANGE
Explanation
Policy ID change request/notification
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
10
11
12
TTL
Policy
Update
State
IP
Pro
to
(11)
IP
header
checksum
13
Eth Type
(0x0800)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
14
IP
Ver
(4)
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
New Policy ID
Policy Update State (Bytes 68 - 71) - The members that originated this packet, notifies the
other members whether or not it needs to change its own configuration, due to the new
policy. The message is also used to notify all cluster members that the originator is ready to
apply the changes.
Possible values are:
Value
Name
Description
FWHA_POLICY_UPD_INIT
1
This member does not need
to update its configuration
FWHA_POLICY_UPD_NEED
2
This member needs to update
its configuration to conform
with the new policy
FWHA_POLICY_UPD_READY
3
This member is ready to apply
the configuration changes
FWHA_POLICY_UPD_NEW
4
This member has just joined the cluster,
and has already applied the new policy
New Policy ID (Bytes 72 - 75) - Specifies the ID of the new policy, which the source
member is trying to enforce. All cluster members should agree on this value before the
policy can be updated.
This field contains last two bytes of MD4 hash of Policy ID (Policy ID is generated by the
Security Management Server based on the contents of compiled policy files <PolicyName>.ft, <PolicyName>.fc, <PolicyName>.set).
FWHAP_SYNC Data
OpCode
10
Type
FWHAP_SYNC
Explanation
New Delta Sync packets
This packet type defines a sub-protocol of CCP, used to maintain the State
Synchronization between cluster members. This is done by sending updates about the
FireWall kernel tables wrapped in the CCP packet data.
Refer to State Synchronization in ClusterXL section.
16
Total
Length
IP
datagram
ID
IP Flags +
Fragment
Offset
32
Layer 3
Destin.
IP addr
(cont.)
Layer 4
Source
Port
Layer 4
Destin.
Port
Total
Length
UDP
checksum
48
CCP
OpCode
Source IF
Number
Random
ID
Source
Machine
ID
Destin.
Machine
ID
64
CoreXL
instance
ID
VSX
VSID
10
11
Sequence
Number
Sync OP
Flags
TTL
IP
Pro
to
(11)
IP
header
checksum
12
13
Eth Type
(0x0800)
14
IP
Ver
(4)
Layer 3
Source
IP address
Magic
Number
(0x1A90)
Policy ID
15
IP
Hdr
Len
Layer 3
Destin.
IP addr
CCP
Version
Cluster
Number
Filler
Total
num. of
CoreXL
FW inst.
OpCode name
BC_MSG
BC_RETRANS_REQ
BC_RETRANS_REQ
BC_RETRANS_REJECT
Explanation
Holds FireWall table data
(may be fragmented)
Request to retransmit
missing data fragments
Request an ACK message
from peer members
Rejects a retransmission request
Flags (Byte 71, Upper Nibble) - A bit-wise combination of the following values:
Flag Value
0x80
0x10
0x20
Flag Name
BC_ACK_FLAG
Explanation
Indicates an acknowledge is
required for flushed data
BC_FRAGM_FLAG
Indicates this packet is a single
fragment of a larger message
BC_LAST_FRAGM_FLAG Indicates this is the last
fragment in the message
SmartView Tracker
The best and simplest way to start cluster troubleshooting, is to check the cluster logs
(pre-requisite for such logs is to set 'Track changes in the status of cluster
members' to 'Log' in SmartDashboard - cluster object - ClusterXL - Tracking).
Refer to Configuring cluster object in SmartDashboard section.
In SmartView Tracker:
1. Open the FireWall log that contains the data from the time of the cluster problem.
2. Go to the 'Date' column
A. Right-click on the 'Date' column header
B. Click on 'Edit Filter...'
C. Select the relevant date
D. Click on OK button
3. Go to the 'Time' column
A. Right-click on the 'Time' column header
B. Click on 'Edit Filter...'
C. Select the relevant time
D. Click on OK button
4. Go to the 'Information' column
A. Right-click on the 'Information' column header
B. Click on 'Edit Filter...'
C. Select 'Specific'
D. In 'Field' - select 'Contains'
E. In 'Text' - type cluster_info
F. Click on OK button
5. Analyze the cluster logs
6. Go to 'File' menu - click on 'Export...'
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring and Troubleshooting Gateway Clusters' Monitoring Cluster Status Using SmartConsole Clients - SmartView Tracker.
Refer to SmartView Tracker Administration Guide (R75.40, R75.40VS, R76, R77).
SmartView Monitor
SmartView Monitor displays a snapshot of all ClusterXL cluster members, enabling real
time monitoring and alerting. For each cluster member, state change and critical device
problem notifications are displayed.
The SmartView Monitor GUI client communicates with the cluster member via the
Check Point Application Monitoring (AMON) Infrastructure.
The AMON client (SmartView Monitor GUI) sends a request for some specific OID
(SNMP Get) to the AMON server on the cluster member. The AMON server queries the
Check Point kernel (in the same way as the "cphaprob" commands) in order to retrieve the
requested information.
The information is then formatted into MIB (SNMP Response) and sent back to the
AMON client for display.
It is also possible to stop and start ClusterXL on the member:
1. On the left, go to Gateways Status view.
2. Select the relevant cluster member of a given cluster.
3. Right-click on the selected member.
4. Go to Cluster Member menu
5. Select the relevant operation - 'Stop Member' or 'Start Member'.
Notes:
SmartView Monitor uses a separate Check Point infrastructure to control ClusterXL
(special internal command is sent from SmartView Monitor to Security Management
Server that manages this cluster, which sends another internal command to perform
the requested operation on ClusterXL).
Complicated debug is required in order to see this communication (FWM and CPD
daemons on Management Server, and CPD daemon on cluster member).
Cluster administrator should use command line on each cluster member to control
ClusterXL (cpstart/cpstop ; cphastart/cphastop).
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring and Troubleshooting Gateway Clusters' Monitoring Cluster Status Using SmartConsole Clients - SmartView Monitor.
Refer to SmartView Monitor Administration Guide (R70, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring Gateway Status' - Configuring Gateway Views
- Start/Stop Cluster Member.
Refer to these solutions:
sk67560 (How to export History Report from SmartView Monitor)
sk65923 (How to configure the cluster to send SNMP Trap upon fail-over)
sk31961 (When viewing a ClusterXL Member via SmartView Monitor, VLAN
Interfaces not visible)
sk88360 ('Error: 'ClusterXL' is not responding. Verify that 'ClusterXL' is installed on
the gateway' message in SmartView Monitor)
sk53701 (ClusterXL works correctly in HA mode, but in LS mode a member is shown
as 'Disconnected' in SmartView Monitor, and policy installation intermittently fails on
that member with SIC error no. 148)
Clock synchronization
Refer to Clock Synchronization section.
If clocks on cluster members are out of sync, then the SIC communication between the
members and the VPN will fail.
Refer to these solutions:
sk92602 (How to troubleshoot NTP on Gaia OS)
sk90365 (Enabling NTP causes OSPF adjacencies to disconnect)
sk92984 (NTP client on Gaia fails to synchronize with Windows 2003)
sk40322 (Is it recommended to use NTP with VRRP or IP Clustering?)
sk39783 (NTP process fails after there is a VRRP state change)
sk67740 (How to stop 'ntpdate[PID]: adjust time server' logs in /var/log/messages)
sk32647 (Entries in /var/log/messages files have different timestamps when using
NTP Server - some entries are shown with local time, and some entries are shown
with correct UTC/GMT time)
CCP mode
Refer to Cluster Control Protocol (CCP) section and to ClusterXL Requirements for
Hardware and Software section.
SecureXL
Refer to Requirements for software section and to SecureXL section.
Refer to these solutions:
sk32578 (SecureXL Mechanism)
sk98722 (ATRG: SecureXL)
sk71200 (SecureXL NAT Templates)
sk67861 (Accelerated Drop Rules Feature in R75.40 and above)
sk66402 (SecureXL Drop Templates are not supported in versions lower than R76)
sk79620 (SecureXL 'sim affinity -s' settings do not survive reboot)
sk61962 (SMP IRQ Affinity on Check Point Security Gateway)
sk62441 (Problems with VPN and NAT when SecureXL is enabled)
sk93308 (Security Gateway randomly reboots when IPS or SecureXL is enabled)
sk82280 (Security Gateway with Route Based VPN configuration crashes when
SecureXL is enabled)
sk90301 (SecureXL does not start on the Backup member of VRRP cluster after
reboot)
sk79880 (Traffic is dropped 'by cphwd_offload_conn Reason: VPN and/or NAT
traffic between accelerated and non-accelerated interfaces or between nonaccelerated interfaces is not allowed')
sk93348 (On R75.40VS in VSX mode, traffic does not pass from Virtual Router to
Virtual System when SecureXL is enabled)
CoreXL
Refer to Requirements for software section and to CoreXL section.
Refer to these solutions:
sk61701 (CoreXL Known Limitations)
sk98737 (ATRG: CoreXL)
sk42096 (Cluster member is stuck in 'Ready' state)
sk44488 (CoreXL is enabled, however not all available CPU cores are used)
sk36750 ("License violation: The current machine has M CPU cores and the
installed license is valid for up to N CPU cores" error when installing license)
sk61284 (CoreXL Affinity settings of daemons do not survive reboot)
sk64301 (CoreXL interface affinity is not enforced, even if SecureXL is disabled)
sk76800 (IP Pool NAT support in CoreXL)
sk53060 (URI Resource and CoreXL)
sk86401 (Connections with Hide NAT are dropped during policy installation due to
NAT port allocation failure when CoreXL is enabled)
sk65463 ('Peak' number of connections - discrepancy between the output of 'fw tab t connections -s' command and the output of 'fw ctl pstat' command when CoreXL is
enabled)
sk83300 (Packets are dropped on Trusted Interface MPLS when CoreXL is enabled)
sk43443 (How to debug CoreXL)
sk80940 (Multi-Queue hotfix for Security Gateway)
VPN
Refer to VPN section.
Refer to these solutions:
sk92332 (Customizing the VPN configuration for Check Point Security Gateway 'vpn_table.def' file)
sk108600 (VPN Site-to-Site with 3rd party)
sk62441 (Problems with VPN and NAT when SecureXL is enabled)
sk93204 (Troubleshooting "Clear text packet should be encrypted" error in
ClusterXL)
sk61902 (How to start VPND daemon under debug)
skI4326 (Enabling IKE and VPN debugging)
sk33327 (How to generate a valid ike debug, vpn debug and fw monitor)
NAT
Refer to NAT section.
The following command allows to work with the NAT table (fwx_alloc, ID 8187):
[Expert@Member]# fw tab -t fwx_alloc [flags]
For more information on the 'fw tab' command, refer to Command Line Interface
Reference Guide - Chapter 'Security Management Server and Firewall Commands' - fw - fw
tab.
VLAN
Refer to VLAN section and to CCP and VLAN interfaces section.
Refer to these solutions:
sk92826 (ClusterXL VLAN monitoring)
sk61323 (Monitoring of VLAN interfaces in ClusterXL)
sk92784 (Configuring VLAN Monitoring on ClusterXL for specific VLAN interface)
sk35462 (Abnormal behavior of cluster members during failover when 'Monitor all
VLAN' feature is enabled)
sk95218 (Disconnected monitored VLAN can cause ClusterXL upgrade failure)
Pay attention to the link status on physical slave interfaces and to the bond
parameters, compare these to the configuration on the switch(es).
ISP Redundancy
Refer to section ISP Redundancy section.
Refer to these solutions:
sk42636 (Controlling connections configured with ISP Redundancy in Load Sharing
mode)
sk66521 (ISP Redundancy in ClusterXL when interfaces of cluster members and
cluster VIP are defined on different subnets per sk32073)
sk25152 (Static (Hide) NAT fails for outgoing connections through gateway with ISP
Redundancy in Load Sharing mode)
sk60590 (ISP Redundancy is missing from the gateway or cluster object)
sk61692 (Troubleshooting ISP Redundancy)
sk65341 (ISP Redundancy probing is not working in ClusterXL)
sk83900 (ISP Redundancy failover is not working in Gaia OS)
sk31530 (ISP Redundancy Link Interface cannot be created)
sk40958 (How to verify the status of ISP Redundancy links on command line)
Dynamic Routing
Refer to Dynamic Routing section.
Refer to these solutions:
sk62570 (How to troubleshoot failovers in ClusterXL - Advanced Guide)
sk31243 (ClusterXL member is "Down" due to Critical device "FIB")
sk43281 (FIBMGR packets dropped by fw_cluster_ttl_anti_spoofing Reason: ttl
check drop)
sk43243 (How to debug FIBMGRD daemon)
sk41393 (How to Troubleshoot OSPF Problems)
sk40164 (What Information do I collect for OSPF issues?)
sk33201 (Regarding ClusterXL and OSPF)
sk36231 (OSPF equal multipath support in SecurePlatform Pro)
sk82600 (Graceful restart for OSPF and BGP in Gaia does not work)
sk32568 (How to increase OSPF adjacency membership on SecurePlatform Pro)
sk84520 (How to debug OSPF and RouteD daemon on Gaia)
sk60860 (How to debug OSPF and GateD daemon on SecurePlatform Pro)
sk60861 (How to debug BGP and GateD daemon on SecurePlatform Pro)
sk92598 (How to collect traces and debugs information for PIM and Multicast on
Gaia)
sk85280 (Advanced Routing (OSPF, BGP, etc) configuration is not saved by 'save
configuration <file name>' command in Gaia CLISH shell)
SNMP
Refer to SNMP section.
Refer to these solutions:
sk59023 (Disable verbose SNMP logging - "snmpd[PID]: Received SNMP packet(s)
from UDP:")
sk66648 (SecurePlatform does not send SNMP Traps)
sk66581 (SecurePlatform sends SNMP Traps only to one sink server, although
several sink servers were configured; SNMP Traps are always sent with with 'public'
community name)
sk93644 (How to bind SNMPD on SecurePlatform OS to specific interface)
sk80820 (LinkUp/LinkDown (linkUpLinkDown) Trap is not working on Gaia)
sk72760 ('snmpwalk' always reports speed of Bond and Bridge interfaces as 10
Mbps)
sk77260 ('snmpwalk' always reports speed of 10 Gb interfaces as 10 Mbps)
sk90362 (SNMPD daemon fails to start on Gaia OS)
sk89300 (SNMPD daemon crashes after interface IP address change on Gaia OS)
sk61425 (Machine with Check Point software responds with 'No Such Object
available on this agent at this OID' to Check Point SNMP OID, but responds
correctly to generic SNMP OID)
sk69625 (Gaia does not provide SNMP RAID Trap)
sk66585 (/var/log/messages shows - snmpd[PID]: /etc/snmp/snmpd.conf: line N:
Warning: Unknown token)
sk92937 (SNMPv3 with USM 'authentication' configuration does not survive reboot
on Gaia OS)
sk93204 (Troubleshooting "Clear text packet should be encrypted" error in
ClusterXL)
sk38936 (How to debug dropped SNMP V1 & V2 packets)
sk56783 (How to debug SNMPD daemon on SecurePlatform and Gaia)
sk66586 (How to debug SNMPMONITOR on SecurePlatform and Gaia)
sk66383 (How to debug CPSNMPAGENTX on SecurePlatform and Gaia)
sk66384 (How to debug CPSNMPD on SecurePlatform and Gaia)
Policy Installation
Policy installation on cluster triggers re-configuration of each cluster member. Part of
this re-configuration is negotiation of the state of each member.
The policy installation process is transparent for the traffic. Policy installation, in certain
cases, may cause a cluster member to initiate a failover.
Cluster administrator can control the installation of policy on cluster with the help of
several kernel parameters (each parameter is described below):
fwha_freeze_state_machine_timeout
fwha_policy_update_timeout_factor
fwha_conf_immediate
fwha_cul_policy_freeze_timeout_millisec
fwha_cul_policy_freeze_event_timeout_millisec
The following messages in /var/log/messages file are normal during the boot of
the machine:
;FW-1: fwha_state_freeze: FREEZING state machine at FAILURE
(time=HTU,caller=fwha_set_conf);
;FW-1: fwha_state_freeze: ENABLING state machine at FAILURE
(time=HTU,caller=policy change - finished changes (fwha_start));
Full Sync
Refer to State Synchronization in ClusterXL section.
Refer to these solutions:
sk37029 (Forcing Full Synchronization in ClusterXL)
sk37030 (Debugging Full Synchronization in ClusterXL)
sk65103 (After reboot, state of cluster member is 'Down', and state of
'Synchronization' device is 'problem')
sk101695 (Cluster member is Down after reboot / policy installation / running
'cpstart')
Cluster member may fail to start correctly while the cluster is under severe load.
If a reboot (or 'cpstop' followed by 'cpstart') is performed on a cluster member while
the cluster is under severe load, the member may fail to start correctly.
The starting member will attempt to perform a Full Sync with the existing active
member(s) and may in the process use up all its resources and available memory.
This can lead to unexpected behaviour.
Procedure:
To overcome this problem, define the maximum amount of memory that the member
may use when starting up for synchronizing its connections with the active member. By
default, this amount is not limited. Estimate the amount of memory required as follows:
Number of open
100
Connections
1000
1.1 MB
6.9 MB
10 000
11 MB
69 MB
329 MB
20 000
21 MB
138 MB
657 MB
1305 MB
50 000
53 MB
345 MB
1642 MB
3264 MB
Note: These figures were derived for cluster members using the Windows platform,
with Pentium 4 processors running at 2.4 GHz.
Example:
If the cluster holds 10 000 connections, and the connection rate is 1000 connections per
second, then cluster administrator will need 69 MB for Full Sync.
Instructions:
Define the maximal limit for memory allocation to Full Sync by setting the value of the
global kernel parameter fw_sync_max_saved_buf_mem to the required number of
megabytes. Refer to sk26202 (Changing the kernel global parameters for Check Point
Security Gateway).
Impact:
If memory allocation reaches this limit during Full Sync, then further allocations are
forbidden, and relevant messages are printed into /var/log/messages file:
FW-1: fwlddist_save: WARNING: this member will not be fully synchronized !
FW-1: fwlddist_save: current delta sync memory during full sync has reached the
maximim of N MB
FW-1: fwlddist_save: it is possible to set a different limit by changing
fw_sync_max_saved_buf_mem value
Delta Sync
Refer to State Synchronization in ClusterXL section.
Refer to these solutions:
sk92909 (How to debug ClusterXL to understand why a connection is not
synchronized)
sk41827 (Synchronization network in the cluster is flooded with Sync Retransmit
packets)
Processing of Delta Sync packets during Full Sync
While performing Full Sync, the Delta Sync updates are not processed and saved.
Cluster member may fail to complete Full Sync while the cluster is under severe load.
It is possible that the rate of Delta Sync updates during the Full Sync process exceeds
the rate of the Full Sync packets. The FWD daemon on the Full Sync client member will not
be able to handle this number of Delta Sync packets because of the starvation of the user
space daemon, and Full Sync will never end.
Meanwhile, the Delta Sync packets are stored and occupy an ever-increasing amount of
memory in the kernel until memory allocation fails.
Procedure:
To overcome this problem, define the maximal limit for memory allocated to save the
Delta Sync packets during Full Sync. By default, this amount is not limited.
Instructions:
Define the maximal limit for memory allocated to save the Delta Sync packets during
Full Sync by setting the value of the global kernel parameter
fw_sync_max_saved_buf_mem to the required per cent of the memory allocated by
Check Point kernel (controlled by be kernel parameter fw_salloc_total_alloc) from
the overall allowed memory (controlled by be kernel parameter
fw_salloc_total_alloc_limit).
Impact:
After a certain amount of Delta Sync packets is received, no more Delta Sync packets
are accepted, so additional sync updates received during Full Sync are discarded, and
relevant messages are printed into /var/log/messages file:
FW-1: fwlddist_save: WARNING: this member will not be fully synchronized !
FW-1: fwlddist_save: reached the memory threshold.
FW-1: fwlddist_save: Current = X MB, allowed = Y MB, threshold = N%
A consequence of this is that connections that were not transferred during full sync will
not survive failover.
After Full Sync is complete, the Delta Sync packets stored during the Full Sync phase
are applied by order of arrival.
In order to deal with such potential bottleneck, ClusterXL monitors the Sync Sending
Queue - if the number of Delta Sync packets in this queue reaches the threshold, then:
1. The 'FW-1: State synchronization is in risk. Please examine
your synchronization network to avoid further problems' warning is
printed into /var/log/messages file
2. Member starts blocking new incoming connections
This threshold is controlled via kernel parameter fw_sync_buffer_threshold,
whose value is the maximal percentage of the buffer that may be filled before new
connections are blocked:
By default, this value it is set to 80, with a buffer size of 512 sync words.
By default, if more than 410 consecutive packets are sent without getting an
Acknowledgement on any one of them, new connections are blocked.
Refer to these solutions:
sk43896 (Blocking New Connections Under Load in ClusterXL)
sk82080 (/var/log/messages are filled with 'kernel: FW-1:
fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last
410 packets')
sk23695 ('FW-1: State synchronization is in risk. Please examine your
synchronization network to avoid further problems!' appears in /var/log/messages
file)
Traffic
Refer to these solutions:
sk43896 (Blocking New Connections Under Load in ClusterXL)
sk80520 (ClusterXL drops traffic with 'dropped by fwha_forw_run Reason: Failed to
send to another cluster member')
sk106425 (Connections through cluster to physical IP address of ClusterXL Standby
member / VRRP Backup member are dropped by Anti-Spoofing)
sk34668 (How to modify the assigned load between the members of ClusterXL in
Load Sharing Unicast mode)
sk93204 (Troubleshooting "Clear text packet should be encrypted" error in
ClusterXL)
sk31832 (How to prevent ClusterXL / VRRP / IPSO IP Clustering from hiding its own
traffic behind Virtual IP address)
sk34180 (Outgoing connections from cluster members are sent with cluster Virtual
IP address instead of member's Physical IP address)
sk42384 (Outgoing connections from cluster members are sent with member's
Physical IP address instead of cluster Virtual IP address)
sk37411 (Forwarding mechanism does not work properly on a machine with more
than 60 interfaces in a Nokia IP cluster)
sk31821 (Traffic that is sent to Secondary IP addresses / Alias IP addresses that
were defined on interfaces of ClusterXL members is not processed)
sk44084 (Kernel debug on ClusterXL Pivot member shows - FW-1:
fwha_pivot_forward_packet: can not forward since fwha_ether_addrs[dst=X][ifn=Y]
is NULL)
Flapping
Refer to ClusterXL definitions and terms section and to Cluster Control Protocol (CCP)
section.
If CCP packets are not received/sent within the expected timeouts, then eventually
either the problematic interface(s), or the whole member will be declared as failed. This in
turn (by design) will lead to the change in state of either the problematic interface(s), or the
whole member to 'Down'.
Depending on the configuration and the nature of the issue, the state might randomly
change between 'Up'/'Active' and 'Down'. Such random change in state is called "flapping"
(of either an interface, or a member).
Flapping, in its turn might cause an interruption in the production traffic that passes
through the cluster.
Cluster Under Load (CUL) mechanism (R75.40VS, R76, R77 and above) involves a
number of kernel parameters that allow cluster members to automatically monitor the CPU
utilization and prevent flapping according to the values of these kernel parameters - as
described in sk92723 (Cluster flapping prevention):
fwha_cul_mechanism_enable
fwha_cul_member_cpu_load_limit
fwha_cul_member_long_timeout
fwha_cul_cluster_short_timeout
fwha_cul_cluster_log_delay_millisec
fwha_cul_policy_freeze_timeout_millisec
fwha_cul_policy_freeze_event_timeout_millisec
Refer to these solutions:
sk43984 (Interface flapping when cluster interfaces are connected through several
switches)
sk93454 (Increasing ClusterXL dead timeout)
sk97827 (How to change ClusterXL Interface Monitoring Timeouts)
sk62570 (How to troubleshoot failovers in ClusterXL - Advanced Guide)
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring and Troubleshooting Gateway Clusters' Monitoring Synchronization (fw ctl pstat).
Refer to this solution:
sk34476 (ClusterXL Sync Statistics - output of 'fw ctl pstat' command).
Example:
Sync:
Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 466729198, retransmitted : 1305, retrans reqs : 89, acks : 809
Sync packets received:
total : 77283541, were queued : 6715, dropped by net : 6079
retrans reqs : 37462, received 175 acks
retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
Delta Sync memory usage: currently using XX KB mem
Callback statistics: handled 138 cb, average delay : 2, max delay : 34
Number of Pending packets currently held: 1
Packets released due to timeout: 18
Explanations:
Output section
Sync: off
Sync:
Live connections update: on
Sync:
Version: old
Sync:
Version: new
Sync:
Version: new
Status: Able to Send/Receive sync
packets
Sync:
Version: new
Status:
Able to send sync packets
Unable to receive sync packets
Sync:
Version: new
Status:
Unable to send sync packets
Unable to receive sync packets
Sync:
Version: new
Status:
Able to send sync packets
Saving incoming sync packets
Sync:
Version: new
Status:
Unable to send sync packets
Saving incoming sync packets
Sync:
Version: new
Status:
Able to send sync packets
Able to receive sync packets
Sync:
Version: new
Status:
Unable to send sync packets
Able to receive sync packets
Sync packets sent:
total : 466729198, retransmitted :
1305, retrans reqs : 89, acks : 809
Explanation
Delta Sync is disabled: either Full Sync
failed, or Delta Sync was disabled by cluster
administrator.
'Active Mode' tab is opened in SmartView
Tracker. Refer to sk30908.
Check Point FW-1 v4.1 and lower.
Check Point FW-1 NG and above.
Delta Sync works correctly.
: 6715,
175
0
sync
TOTAL
'cphaprob' command
Refer to ClusterXL definitions and terms section.
Description:
Use the 'cphaprob' command to verify that the cluster and the cluster members are
working properly, and to define critical devices.
Syntax:
[Expert@Member]# cphaprob [flags]
Note: The commands below are listed in the order to their importance / relevance.
cphaprob state
Description:
Prints the summary with the following information:
o Cluster Mode
o Member ID of each known member
o Assigned traffic load for each known member
o State of each known member
Syntax:
[Expert@Member]# cphaprob state
Example:
[Expert@FW2-Member:0]# cphaprob state
Cluster Mode:
Number
Unique Address
Assigned Load
State
1
2 (local)
10.10.10.31
10.10.10.32
0%
100%
Standby
Active
[Expert@FW2-Member:0]#
Commands:
cphaprob list
In R77.30 and above
When there are no issues on the cluster
member:
Name:
Name:
Name:
Name:
Name:
Name:
Synchronization
Filter
fwd
cphad
cvpnd
routed
cphaprob -l list
In R77.30 and above
Prints the list of all the "Built-in Devices"
and the "Registered Devices" - exactly as
"cphaprob -ia list" does in R77.20 and
lower
Device Name: Problem
Notification
Device Name: Interface Active
Check
Device Name: HA Initialization
Device Name: Load Balancing
Configuration
Device Name: Recovery Delay
Device
Device
Device
Device
Device
Device
Name:
Name:
Name:
Name:
Name:
Name:
Synchronization
Filter
routed
cphad
fwd
cvpnd
cphaprob -i list
In R77.30 and above
When there are no issues on the cluster
member:
There are no pnotes in problem
state
* Issue 'cphaprob -l list' to
show full list of pnotes
When a critical device reports a problem prints only the critical device that reports
its state as "problem"
Example:
Registered Devices:
Device Name: routed
Registration number: 2
Timeout: none
Current state: problem
Time since last report: 2.8 sec
Name:
Name:
Name:
Name:
Name:
Name:
Synchronization
Filter
fwd
cphad
cvpnd
routed
Name:
Name:
Name:
Name:
Name:
Name:
Synchronization
Filter
routed
cphad
fwd
cvpnd
cphaprob -e list
In R77.30 and above
When there are no issues on the cluster
member:
Name:
Name:
Name:
Name:
Name:
Name:
Synchronization
Filter
fwd
cphad
cvpnd
routed
Registered Devices:
Device Name: routed
Registration number: 2
Timeout: none
Current state: problem
Time since last report: 2.8 sec
cphaprob [-a] if
Description:
Prints the summary of cluster interfaces with the following information:
o Number of required cluster interfaces - including the Sync interfaces (the
maximal number of good cluster interfaces seen since the last reboot)
o Number of required secured (trusted) interfaces (the maximal number of good
sync interfaces seen since the last reboot)
o Names of monitored cluster interfaces (refer to CCP and VLAN interfaces)
o State of cluster interfaces (based on arrival/transmission of CCP packets)
o CCP mode on cluster interfaces
o Number of cluster Virtual IP addresses
o Virtual IP addresses
o Virtual MAC addresses (if VMAC mode is enabled per sk50840)
Syntax:
[Expert@Member]# cphaprob [-a] if
Flag
-a
Description
Prints Virtual IP addresses and
their corresponding interfaces.
Example:
[Expert@FW2-Member:0]# cphaprob -a if
Required interfaces: 3
Required secured interfaces: 1
eth0
eth1
eth2
UP
UP
UP
192.168.204.33
20.20.20.33
[Expert@FW2-Member:0]#
Flag
-reset
Description
Resets the statistics in kernel that was
collected since boot, or last reset.
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring and Troubleshooting Gateway Clusters'
- Troubleshooting Synchronization.
Refer to these solutions:
o sk34475 (ClusterXL Sync Statistics - output of 'cphaprob syncstat' command)
o sk82080 (/var/log/messages are filled with 'kernel: FW-1:
fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last
410 packets')
Example:
Sync Statistics (IDs of F&A Peers - 1 2 3 4 5 6 7 ):
Other Member Updates:
Sent retransmission requests...................
Avg missing updates per request................
Old or too-new arriving updates................
Unsynced missing updates.......................
Lost sync connection (num of events)...........
Timed out sync connection .....................
165
1
5661
0
4354
1
Local Updates:
Total generated updates .......................
Recv Retransmission requests...................
Recv Duplicate Retrans request.................
9180670
1073
2564
Blocking Events................................
Blocked packets................................
Max length of sending queue....................
Avg length of sending queue....................
Hold Pkts events...............................
Unhold Pkt events..............................
Not held due to no members.....................
Max held duration (sync ticks).................
0
0
4598
0
1
1
16
0
11
Timers:
Sync tick (ms).................................
CPHA tick (ms).................................
100
100
Queues:
Sending queue size.............................
Receiving queue size...........................
512
256
Output section
IDs of
F&A
Peers
Other
Member
Updates:
Sent
retransmission
requests
Avg missing
updates per
request
Old or too-
new
arriving
updates
Unsynced
missing
updates
Lost sync
connection
(num of
events)
Explanation
The F&A (Flush and Ack) peers are
the cluster members that this
member recognizes as being part
of the cluster.
The IDs correspond to IDs and
IP addresses shown by the
'cphaprob state' command.
The statistics in this section relate
to Delta Sync updates generated
by other cluster members, or to
Delta Sync updates that were not
received from the other members.
Updates inform about changes in
the connections handled by the
cluster member, and are sent from
and to members. Updates are
identified by sequence numbers.
The number of
retransmission requests, which
were sent by this member.
Retransmission requests are sent
when certain packets (with
a specified sequence number)
are missing, while the sending
member already received updates
with advanced sequences.
Each retransmission request can
contain up to 32 missing
consecutive sequences.
The value of this field is the
average number of requested
sequences per
retransmission request.
The number of arriving Delta Sync
updates where the sequence
number is too low, which implies it
belongs to an old transmission, or
too high, to the extent that it cannot
belong to a new transmission.
The number of missing Delta Sync
updates, for which the receiving
member stopped waiting. It stops
waiting when the difference in
sequence numbers between the
newly arriving updates and the
missing updates is larger than the
length of the "Receiving
Queue".
The number of events, in
which synchronization with another
member was lost and regained due
to either Security Policy installation
Limits
Timed out
sync
connection
Local
Updates:
Total
generated
updates
Recv
Retransmission
requests
Recv
Duplicate
Retrans
request
Blocking Events
Blocked
packets
Max length
of sending
queue
Avg length of
sending
queue
Hold
Pkts
events
Unhold
Pkt
events
Max held
duration
(sync
ticks)
Avg held
duration
(sync
ticks)
Timers:
Sync tick
(ms)
CPHA tick
(ms)
Queues:
Sending
queue size
Should be 0 - positive
value indicates connectivity
problem between the members.
Flags
-d device
-t timeout_in_sec
-s <ok|init|problem>
-p
-g
Description
Specifies the name of the Pnote (refer to
ClusterXL definitions and terms section).
Specifies how frequently the periodic reports are
expected.
If no periodic reports should be expected, then
enter 0 (zero).
Specifies the initial state with which
the Pnote will be registered.
(Optional) Specifies that this Pnote
must be registered permanently (this
configuration will be saved in the
$FWDIR/conf/cphaprob.conf file).
(Optional) Specifies that this Pnote
must be registered globally (applies to
R75.40VS and above in VSX mode).
Flags
-d device
-p
-g
Description
Specifies the name of the Pnote (refer to
ClusterXL definitions and terms section).
(Optional) Specifies that this Pnote
must be unregistered permanently (this
configuration will be removed from the
$FWDIR/conf/cphaprob.conf file).
(Optional) Specifies that this Pnote
must be unregistered globally (applies to
R75.40VS and above in VSX mode).
Flags
-d device
-s <ok|init|problem>
-g
Description
Specifies the name of the Pnote (refer to
ClusterXL definitions and terms section).
Specifies the state, which
will be reported for the Pnote .
(Optional) Specifies that this Pnote
state must be reported globally
(applies to R75.40VS and above in VSX mode).
Flags
-f file
-g
Description
Specifies the file that contains the list of Pnotes
and their parameters.
For file syntax, refer to the
$FWDIR/conf/cphaprob.conf file.
(Optional) Specifies that this Pnote
must be registered globally (applies to
R75.40VS and above in VSX mode).
Flags
-a
-g
Description
Specifies that all Pnotes must be unregistered.
(Optional) Specifies that all Pnotes must be
unregistered globally (applies to
R75.40VS and above in VSX mode).
cphaprob igmp
Description:
Prints IGMP membership status.
Syntax:
[Expert@Member]# cphaprob igmp
Example:
[Expert@FW2-Member:0]# cphaprob igmp
IGMP Membership: Enabled
Supported Version: 2
Report Interval [sec]: 60
IGMP queries are replied only by Operating System
Interface
Host Group
Multicast Address Last ver. Last Query[sec]
--------------------------------------------------------------------------eth0
224.168.204.33 01:00:5e:28:cc:21
N/A
N/A
eth1
224.10.10.250
01:00:5e:0a:0a:fa
N/A
N/A
eth2
224.20.20.33
01:00:5e:14:14:21
N/A
N/A
[Expert@FW2-Member:0]#
Flag
-reset
Description
Resets the statistics in kernel that was
collected since boot, or last reset.
Example:
[Expert@FW2-Member:0]# cphaprob ldstat
Operand
Calls
Bytes
Average Ratio %
------------------------------------------------------ERROR
0 0
0
0
SET
5287
1359896 257
27
RENAME
0 0
0
0
REFRESH
41105
2137460 52
42
DELETE
5276
189792 35
3
SLINK
10496
671744 64
13
UNLINK
0 0
0
0
MODIFYFIELDS
8032
610432 76
12
RECORD DATA CONN
0 0
0
0
COMPLETE DATA CONN
0 52026
0
1
Total bytes sent: 4893244 (4 MB) in 52026 packets. Average 94
[Expert@FW2-Member:0]#
cphaprob ucfstat
Description:
Prints the Full Connectivity Upgrade (FCU) statistics on the member that is being
upgraded in Full Connectivity mode.
Note: FCU is not supported since R75 (refer to sk107042).
Syntax:
[Expert@Member]# cphaprob fcustat
Example:
[Expert@FW2-Member:0]# cphaprob fcustat
During FCU....................... yes
Number of connection modules..... 23
Connection module map (remote -->local)
0 --> 0 (Accounting)
1 --> 1 (Authentication)
2 --> 3 (NAT)
3 --> 4 (SeqVerifier)
4 --> 5 (SynDefender)
5 --> 6 (Tcpstreaming)
6 --> 7 (VPN)
Table id map (remote->local)..... (none or a specific list,
depending on configuration)
Table handlers ..................
78 --> 0xF98EFFD0 (sip_state)
8158 --> 0xF9872070 (connections)
Global handlers ................. none
[Expert@FW2-Member:0]#
Output section
During FCU
Table id map
Table handlers
Global handlers
Explanation
This should be "yes" only after
running the 'fw fcu' command and
before running 'cphastop' on the
final old member.
In all other cases it should be "no".
Safe to ignore.
The output reveals a translation
map from the old member to the
new member.
For additional information, refer to
'Full Connectivity Upgrade
Limitations' in the Installation and
Upgrade Guide.
This shows the mapping between
the gateway's kernel table indices
on the old member and on the NM.
Having a translation is not
mandatory.
This should include a sip_state
and connection table handlers.
Security Gateway configuration (in
VSX, applies to R75.40VS and
above ), a VPN handler should also
be included.
Reserved for future use.
cphaprob tablestat
Description:
Prints the Cluster tables.
Syntax:
[Expert@Member]# cphaprob tablestat
Example:
[Expert@FW2-Member:0]# cphaprob tablestat
----
----
Member
Interface
IP-Address
-----------------------------------------0
1
192.168.204.31
0
2
10.10.10.31
0
3
20.20.20.31
(Local)
1
1
1
1
2
3
192.168.204.32
10.10.10.32
20.20.20.32
------------------------------------------
cphastart
o
o
o
o
cphastop
o Running cphastop on a cluster member stops the cluster member from passing
traffic.
o State Synchronization also stops.
o It is still possible to open connections directly to the cluster member.
o In High Availability Legacy mode, running cphastop may cause the entire
cluster to stop functioning.
'cphaconf' command
Important Note: This command should NOT normally be used, since configuration is
controlled by the Management Server. Use it only if specifically instructed to by Check
Point Support. Exception: when working with Bond interfaces.
Refer to ClusterXL definitions and terms section.
Refer to ClusterXL Administration Guide (R70, R70.1, R71, R75, R75.20, R75.40,
R75.40VS, R76, R77) - Chapter 'Monitoring and Troubleshooting Gateway Clusters' ClusterXL Configuration Commands - The cphaconf command.
Note: Starting in R77.20, refer to $FWDIR/log/cphaconf.elg
Note: The commands below are listed in the order to their importance / relevance.
Description:
Loads cluster configuration with relevant options into kernel.
Flags:
Flags
-D
-c <size>
-i <ID>
-n <ID>
Description
Prints debug information about
the execution of 'cphaconf' command
Sets cluster size
(number of members in the cluster)
Sets member ID of the local machine
(count is starts from 1
Sets cluster ID
-p
-m
-m
-m
-m
<policy_id>
<1|service>
<2|balance>
<3|primary-up>
<4|active-up>
-R a
-R <required_IF_num>
-t <secured_IF_1>
<secured_IF_2>
...
-d <disconnected_IF_1>
<disconnected_IF_2>
...
-A
-M <0|multicast>
-M <1|pivot>
-l
-l
-l
-l
-l
-l
-l
-l
0
1
2
3
4
5
6
7
-S 0
-S 1
-f 0
-f 1
-f 2
-o
-x
-z 0
-z 1
-v
-V
-T 0
-T 1
-T 2
-r
-s
cphaconf stop
Description:
Removes the cluster configuration from kernel.
Background:
The 'cphastop' command is actually a shell script wrapper that runs this command.
cphaconf debug_data
Description:
Prints the current cluster configuration as loaded in the kernel on this machine.
Note:
Works only during the following cluster debug:
In 1st shell:
[Expert@Member_HostName]#
[Expert@Member_HostName]#
[Expert@Member_HostName]#
[Expert@Member_HostName]#
fw
fw
fw
fw
ctl
ctl
ctl
ctl
debug 0
debug -buf 32000
debug -m cluster + conf
kdebug -T -f > /var/log/debug.txt
In 2nd shell:
[Expert@Member_HostName]# cphaconf debug_data
In 1st shell:
[Expert@Member_HostName]# fw ctl debug 0
Review /var/log/debug.txt
Example:
Configuration:
Number
Unique Address
Assigned Load
State
1 (local)
2
10.10.10.31
10.10.10.32
100%
0%
Active
Standby
[Expert@FW1-Member:0]#
[Expert@FW1-Member:0]# cphaprob -a if
Required interfaces: 3
Required secured interfaces: 1
eth0
eth1
eth2
UP
UP
UP
192.168.204.33
20.20.20.33
[Expert@FW1-Member:0]#
Debug output:
;[cpu_1];[fw4_0];================================================;
;[cpu_1];[fw4_0];=====
ClusterXL
debug
information
===;
;[cpu_1];[fw4_0];================================================
;
;[cpu_1];[fw4_0];---- Sync ---;
;[cpu_1];[fw4_0];fwlddist_state is (1a): Receiving, Not Saving, Sending;
;[cpu_1];[fw4_0];fwlddist_dobcast is: 1;
;[cpu_1];[fw4_0];fw_has_nondefault_filter is: 1;
;[cpu_1];[fw4_0];fw_syncn_is_configured is: 1;
;[cpu_1];[fw4_0];fwlddist_policy_in_ready_state is: 1;
;[cpu_1];[fw4_0];---VMAC mode:
---;
;[cpu_1];[fw4_0];VMAC: vmac mode is enabled;
;[cpu_1];[fw4_0];VMAC: the vmac of each interface:;
;[cpu_1];[fw4_0];Interface: 1) eth0, vmac: 00:1C:7F:00:00:FE;
;[cpu_1];[fw4_0];Interface: 3) eth2, vmac: 00:1C:7F:00:00:FE;
;[cpu_1];[fw4_0];VMAC: priomisc mode interfaces (by the VMAC mechanism) are:;
;[cpu_1];[fw4_0];Interface: 1) eth0, vmac_index=0x0;
;[cpu_1];[fw4_0];Interface: 3) eth2, vmac_index=0x0;
;[cpu_1];[fw4_0];------------------------
;
;[cpu_1];[fw4_0];---Interfaces info:
---;
;[cpu_1];[fw4_0];0) if: lo, flags: 0x800;
;[cpu_1];[fw4_0];1) if: eth0, flags: 0x10000800;
;[cpu_1];[fw4_0];2) if: eth1, flags: 0x10000808;
;[cpu_1];[fw4_0];3) if: eth2, flags: 0x10000800;
;[cpu_1];[fw4_0];-----------------------;
;[cpu_1];[fw4_0];================================================;
;[cpu_1];[fw4_0];=====
ClusterXL
debug
end
===;
;[cpu_1];[fw4_0];================================================
;
;[cpu_1];[fw4_1];================================================;
;[cpu_1];[fw4_1];=====
ClusterXL
debug
information
===;
;[cpu_1];[fw4_1];================================================
;
;[cpu_1];[fw4_1];-----------------------;[cpu_1];[fw4_1];=====
Cluster instance information
===;
;[cpu_1];[fw4_1];-----------------------;
;[cpu_1];[fw4_1];---Selection table ---;
;[cpu_1];[fw4_1];Effective selection table size: 2
;
;[cpu_1];[fw4_1];0: 0;
;[cpu_1];[fw4_1];1: 0;
;[cpu_1];[fw4_1];-----------------------;
;[cpu_1];[fw4_1];---- Multicast table ---;
;[cpu_1];[fw4_1];lo: Address:
1.0.0.127;
;[cpu_1];[fw4_1];Cluster/default multicast IP: 0.0.0.0, MAC address:
00:00:00:00:00:00;
;[cpu_1];[fw4_1];eth0: Address:
31.204.168.192;
;[cpu_1];[fw4_1];Cluster/default multicast IP: 33.204.168.192, MAC address:
01:00:5E:28:CC:21;
;[cpu_1];[fw4_1];eth1: Address:
31.10.10.10;
;[cpu_1];[fw4_1];Cluster/default multicast IP: 250.10.10.10, MAC address:
01:00:5E:0A:0A:FA;
;[cpu_1];[fw4_1];eth2: Address:
31.20.20.20;
;[cpu_1];[fw4_1];Cluster/default multicast IP: 33.20.20.20, MAC address:
01:00:5E:14:14:21;
;[cpu_1];[fw4_1];-----------------------;
;[cpu_1];[fw4_1];---- Status subscribers ---;
;[cpu_1];[fw4_1];Subscriber: 0 pid 23079 sig 12 desc pepd;
;[cpu_1];[fw4_1];Subscriber: 1 pid 23078 sig 12 desc pdpd;
;[cpu_1];[fw4_1];Subscriber: 2 pid 25236 sig 3 desc routed instance 0;
;[cpu_1];[fw4_1];Subscriber: 3 pid 25270 sig 12 desc ted;
;[cpu_1];[fw4_1];Subscriber: 4 pid 4533 sig 12 desc cvpnd;
;[cpu_1];[fw4_1];-----------------------;
;[cpu_1];[fw4_1];=====
Cluster instance information end
;[cpu_1];[fw4_1];------------------------
===;
cphaconf -t <secured_IF_1><secured_IF_2>...
add
Description:
Adds the specified trusted (secured) interfaces explicitly into the current cluster
configuration in kernel.
cphaconf sync
Description:
Sets sync configuration in kernel (in HA New mode).
cphaconf stop_all_vs
Description:
Stops clustering on each Virtual System (relevant only for VSX systems).
cphaconf clear-secured
Description:
Clears the list of secured (trusted) interfaces in kernel.
cphaconf clear-disconnected
Description:
Clears the list of disconnected interfaces in kernel.
Refer to Defining 'Disconnected' interfaces section.
cphaconf clear_subs
Description:
Clears the list of subscribers.
Note:
List of such subscribers can be obtained by running the cphaconf debug_data
command.
cphaconf mc_reload
Description:
Updates the multicast configuration by reloading the 'cphamcset' daemon (if this is
HA New mode and CCP is set to run in Multicast mode). The current configuration is
kept.
cphaconf uninstall_macs
Description:
Calls the $FWDIR/bin/cpha_restore_macs script to remove the cluster MAC
address configuration (and restore a previous MAC configuration if it was saved on
Linux-based OS to the ifcfg-ethX file).
cphaconf macs
Description:
Only on IPSO OS: Sets Multicast MAC addresses on relevant interfaces.
cphaconf init
Description:
Initializes cluster configuration.
cphaconf fini
Description:
Finalizes cluster configuration.
'cpstat' command
Description:
Produces relevant information for the installed products.
Syntax:
[Expert@HostName]# cpstat [-d] [-s SIC_Name] [-p port] [-o
polling_interval [-c count] [-e period]] [-f flavour]
application_flag
Flags:
'cpstat' flags
-d
s <SIC_Name>
-p <port>
-o <polling_interval
>
-c <count>
-e <period>
Description
Prints some debug information about the
execution of 'cpstat' command
Sets the SIC name of the AMON server
Sets the port number of the AMON server
(default port is 18192)
Sets polling interval (in seconds)- how
frequently to produce the output (default is 0,
i.e., the results are shown only once)
Sets how many times in total to produce the
output(default is 0, i.e., the results are shown
repeatedly)
Sets the interval, over which "statistical" OIDs
are computed (ignored for regular OIDSs)
-f <flavour
>
application_flag
In our case, we are interested in the information only about the ClusterXL product:
[Expert@HostName]# cpstat -f default ha
[Expert@HostName]# cpstat -f all ha
Refer to sk93201 (Output of 'cpstat -f all ha' command on Gaia OS does not populate
the 'Cluster IPs table' and the 'Sync table').
The 'cpstat -f all ha' command on Gaia OS and on 3rd party / OPSec clusters
works in the following way:
1. The 'cpstat -f all ha' command calls the
$FWDIR/bin/cxl_create_partner_topology_file shell script.
2. The $FWDIR/bin/cxl_create_partner_topology_file shell script collects
the relevant information and saves in the
$FWDIR/tmp/cxl_partner_topology_config.txt file.
3. 'cpstat -f all ha' uses the information in
$FWDIR/tmp/cxl_partner_topology_config.txt file and populates the
'Cluster IPs table' and the 'Sync table'.
Examples:
[Expert@Member]# cpstat -f default ha
Product name:
Version:
Status:
HA installed:
Working mode:
HA started:
High Availability
N/A
OK
1
High Availability (Active Up)
yes
Interface table
---------------------------------------------------------------|Name|IP
|Status
|Verified|Trusted|Shared|Netmask|
---------------------------------------------------------------|eth0|172.30.41.78|Up
|
0|
0|
2|0.0.0.0|
|eth1| 10.10.10.78|Up
|
300|
1|
2|0.0.0.0|
|eth2| 20.20.20.78|Up
|
300|
0|
2|0.0.0.0|
|eth3| 30.30.30.78|Disconnected|21318100|
0|
2|0.0.0.0|
|eth4| 40.40.40.78|Disconnected|21318100|
0|
2|0.0.0.0|
---------------------------------------------------------------Problem Notification table
-----------------------------------------------|Name
|Status|Priority|Verified|Descr|
-----------------------------------------------|Synchronization|OK
|
0| 168880|
|
|Filter
|OK
|
0|
21318|
|
|cphad
|OK
|
0|
21318|
|
|fwd
|OK
|
0| 168949|
|
|routed
|OK
|
0|
21307|
|
|cvpnd
|OK
|
0|
1|
|
|ted
|OK
|
0|
1|
|
-----------------------------------------------Cluster IPs table
--------------------------------------------------------------|Name|IP
|Netmask
|Member Network|Member Netmask|
--------------------------------------------------------------|eth0|172.30.41.79| 255.255.0.0|
172.30.0.0|
255.255.0.0|
|eth2| 20.20.20.79|255.255.255.0|
20.20.20.0| 255.255.255.0|
--------------------------------------------------------------Sync table
-------------------------------|Name|IP
|Netmask
|
-------------------------------|eth1|10.10.10.78|255.255.255.0|
--------------------------------
$FWDIR/bin/clusterXL_admin script
This shell script registers a pnote (called 'admin_down') and gracefully changes the
state of the given cluster member to 'Down' (by reporting the state of that pnote as
'problem'), or gracefully reverts the state of the given cluster member to 'Up' (by reporting
the state of that pnote as 'ok').
Refer to sk55081 (Best practice for manual fail-over in ClusterXL).
$FWDIR/bin/clusterXL_monitor_ipsscript
This shell script pings a list of predefined IP addresses and changes the state of the
given cluster member to 'Down' or 'Up' based on the replies to these pings.
Note: Cluster member will go down even if one ping is not answered.
Refer to sk35780 (How to configure $FWDIR/bin/clusterXL_monitor_ips script to run
automatically on Gaia / SecurePlatform OS).
$FWDIR/bin/clusterXL_monitor_processscript
This shell script monitors a list of predefined processes and changes the state of the
given cluster member to 'Down' or 'Up' based on whether these processes are running or
not.
Refer to sk92904 (How to configure $FWDIR/bin/clusterXL_monitor_process script to
run automatically on Gaia / SecurePlatform OS).
ClusterXL Debugging
Debugging Check Point Security Gateway
In order to see how the Security Gateway processes the traffic, and how the internal
components are working, a debug of Check Point kernel should be run on this Security
Gateway (depending on the issue, it might also be required to run a debug of the relevant
user space daemon - e.g., in case of VPN - vpnd, in case of Full Sync - fwd).
Some debugs print so much information, that the load on CPU might increase to 100%
and render the Security Gateway unresponsive.
Note: It is always recommended to run the kernel debug during a scheduled
maintenance window in order to minimize the impact on production traffic and on users.
Syntax
[Expert@GW_HostName]# fw ctl debug -h
fw ctl debug [-d <strings>] [-s "<string>"] [-v ("<VSIDs>"|all)] [-k] [-x] [-m
<module>] [-e expr |-i <filter-file|-> | -u] [+|-] <options | all | 0>
Or: fw ctl debug [-t (NONE|ERR|WRN|NOTICE|INFO)] [-f (RARE|COMMON)]
Or: fw ctl debug -buf [buffer size][-v ("<VSIDs>"|all)][-k]
-h - for help
-e - Set debug filter to expr (inspect script)
-i - Set debug filter from filter-file (- is the standard input)
-u - Unset debug filtering
To display all kernel debugging modules and all their flags that this machine supports:
[Expert@GW_HostName]# fw ctl debug -m
To display all kernel debugging modules and their flags that were turned on:
[Expert@GW_HostName]# fw ctl debug
To display all debugging flags that were turned on for this kernel debugging module:
[Expert@GW_HostName]# fw ctl debug -m MODULE
Notes:
Some debug flags are enabled by default (error, warning) in various kernel
debugging modules, so that some generic messages are printed into Operating
System log (Linux-based OS: /var/log/messages; Windows OS: Event
Viewer).
This command should be issued before starting any kernel debug.
This command must be issued to stop the kernel debug.
Note:
This unsets all debug flags, which means that none of the relevant messages will be
printed. Default debug flags should be enabled.
To set kernel debugging buffer:
[Expert@GW_HostName]# fw ctl debug -buf 32000
Notes:
Default size of the debugging buffer is 50 KB
Maximal size of the debugging buffer is 32768 KB
Unless the size of the debugging buffer is increased from default 50 KB, the
debug will not be redirected to a file (debug messages will be printed into
Operating System log)
Debug messages are collected in this buffer, and a user space process
($FWDIR/bin/fw) collects them and prints into the output file.
To print debug messages into the output file (start the kernel debug):
[Expert@GW_HostName]# fw ctl kdebug -T -f > /var/log/debug.txt
Note:
If you need to use this command in shell scripts, then add an ampersand at the end
to run the command in the background (fw ctl kdebug -T -f >
/var/log/debug.txt &).
To stop the kernel debug:
Press CTRL+C and set the default kernel debug options
[Expert@GW_HostName]# fw ctl debug 0
Note:
If you started the kernel debug via shell script, then you should just set the default
kernel debug options.
Note: Refer to VSX NGX R65 Administration Guide - 'Per Virtual System
Debugging'.
In R75.40VS and above in VSX mode, you have to switch to the context of the
specific Virtual Device, and then run the usual debugging commands:
[Expert@VSX_HostName:0]# vsenv VSID
[Expert@VSX_HostName:VSID]# fw ctl debug ...
Note: Any other message means that there was a problem allocating the buffer,
and you should not continue until that issue is resolved (e.g., "Failed to
allocate kernel debugging buffer").
C. Set relevant kernel debug flags in relevant kernel debugging modules:
[Expert@GW_HostName]# fw ctl debug -m MODULE + FLAG1 FLAG2 ... FLAGn
or
[Expert@GW_HostName]# fw ctl debug -m MODULE all
Note: Pay close attention to the name of the kernel debug module.
2. Verify the kernel debug options:
[Expert@GW_HostName]# fw ctl debug -m MODULE
Notes:
Pay close attention to the size of the kernel debugging buffer.
Pay close attention to the name of the kernel debugging module.
The order of the flags in this output does not matter - just all the flags you set
have to be here.
Disable this kernel parameter to disable the limit on the debug messages time
window (default - 60 ; zero - disables the limit):
[Expert@Member_HostName]# fw ctl set int fw_kdprintf_limit_time 0
Disable this kernel parameter to disable the limit on the amount of debug messages
(default - 30 ; zero - disables the limit) that are printed within specified time
(fw_kdprintf_limit_time):
[Expert@Member_HostName]# fw ctl set int fw_kdprintf_limit 0
Set this kernel parameter to print additional IO information and the contents of the
packets in HEX format when 'select' flag is enabled in 'cluster' module:
[Expert@Member_HostName]# fw ctl set int fwha_dprint_io 1
Set this kernel parameter to print additional information about cluster interfaces
when 'if' flag is enabled in 'cluster' module (very helpful for Check Point RnD):
[Expert@Member_HostName]# fw ctl set int fwha_dprint_all_net_check 1
Set this kernel parameter to print the dump of each packet when 'packet' flag is
enabled in 'fw' module (very helpful for Check Point RnD):
[Expert@Member_HostName]# fw ctl set int fw_debug_dump_packet 1
Notes:
o This parameter is available in R75.40VS, R76 and above
o Enabling the debug with flag 'packet' creates high load on CPU
o Enabling the parameter 'fw_debug_dump_packet' creates high load on CPU
Explanation
chainfwd
highavail
cluster configuration
ioctl
mrtsync
nat
sync
xlate
xltrc
Explanation
accel
ccp
arrival/transmission of Cluster
Control Protocol (CCP) packets
conf
cu
Connectivity Upgrade
(only since R77.20)
df
drop
forward
if
log
mac
nokia
pivot
pnote
select
stat
subs
timer
Explanation
Correction Layer
bstat
Blade State
ch_ccp
Chassis CCP
ch_conf
Chassis configuration
ch_stat
Chassis State
iterator
Iterator
osp
smo
unisync
vpn
Unicast Sync
VPN traffic
Some kernel parameters can be set on-the-fly with 'fw ctl set int PARAMETER
VALUE' command (e.g., fwha_mac_magic).
Note: This change does not survive reboot.
Some kernel parameters can be set only during boot of the machine (any parameter
that controls memory allocation, sizes of memory buffers).
Refer to the solutions that contain most relevant cluster-related kernel parameters:
sk92723 (Cluster flapping prevention)
sk25977 (Connecting multiple clusters to the same network segment (same VLAN,
same switch)
sk23695 ('FW-1: State synchronization is in risk. Please examine your
synchronization network to avoid further problems!' appears in /var/log/messages
file)
sk43984 (Interface flapping when cluster interfaces are connected through several
switches)
sk31655 (State of Standby cluster member in High Availability cluster is constantly
changing between 'Standby' and 'Down')
sk31336 (Using Monitor Interface Link State feature to improve ClusterXL interfacefailure-detection ability)
sk62863 (ClusterXL - cluster debug shows interface flapping due to the missing CCP
packets)
sk63163 (Failover does not occur in ClusterXL HA Primary Up mode after changing
cluster member priorities and installing the policy)
sk41827 (Synchronization network in the cluster is flooded with Sync Retransmit
packets)
sk43896 (Blocking New Connections Under Load in ClusterXL)
sk82080 (/var/log/messages are filled with 'kernel: FW-1:
fwldbcast_update_block_new_conns: sync in risk: did not receive ack for the last
410 packets')
sk43872 (ClusterXL - CCP packets and fwha_timer_cpha_res parameter)
sk41471 (ClusterXL - State Synchronization time interval and 'fwha_timer_sync_res'
kernel parameter)
sk31934 (ClusterXL IGMP Membership)
sk95156 (How to control the synchronization of multicast routes in Check Point
cluster)
sk104567 (Traffic passing through the VSX cluster is lost during a cluster failure on
Standby member)