This document describes how to configure load balancing on Nokia IPSO 3.6 FCS4 with
clustering, and Check Point NG FP3 in a step-by-step process. Any statements contained herein
are those of the author and in no way reflect the views of Nokia Internet Communications or
Check Point Software Technologies.
By:
Jeff Mousseau
Tel. +1-416-829-5753
Fax +1-416-829-5553
Email: jeff@digitalmigrations.com
Environment
- 2 x Nokia IP330 (256 MB RAM recommended)
- 1 x P4 PC, 1.4 GHz, with 256 MB RAM running the NG Management Server and GUI Client
Gotcha(s)
Please note that additional hotfix(es) are required for the cluster to function properly.
They are not listed here since they are released periodically. Hotfixes and their
installation are detailed on the software downloads page at https://support.nokia.com/.
Details
Clustering in IPSO 3.6 is already present and is free to use – there is no license
required.
Supported platforms: IP330, 440, 530, 650, 710, and 740. Clustering is not supported on
the IP110; should one be tempted to install it there anyway, some clustering functions
will not operate properly.
Supported interfaces: Currently only 10/100 ethernet interfaces and GigE interfaces
are supported. A design issue is that the switch and cluster sync interface(s) need to be
able to handle multicast traffic at the rate presented to the cluster. Thus, if one is using
GigE interfaces, the switch will need to be high-performance and the sync interface(s)
should be 100 Mbit full-duplex (flow control turned on). The traffic sent is a mix of
multicast and unicast traffic.
On the configuration side, static ARP entries may be required on some routers should
they have difficulty with ARP replies containing multicast addresses. Please see the
section on routers at the end of this paper.
Unofficial performance scaling (IP740 cited; more modest scaling on smaller platforms):

       Total Nodes    Performance Scaling
FW          2                1.5
            3                1.7
            4                1.8

As you can see from the chart, VPN scaling is excellent at approximately 80 - 90 per cent.
Firewall scaling is considerably less. This is due to NG's robust state sync protocol, which
requires far more overhead. Firewall performance scaling will be addressed in the next
version of IPSO.
The following dynamic and multicast routing protocols are not supported when clustering
is enabled: BGP, OSPF, RIP, IGRP, IGMP, PIM, DVMRP. This will be addressed in a
later version with unicast clustering.
The Nokia clustering synchronization is configured in Voyager on the Cluster web page.
Check Point NG synchronization is configured in the cluster object. In this document we
will configure both on the same network (interface pair). It is highly recommended that
they be configured on separate networks. (Please see ‘What Happens…’ on page 34 for
more information).
Note: all configuration details for NAT have been omitted from this document. Nokia was
completing a hotfix for NATing with multiple internal networks. Please contact Nokia TAC
to determine if this hotfix is now available. For details on how to configure NAT, please
refer to the online documentation. The documentation (nic-doc3.6.tgz) can be
downloaded from https://support.nokia.com.
Diagram 1 (topology reconstructed from the original figure):
- External segment, 10.250.135.0/24, on a 10/100 hub: FTP client 'jeff' (10.250.135.11)
and an Intel FW / VPN GW (10.250.135.99, upstream 64.231.169.249); dm1 external
interface 10.250.135.1, dm2 external interface 10.250.135.2, external VRID 10.250.135.3
- Secure segment (DMZ), 172.16.31.0/24: dm1 172.16.31.1, dm2 172.16.31.2, VRID 172.16.31.3
- Internal segment, 10.1.1.0/24, on a 10/100 hub: dm1 10.1.1.1, dm2 10.1.1.2, internal
VRID 10.1.1.3; host 'blanc' (Win2KPro SP2, 10.1.1.11) running the FTP server and the
NG Management Console & GUI Client
- dm1 and dm2 are the NG enforcement modules
Ensure that IPSO 3.6, or a later version that supports CP NG FP3, is installed; this can
be verified via a shell connection or Voyager. IPSO can be downloaded from
http://support.nokia.com (a support account is required). Please refer to the release
notes for installation instructions.
dm1 Setup
10.1.1.1/24 int1 (eth-s1p1) - internal segment
172.16.31.1/24 ss1 (eth-s2p1) - secure segment (a.k.a. DMZ)
10.250.135.1/24 ext1 (eth-s3p1) - external or Internet segment
Note that /24 denotes a 24-bit netmask, the class 'C' equivalent of 255.255.255.0. A
class B netmask is 16 bits, or 255.255.0.0. If one is having difficulty, click on the
following link to download and then install a subnet calculator.
http://www.digitalmigrations.com/IPSub2.exe
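For quick reference without a calculator, prefix-to-netmask conversion can be sketched in plain shell. This snippet is illustrative only and is not part of the original procedure:

```shell
# Convert a prefix length (e.g. 24) into a dotted-quad netmask.
bits=24
mask=""
for octet in 1 2 3 4; do
  if [ "$bits" -ge 8 ]; then
    o=255; bits=$((bits - 8))          # a fully-masked octet
  else
    o=$((256 - (1 << (8 - bits)))); bits=0   # partially-masked octet
  fi
  mask="${mask}${o}."
done
mask=${mask%.}
echo "$mask"    # /24 -> 255.255.255.0
```

Setting bits=16 at the top yields 255.255.0.0, the class B case mentioned above.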
Interfaces
Gotcha(s)
If one does not follow this advice, the following error may occur:
Jun 26 10:38:11 dm2 [LOG_CRIT] kernel: FW-1: fw_attach: cannot find name of
interface 3
Ideally, all of the interfaces on the firewalls should have unique names. If interfaces are
named identically on one or both gateways, it is undefined which interface a name lookup
will return. This can make policy pushing and logging difficult (read: it won't work!) and
logs confusing to read.
Time is used for both NG state table synchronization and encryption. Device clocks
should agree to within a second. To ensure this it is strongly recommended (read:
imperative) to use NTP. This will guard against clock drift and ensure valid
synchronization of the NG state table, and thus proper failover.
Time and NG
Gotcha(s)
1 Clocks on the Nokia GWs must be set to within the second in order to ensure
successful NG state synchronization. It is recommended that one perform the following:
a set the proper timezone under ‘Local Time Setup’. In this instance it will be set to
Canada/Quebec/Montreal. (The reason for the odd timezone locations is a holdover from
BSD).
b configure NTP.
NTP
To set up NTP, go to the NTP page in Voyager and enable it. There are several ways to
configure NTP. Should one's corporate security policy disallow outgoing NTP from the
Nokia GWs, one GW can be configured as the NTP master using its own clock, and the
other GW can point to it. The drawback to this is that, should a GW ever be breached,
the company will need valid timestamps on its logs in a court of law, and a
self-referenced clock may not hold up.
tick.utoronto.ca 128.100.103.17
tock.utoronto.ca 128.100.100.128
Make sure that the changes have been applied and saved. Then reboot the device by
typing 'reboot' at the prompt.
dm1[admin]#reboot
To ensure that the NTP cron is operating one can type the following:
The output marked in bold confirms that the time update occurred.
Interfaces
Hosts
Static Routes
Management Server
At this point we will want to ensure our static routing is properly configured. In this
configuration, the default route on the Management Server will be the internal cluster IP
address or 10.1.1.103.
Gotcha(s)
1 Should we ever want to perform NAT (hide), a route is required to send any
policy traffic destined to the external interface of dm2 through dm2’s internal interface. In
short, add the following two routes to the Management Server.
If the second route is not added, the traffic will flow to the default gateway of the
Management Server (the internal cluster IP). From there it will travel to the cluster
master and out the cluster master's external interface, and it will be denied at the
external interface of the cluster slave. (See the NAT section for more information.)
Both of the enforcement modules have the same simple routing configuration.
N.B. In this configuration we will not be performing NAT. Should you wish to after
you have completed this exercise, you will need to return to this page and
change the ‘Hash Selection’ from ‘default’ to either NAT-INT for internal
interfaces or NAT-EXT hash for external interfaces.
Gotcha(s)
The transfer of the FP3 tar file (CP_FP3_IPSO.tgz) will take place during the running of
the newpkg script using FTP. We will run the following command:
dm1[admin]# newpkg -i
Choose (1-4): 1
Installing CP_FP3_IPSO.tgz
CP_FP3_IPSO does not exist previously. Proceeding with Installation.
PKG_INSTALL:
****************************************************************
PKG_INSTALL: Running /tmp/pkg/CP_FP3_IPSO/CPdtps-50/POST_INSTALL
****************************************************************
*******************INSTALL/UPGRADE PROCESS COMPLETED************
Log out and log back in via the console. You can also go into Voyager and click on
'Manage Packages'. The installed packages should look like this.
We are now ready to run cpconfig, but before we do, we will install FP3 on the Windows
management station. This will be demonstrated using a series of ‘selected’ self-
explanatory screen shots.
Gotcha(s)
dm1[admin]# cpconfig
Would you like to install a Check Point clustering product (CPHA, CPLS
or State Synchronization)? (y/n) [n] ? y
IP forwarding disabled
Hardening OS Security: IP forwarding will be disabled during boot.
Generating default filter
Default Filter installed
Hardening OS Security: Default Filter will be applied during boot.
This program will guide you through several steps where you
will define your Check Point products configuration.
At any later time, you can reconfigure these parameters by
running cpconfig
Configuring Licenses...
=======================
Host Expiration Signature
Features
Please keep typing until you hear the beep and the bar is full.
[....................]
Thank you.
initial_module:
Compiled OK.
dm1[admin]#
Gotchas
The default policy filter in NG has now changed to drop all except internal
communications. Although unnecessary under most conditions, to ease any issues we will
uninstall the default filter on both GWs during this phase of the configuration.
dm1[admin]# fw unloadlocal
Create a workstation object defining the Management Server (dm). When we are finished
and click ‘OK’ it will automatically generate a certificate to be used for secure
communications (SIC). This replaces ‘putkeys’ which in the past were occasionally
problematic.
Management Server
Gotcha
If you receive the error "Error: Failed to connect the module" after you have typed in
your one-time password, perform the following:
1. At the NG Module machine, use cpconfig to re-initialize SIC by typing a new
one-time password (OTP).
2. In the Policy Editor, in the Communications window of the Module object, reset SIC
communication.
3. Reinitialize SIC communication by typing the same OTP as used at the Module
machine.
4. Verify SIC communication by pressing "Test SIC Status".
N.B. Perform the same procedures for the second enforcement point (dm2).
Once that has been successfully achieved, from the file menu click ‘Manage’ and then
select ‘Network Objects’. Click ‘New’ and select ‘Gateway Cluster’. Define the gateway
cluster using the external cluster address i.e. 10.250.135.103.
It is important to know that the individual GW objects will disappear after the cluster
object is created. They can be accessed through the ‘cluster members’ tab of the cluster
object should you need to edit them.
New to FP3 is the ‘Availability Mode’ on the third tab. Be sure to specify ‘Load Sharing’.
SIC is used for authentication; as we will see below, the state table information itself is
exchanged via a network broadcast directly between the NG modules.
In order for clustering synchronization to occur, we must add a rule that allows cluster
sync (or xpand) traffic. This has changed in FP3. You will need to create two 'Service'
objects: one for TCP port 11003 and one for TCP port 11004. These two ports are used
to pass cluster sync traffic.
Now create a 'Services' group and place both objects into that group.
N.B. State synchronization will occur via SIC and as a result, an implicit rule is not
required.
We can now add the rest of our rules, create, and push a simple policy rule base like the
one below.
dm1[admin]# fw stat
HOST POLICY DATE
localhost defaultfilter 23Feb2002 9:46:22 : [>eth-s5p1c0] [<eth-s5p1c0] [>eth-s4p1c0]
[>eth-s3p1c0]
dm1[admin]# fw unloadlocal
The gateway cluster uses a keepalive mechanism to monitor the health of all its
members. The master member sends multicast keepalive messages to the other cluster
members. The members send keepalive messages back to the master. If a gateway
becomes unavailable for any reason, planned or unplanned, TCP / UDP sessions and
IPSec SA sets are immediately reassigned to other gateways. The keepalive mechanism
is a proactive method that allows members to confirm the presence of the other
gateways and to have work reallocated to them when needed.
Dynamic Load Balancing is implemented using the same process described above.
However, the IPSO implementation has added an aging algorithm to the process to
assist load re-balancing.
Clustering Protocols
Nokia and Check Point use the following protocols for clustering:
Many thanks to Morten Bonde (Nokia, S.E., Copenhagen) for providing insight on this.
Here is how Morten reads the traffic.
The cluster ID is '2'. The current master of the cluster with ID 2 multicasts
KEEPALIVE packets to the multicast address 224.0.1.144 every 200 - 250 msec. The
master sends this traffic out to announce that there already exists a master for cluster
number 2 - even though it knows that the only node in the cluster is the master itself.
When a prospective member attempts to join, the master responds directly back to the
prospect's unicast IP address with a JOIN_NACK, which basically means "Hang on ... I'll
prepare the cluster for accepting a new member". The master then sends a KEEPALIVE
packet to the existing members of the cluster, informing them that the number of nodes
in the cluster has increased (indicated by "Members=2").
The master now sends a BUCKET_ASSIGN packet to the prospect's unicast address
saying that whenever the hashing function returns "59" that means that the new node
(as soon as it has become a full member of the cluster) should handle that traffic. At
some point the master assigns more work with a BUCKET_ASSIGN unicast packet,
which is acknowledged by the member with a BUCKET_ASSIGN_ACK multicast packet.
09:42:05.577419 O 172.16.31.102 > 172.16.31.101: "2" CLIENT_KEEPALIVE Member=2, Seq=751
[tos 0xc0] (ttl 255, id 3475)
09:42:06.499296 I 172.16.31.101 > 172.16.31.102: "2" BUCKET_ASSIGN Bucket 59 -> Node 2
[tos 0xc0] (ttl 255, id 16850)
09:42:06.499390 O 172.16.31.102 > 224.0.1.144: "2" BUCKET_ASSIGN_ACK (Node 2) Bucket 59
[tos 0xc0] (ttl 255, id 3481)
09:42:07.275009 I 172.16.31.101 > 172.16.31.102: "2" BUCKET_ASSIGN Bucket 97 -> Node 2
[tos 0xc0] (ttl 255, id 16851)
An example of this follows. Here you can see update traffic on port 11003 being
received.
dm2[admin]# tcpdump -i eth-s2p1c0
tcpdump: listening on eth-s2p1c0
23:37:05.605716 O 0:a0:8e:10:5c:69 0:a0:8e:20:1e:14 0800 82: 172.16.31.3.10004 >
172.16.31.2.11003: S 810354388:810354388(0) win 65535 <mss 512,nop,wscale
1,nop,nop,timestamp[|tcp]>
Troubleshooting the cluster can be done through Voyager. A functioning cluster should
look like this when you go to 'Monitor' and then 'Cluster' in Voyager. The page refreshes
every 30 seconds.
dm1
dm2
If we are testing load balancing and failover with an FTP session, such as getting a file
from an internal FTP server, and the session stops but does not resume once the
cluster slave takes over in a failover, ensure that the state tables on the master and
slave are synchronizing by using the following command: fw tab -t connections -s
Wait a moment and run the same command on the other member. If the state is being
synchronized, both should report the same number of connections, indicated by #VALS.
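A quick way to compare the two gateways is to pull the #VALS field out of each 'fw tab -t connections -s' output. The two output lines below are canned samples, and the column layout (HOST, NAME, ID, #VALS, #PEAK) is an assumption for illustration; on the real cluster you would capture the live command output instead:

```shell
# Canned sample output lines from 'fw tab -t connections -s' on each member
# (column layout HOST NAME ID #VALS #PEAK assumed for illustration).
dm1_out="localhost connections 8158 112 196"
dm2_out="localhost connections 8158 112 203"

dm1_vals=$(echo "$dm1_out" | awk '{print $4}')   # extract #VALS
dm2_vals=$(echo "$dm2_out" | awk '{print $4}')

if [ "$dm1_vals" = "$dm2_vals" ]; then
  echo "state tables in sync ($dm1_vals connections)"
else
  echo "state tables differ: dm1=$dm1_vals dm2=$dm2_vals"
fi
```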
Here is the output from two state tables. The command 'fw tab -t connections' was
performed after logging into the FTP server, but before the data transfer was initiated.
Although the output is in hex, we can see the FTP connections, which I have underlined
as 00000015. Here is the full dump. The FTP control connection (port 21) appears as
00000015, while the FTP data connection (port 20) appears as 00000014.
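Since the state table shows port numbers in hex, printf can confirm the mapping back to decimal:

```shell
# 0x15 and 0x14 are the hex port numbers seen in the state table dump.
printf '%d\n' 0x15    # FTP control: 21
printf '%d\n' 0x14    # FTP data:    20
```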
Useful Commands
2 A tcpdump on the cluster sync interface will display the cluster sync information.
This is an example of the TCP 11003 and 11004 services that we created earlier.
3 The ‘clish’ and then ‘show clusters’ commands are useful in determining cluster
status.
dm1[admin]# clish
Nokia> show clusters
CID 1
Cluster State up
Member ID 2
Protocol State member
System Uptime At Join 0:00:03:32
Performance Rating 85
Failure Interval 4000
Cold Start Delay 30
Member(s) information
Number of Member(s) 2
4 Check Point NG’s Smart View Tracker is also useful in monitoring the state
synchronization.
The cluster's multicast MAC address takes the form:
01:50:5A:00:<X>:<Y>
...where X and Y are the last two octets of the IP address of the cluster inside interface
(in hex), so:
10.0.32.4 becomes 01:50:5A:00:20:04
If three octets of the IP address vary, i.e. 10.1.32.4, then you would use the last three
octets of the IP address.
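The two-octet mapping can be sketched in shell: take the last two octets of the inside cluster IP and print them as hex bytes. This is illustrative only:

```shell
# Derive the cluster multicast MAC (01:50:5A:00:<X>:<Y>) from the
# last two octets of the cluster inside IP address.
ip="10.0.32.4"
o3=$(echo "$ip" | cut -d. -f3)   # third octet, decimal
o4=$(echo "$ip" | cut -d. -f4)   # fourth octet, decimal
mac=$(printf '01:50:5A:00:%02X:%02X' "$o3" "$o4")
echo "$mac"    # 10.0.32.4 -> 01:50:5A:00:20:04
```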
If you have a 2 node cluster, with the inside interfaces attached to Catalyst ports 10 and
11, where 10 and 11 are in vlan3, then to set the static CAM entry for those ports, issue
the command:
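The command itself appeared only in the original screenshot. As a hedged sketch, on a CatOS switch a permanent CAM entry for the cluster MAC described above might look like the following; the syntax and values are illustrative, so verify against your switch's documentation:

```
set cam permanent 01-50-5a-00-20-04 3/10,3/11 3
```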
Cisco routers do not automatically discover multicast addresses. In order for the router
to see it, you need to create a static ARP entry on the router for the interface on the
same LAN.
If you have a cluster with the cluster address of 10.0.32.4, then on the router, issue the
command:
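The command was likewise elided here. As a sketch only, a static ARP entry on a Cisco IOS router, assuming the cluster multicast MAC described above, would look like:

```
arp 10.0.32.4 0150.5a00.2004 arpa
```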
Note, in Cisco terminology, 3/10 and 3/11 indicate module 3 ports 10 and 11, not the
VLAN number 3 ports 10 and 11. So, in a smaller chassis, such as the 1900, 2900, and
3500, the module number is always 0, because these are not modular switches.
Consider a cluster configuration that has three nodes. There may be several internal
networks that these nodes are passing traffic for. Each node of the cluster is then
plugged into a single switch that leads to the Internet.
Should the sole Internet switch die, you will experience catastrophic failure of the cluster.
This is a result of the cluster using multicasts for passing traffic. In the next release, IP
forwarding (although slower in terms of performance) will overcome this.
It is possible to overcome this today: since we are implementing a cluster design for
high availability, one would use multiple switches and trunk them together.