This README describes the PowerHA NFS tiebreaker test program - nfstb_test.

If you are familiar with how a tiebreaker works you can skip directly to this section:

*** Using the NFS tiebreaker test program


*** Cluster partition and tiebreakers

In a PowerHA cluster environment, a cluster "partition" or "split brain" scenario can occur when one or more nodes lose all communication with another set of nodes but still retain access to a shared resource like shared storage. When this occurs, each set of nodes will see the other set as failed and try to access the shared storage, which can lead to data corruption. This can occur in a cluster where all nodes are in the same data center, or in clusters where the nodes are in remote data centers.

If the partition goes undetected, and communication is restored between the sides of the split, there is a "merge" condition where a decision will have to be made as to which set of nodes continues to access the shared resources and which set is to be shut down.

To handle a split or merge scenario you can configure a tiebreaker. When a partitioned cluster is detected, PowerHA will look to the tiebreaker to decide which side of the partition should survive and which side should be shut down. Similarly, when a merge occurs, the tiebreaker will be used to decide which side of the partition is shut down.

PowerHA supports a number of different devices to use as a tiebreaker including a shared disk, NFS, and cloud. To be effective as an arbiter, the tiebreaker should be outside of either data center: any outage that affects the communication between data centers would likely also affect the tiebreaker if the tiebreaker was in the same data center.


*** Using an NFS tiebreaker

To use NFS as a tiebreaker you must configure an NFS server and export a directory which is accessible from all nodes in the cluster. The exported directory must be readable and writable such that any node can establish a "lock" that the other nodes can see.
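
The reason every node needs read and write access to the exported directory can be illustrated with a small sketch. This is not how the PowerHA tiebreaker binary is implemented (this README does not describe its internals). It assumes, purely for illustration, that a "lock" is a file created exclusively in the shared directory. The file name tiebreaker.lock, the function names, and the node names are all hypothetical, and a temporary local directory stands in for the NFS mount point.

```python
import os
import tempfile

LOCK_NAME = "tiebreaker.lock"  # hypothetical file name, for illustration only


def try_acquire_lock(lock_dir, owner):
    """Try to create the lock file exclusively; return True if this caller won."""
    lock_path = os.path.join(lock_dir, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # contender can create it. This needs write access to the directory.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(owner + "\n")  # record which side holds the lock
    return True


def lock_owner(lock_dir):
    """Return the name recorded in the lock file, or None if no lock exists."""
    lock_path = os.path.join(lock_dir, LOCK_NAME)
    try:
        # Seeing the other side's lock needs read access to the directory.
        with open(lock_path) as lock_file:
            return lock_file.read().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # A temporary local directory stands in for the NFS mount point.
    with tempfile.TemporaryDirectory() as shared_dir:
        print(try_acquire_lock(shared_dir, "nodeA"))  # first contender
        print(try_acquire_lock(shared_dir, "nodeB"))  # second contender
        print(lock_owner(shared_dir))
```

In this sketch the first caller creates the file and the second caller sees that it already exists, which mirrors the "one side wins, the other side loses" behavior of a tiebreaker.
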
The NFS server can be any device that supports NFS version 4. NFS version 4 has superior file locking capabilities, which improve the effectiveness of the tiebreaker function. The NFS server can be implemented on Linux, Windows, z/OS, etc.

The first step in implementing an NFS tiebreaker is to configure the NFS server and export directory, as well as the network connectivity to make it accessible to the cluster nodes.


*** Testing the NFS tiebreaker

Once you have configured the NFS server and export directory, you will need to provide some information for the test program to exercise the tiebreaker function.

o NFS tiebreaker and export directory

  You will need to provide the IP address or host name of the NFS server that will act as the tiebreaker, along with the directory that is exported (read/write) from that server. The test program will use this server and directory to store the lock files used by the tiebreaker function. For testing purposes you can use a cluster node as the NFS server but this is not supported for production use.

o local mount point

  You will need to specify where the cluster nodes will NFS mount the exported directory. This can be any local directory - if it does not exist when you use the test program, the test program will create it for you.

o NFS options (optional)

  The tiebreaker uses hard-coded options for the NFS mount. These options are:

      vers=4,fg,soft,retry=1,timeo=10

  You can specify different options for testing purposes, for example if you are having trouble that you suspect may be due to these options. However, any options you specify will not be used by the tiebreaker in a production cluster. To use the PowerHA NFS tiebreaker in production you must eventually test using the default options.


*** How the NFS tiebreaker test program works

In a PowerHA cluster, the RSCT subsystem detects split and merge scenarios. When you configure a tiebreaker in your PowerHA cluster, PowerHA will configure RSCT to invoke a tiebreaker binary supplied by PowerHA.
When the RSCT subsystem detects a split or merge, it calls the tiebreaker binary, which in turn looks to the NFS server to see if it can establish a lock or if the other side of the partition has already created a lock. The partition that is able to get the lock "wins". The other side of the partition "loses" and all nodes in that partition are rebooted.

The NFS tiebreaker test program exercises the same tiebreaker binary but does not invoke RSCT, create split and merge scenarios, or reboot nodes. When you configure a tiebreaker in PowerHA you will need to supply the same 3 pieces of information that you need to use the test program: the NFS server, the export directory, and the local mount point.

The test program will use the PowerHA tiebreaker binary to create an NFS lock, then challenge that lock from the remote node. The test program will unmount the server when done - if you exit the program without exercising the test function, you may have to manually clean up the NFS directory and unmount the server from one or both of the cluster nodes.


*** Using the NFS tiebreaker test program

The test program must be run from a node in a PowerHA cluster. The node where the test program runs acts as one side of the partition, and you will specify another cluster node which will act as the remote side of the partition when competing for a lock. The test program uses the cl_rsh capabilities of PowerHA to run commands on the other cluster node.

Sites are not required, but if you have configured sites, the test program will use the first node in the remote site as the default remote node. If you have already configured a tiebreaker in PowerHA, the test program will use that configuration as its default.

As noted above, you must have configured an NFS server prior to using the test program - the test program does not help with the NFS server configuration.
When you use a tiebreaker in a production cluster, it has to be outside of the cluster, but for testing purposes you can use another cluster node as the NFS server. Because the NFS server is normally outside of the cluster, the test program cannot access the server to check the NFS setup.

The test program will look for default values from the cluster definition and will prompt you for any required information. You can also specify the inputs from the command line as follows:

    -s - NFS server name or IP address
    -e - directory exported as read/write from the NFS server
    -l - local mount point
    -r - remote cluster node to act as the other side of the partition
    -n - NFS options (testing only)

The test program will log the commands it runs and any errors to the /var/hacmp/log/clutils.log file on the local node.

You can apply the NFS tiebreaker configuration to the PowerHA cluster once testing is complete.
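
If you want to drive the test program from a script, the command-line flags listed above can be assembled as in the following sketch. The helper name build_nfstb_test_command, the host name nfs1.example.com, the paths, and the node name nodeB are all hypothetical. The sketch also assumes nfstb_test can be invoked by name; this README does not state where it is installed. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_nfstb_test_command(server, export_dir, mount_point, remote_node,
                             nfs_options=None):
    """Build the argument list for nfstb_test from the documented flags."""
    command = [
        "nfstb_test",       # assumed to be on PATH; install location not given here
        "-s", server,       # NFS server name or IP address
        "-e", export_dir,   # directory exported read/write from the NFS server
        "-l", mount_point,  # local mount point
        "-r", remote_node,  # remote cluster node acting as the other side
    ]
    if nfs_options:
        # Testing only: these options are not used by a production tiebreaker.
        command += ["-n", nfs_options]
    return command


if __name__ == "__main__":
    args = build_nfstb_test_command(
        "nfs1.example.com",    # hypothetical NFS server
        "/export/tiebreaker",  # hypothetical export directory
        "/mnt/tiebreaker",     # hypothetical local mount point
        "nodeB",               # hypothetical remote cluster node
    )
    print(shlex.join(args))
```

Leaving out the optional NFS options, as in this example, exercises the default mount options, which is what a production tiebreaker uses.
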