This README describes the PowerHA cloud based tiebreaker test program - ctb_test

If you are familiar with how a tiebreaker works, you can skip directly to "Using the cloud based tiebreaker test program".

*** Cluster partition and tiebreakers

In a PowerHA cluster environment, a cluster "partition" or "split brain" scenario can occur when one or more nodes lose all communication with another set of nodes but still retain access to a shared resource such as shared storage. When this occurs, each set of nodes sees the other set as failed and tries to access the shared storage, which can lead to data corruption. This can occur in a cluster where all nodes are in the same data center, or in clusters where the nodes are in remote data centers.

If the partition goes undetected and communication is restored between the sides of the split, there is a "merge" condition where a decision has to be made as to which set of nodes continues to access the shared resources and which set is shut down.

To handle a split or merge scenario you can configure a tiebreaker. When a partitioned cluster is detected, PowerHA looks to the tiebreaker to decide which side of the partition should survive and which side should be shut down. Similarly, when a merge occurs, the tiebreaker is used to decide which side of the partition is shut down.

PowerHA supports a number of different devices to use as a tiebreaker, including a shared disk, NFS, and cloud. To be effective as an arbiter, the tiebreaker should be outside of either data center: any outage that affects the communication between data centers would likely also affect a tiebreaker located in one of those data centers.

*** Using a cloud based tiebreaker

To use a cloud service as a tiebreaker you must configure the network connectivity and purchase cloud resources from your cloud provider.
The cloud service must be accessible from all nodes in the cluster. You will need access to a single bucket in the cloud, which will store the lock file. The lock file is the arbiter that determines which side of the partition was granted the lock and which side "lost". The cloud service can be AWS or IBM. The bucket name can be any valid bucket name, and you can specify whether the test program should look for an existing bucket or create a new one.

*** Testing the cloud based tiebreaker

Once you have configured the cloud service and bucket, you will need to provide the following information for the test program to exercise the tiebreaker function:

o cloud service
  You will need to provide the cloud service name - either "aws" or "ibm". The test program will use this service and the specified bucket to store files used by the tiebreaker function.

o bucket name
  Specify the bucket in the cloud. If the bucket does not exist it will be created, unless the use existing bucket option is specified.

o use existing bucket option (optional)
  If you do not want the tiebreaker to create a bucket, you can specify this option. If you specify this option and the bucket does not exist, the tiebreaker will return an error.

*** How the cloud based tiebreaker test program works

In a PowerHA cluster, the RSCT subsystem detects split and merge scenarios. When you configure a tiebreaker in your PowerHA cluster, PowerHA configures RSCT to invoke the tiebreaker binary supplied by PowerHA. When the RSCT subsystem detects a split or merge, it calls the tiebreaker binary, which in turn looks to the bucket in the cloud to see if it can establish a lock or if the other side of the partition has already created a lock. The partition that is able to get the lock "wins"; the other side of the partition "loses" and all nodes in that partition are rebooted.

The cloud based tiebreaker test program exercises the same tiebreaker binary but does not invoke RSCT, create split and merge scenarios, or reboot nodes.
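
The lock arbitration described above can be illustrated with a small Python sketch. This is not the PowerHA tiebreaker binary and does not use any cloud SDK: FakeCloud, LOCK_KEY, try_lock, and release are hypothetical names, and the bucket is an in-memory dictionary. It only shows the "first side to create the lock wins" rule and the effect of the use existing bucket option. How the real binary achieves an atomic lock in a cloud bucket is not covered by this README.

```python
class BucketNotFound(Exception):
    """Raised when the bucket is missing and creating it is not allowed."""


class FakeCloud:
    """In-memory stand-in for a cloud object store (hypothetical, not a real SDK)."""

    def __init__(self):
        self.buckets = {}  # bucket name -> {object key: contents}

    def open_bucket(self, name, use_existing=False):
        if name not in self.buckets:
            if use_existing:
                # Mirrors the "use existing bucket" option: fail, do not create.
                raise BucketNotFound(name)
            self.buckets[name] = {}
        return self.buckets[name]


LOCK_KEY = "tiebreaker.lock"  # hypothetical object name for the lock file


def try_lock(cloud, bucket_name, node, use_existing=False):
    """Return True if the partition containing `node` holds the lock."""
    bucket = cloud.open_bucket(bucket_name, use_existing)
    # setdefault stores `node` only when no lock exists yet; otherwise it
    # returns the current holder unchanged.
    holder = bucket.setdefault(LOCK_KEY, node)
    return holder == node


def release(cloud, bucket_name):
    """Remove the lock object, as the test program does when it cleans up."""
    cloud.buckets.get(bucket_name, {}).pop(LOCK_KEY, None)


if __name__ == "__main__":
    cloud = FakeCloud()
    print(try_lock(cloud, "tb-bucket", "nodeA"))  # local side creates the lock: True
    print(try_lock(cloud, "tb-bucket", "nodeB"))  # remote challenge loses: False
    release(cloud, "tb-bucket")
```

In this model the local node plays the winning partition and the remote node's challenge fails, which is the outcome the test program checks for without rebooting anything.
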
When you configure a tiebreaker in PowerHA you will need to supply the same three pieces of information that you need to use the test program: the cloud service name (aws or ibm), the bucket name, and the option to only use an existing bucket.

The test program will use the PowerHA tiebreaker binary to create a lock, then challenge that lock from the remote node. The test program will remove any files from the bucket when done. If you exit the program without exercising the test function, you may have to manually clean up the bucket.

*** Using the cloud based tiebreaker test program

The test program must be run from a node in a PowerHA cluster. The node where the test program runs acts as one side of the partition, and you will specify another cluster node which will act as the remote side of the partition when competing for a lock. The test program uses the cl_rsh capabilities of PowerHA to run commands on the other cluster node. Sites are not required, but if you have configured sites, the test program will use the first node in the remote site as the default remote node.

If you have already configured a cloud tiebreaker in PowerHA, the test program will prompt you to use that configuration as the default for the test program. As noted above, you must have configured the cloud service and bucket before using the test program - the test program does not help with the cloud configuration.

The test program will look for default values from the cluster definition and will prompt you for any required information. You can also specify the inputs from the command line as follows:

-s - service name - "aws" or "ibm"
-b - bucket name
-u - use existing bucket - if specified, the tiebreaker will fail if the bucket does not exist

The test program will log the commands it runs and any errors to the /var/hacmp/log/clutils.log file on the local node.

You can apply the cloud based tiebreaker configuration to the PowerHA cluster once testing is complete.
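
As an illustration of the command line options listed above, the following Python sketch assembles a ctb_test invocation from the three inputs. The helper name build_ctb_test_command and the bucket name are hypothetical, and the sketch assumes ctb_test can be found on the PATH of the cluster node; this README does not state where the program is installed. The sketch only builds and prints the command, it does not run it.

```python
import shlex

VALID_SERVICES = ("aws", "ibm")  # the two service names this README lists


def build_ctb_test_command(service, bucket, use_existing=False, program="ctb_test"):
    """Return the argument list for a ctb_test run (hypothetical helper)."""
    if service not in VALID_SERVICES:
        raise ValueError("service must be 'aws' or 'ibm'")
    if not bucket:
        raise ValueError("a bucket name is required")
    argv = [program, "-s", service, "-b", bucket]
    if use_existing:
        # -u: fail instead of creating the bucket when it does not exist
        argv.append("-u")
    return argv


if __name__ == "__main__":
    # "my-tiebreaker-bucket" is a placeholder bucket name.
    cmd = build_ctb_test_command("aws", "my-tiebreaker-bucket", use_existing=True)
    print(shlex.join(cmd))  # prints: ctb_test -s aws -b my-tiebreaker-bucket -u
```

Any value not given on the command line is taken from the cluster definition or prompted for, so a wrapper like this is useful mainly for repeatable test runs.
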