No event name is supplied.
Event name doesn't match the real one in ODM.
The event name is duplicated in ODM.
No event command is specified.
The argument(s) are invalid.
There is an ODM internal error.
Memory allocation error.
Fork fails.
The clev_errno is unknown!
Invalid nodename specified.
%s is not a recognized field of event.
Event name
Event description
NLS description set number
NLS description message number
NLS description catalog file
Event command
Event notify command
Pre event command
Post event command
Event recovery command
Recovery count
Usage: %s [-i nodeName] -e eventname -d description [-f NLSCatalogfile] [-m CatalogMessageNo] [-t CatalogSetNo] -s script [-n script] [-b script] [-a script] [-r script] [-c count] [-o odmdir]
Must specify event name.
Must specify event command.
Must specify event default description.
Recovery count should be greater than or equal to zero.
Usage: %s [-i nodeName] [-e new-eventname] [-d description] [-f NLSCatalogfile] [-m CatalogMessageNo] [-t CatalogSetNo] [-s event command script] [-n notify script] [-b pre event script] [-a post event script] [-r recovery script] [-c recovery count] [-o odmdir] Eventname
Or
Usage: %s [-i nodeName] [-O Eventname] [-e new-eventname] [-d description] [-f NLSCatalogfile] [-m CatalogMessageNo] [-t CatalogSetNo] [-s event command script] [-n notify script] [-b pre event script] [-a post event script] [-r recovery script] [-c recovery count] [-o odmdir]
Arguments cannot be specified for event names.
Path information cannot be specified for event names.
A recovery counter greater than zero cannot be specified without a recovery command.
Warning: recovery command and recovery counter are no longer supported. The existing values will be ignored.
Node up description.
Node down description.
Node up complete description.
Script run to swap IP Addresses between two network interfaces.
Script run after the swap_adapter script has successfully completed.
Script run after a network has become active.
Script run when a network has failed.
Script run after the network_up script has successfully completed.
Script run after the network_down script has successfully completed.
Script run when a node is attempting to join the cluster.
Script run when a node is attempting to leave the cluster.
Script run after the node_up script has successfully completed.
Script run after the node_down script has successfully completed.
Script run after a communication interface has become active.
Script run after a communication interface has failed.
Script run to configure a communication interface with a service IP label.
Script run to configure a communication interface with a service IP label.
Script run to acquire disks, varyon volume groups, and mount filesystems.
Script run when it is the local node which is leaving the cluster.
Script run after the node_down_local script has successfully completed.
Script run when it is a remote node which is leaving the cluster.
Script run after the node_down_remote script has successfully completed.
Script run when it is the local node which is joining the cluster.
Script run after the node_up_local script has successfully completed.
Script run when it is a remote node which is joining the cluster.
Script run after the node_up_remote script has successfully completed.
Script run to configure the boot network address on the communication interface.
Script run to configure a boot network address on a communication interface.
Script run to unmount filesystems and varyoff volume groups.
Script run to start application servers.
Script run to stop application servers.
Script run when the Cluster Manager has been in configuration for too long.
Script run when a previously executed script has failed to complete successfully.
Topology reconfiguration is starting.
Topology reconfiguration is complete.
Release old resources.
Acquire new resources.
Resource reconfiguration is complete.
Script run when migration from PowerHA SystemMirror Classic is starting.
Script run when migration from PowerHA SystemMirror Classic is complete.
Script run to move Connections network protocols to service adapters.
Script run to swap Connections network protocols between two network adapters.
Script run to start Connections services.
Script run to stop Connections services.
Script to clean up failed application server before attempting restart.
Script to complete restarting application server.
Script to disconnect clients from a resource group which is going offline.
Script to run on a server, to take one of its resource groups offline.
Script to complete taking a resource group offline.
Script to connect clients to a resource group which has come online.
Script to run on a server, to bring a resource group online.
Script to complete bringing a resource group online.
Script to signal the beginning of an application server shutdown.
Script to signal the completion of an application server shutdown.
Script to release a resource group during rg_move.
Script to signal the completion of a resource group move.
Script run when a site is attempting to join the cluster.
Script run when a site is attempting to leave the cluster.
Script run after the site_up script has successfully completed.
Script run after the site_down script has successfully completed.
Script run when it is the local site which is leaving the cluster.
Script run after the site_down_local script has successfully completed.
Script run when it is a remote site which is leaving the cluster.
Script run after the site_down_remote script has successfully completed.
Script run when it is the local site which is joining the cluster.
Script run after the site_up_local script has successfully completed.
Script run when it is a remote site which is joining the cluster.
Script run after the site_up_remote script has successfully completed.
Script run when first geo_primary network recovers.
Script run after site_merge completes.
Script run when all geo_primary networks at a site go down.
Script run after site_isolation completes.
Script run after a network interface has failed.
Script run after a network interface has recovered.
Script to move a resource group.
Script to acquire a resource group during rg_move.
Script to process disk fencing during rg_move.
Script to process disk fencing during DARE release.
Script to process disk fencing during DARE acquire.
Script to process cluster notification event.
Add resources to the PowerHA SystemMirror cluster.
Removes resources from the PowerHA SystemMirror cluster.
Activates resources in the PowerHA SystemMirror cluster.
De-activates resources in the PowerHA SystemMirror cluster.
Changes the configuration of resources in the PowerHA SystemMirror cluster.
Trigger event to move resource groups.
End of resource state change event.
Event for user-requested resource group migration.
Completion event for user-requested resource group migration.
Script to signal that intersite fallover was prevented.
Script to signal the end of reconfiguration.
Script run when at least one of the nodes in the Cluster has been forced down for too long.
Script to prompt the operator for manual choice on split or merge.
Script to alert the operator of repository disk failure and recovery.
Script run after a site_merge event completes.
Script run after a site_isolation event completes.
Script run when a network is continuously changing state.
Script run when a network is stable.
emul_cc_init failed to set ODM path %s
read_cluster_configuration failed
emul_cc_init failed to reset ODM path %s
cl_emulate:odm_initialize() failed %s Invalid network name %s Invalid node name Invalid interface/label name
Node Up: cl_emulate -e node_up -n nodename
Node Down: cl_emulate -e node_down -n nodename { -f|-g|-t}
Network Up: cl_emulate -e network_up -w networkname [-n nodename]
Network Down: cl_emulate -e network_down -w networkname [-n nodename]
Join standby adapter: cl_emulate -e join_standby -n nodename -a ip_label
Fail Standby adapter: cl_emulate -e fail_standby -n nodename -a ip_label
Swap Adapter: cl_emulate -e swap_adapter -n nodename -w network -a ip_label -d ip_label
Cannot read local nodename
No active nodes in the cluster
Error in opening the pipe
Error obtaining active nodes in the cluster
Error in executing getversions
Error in executing getfiles
Error in cluster configuration
Error: Bad Node ID [%ld] for node [%s]
Error reading status from clsmuxpd
Warning: Clinfo must be running for Event Emulation.
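As an illustration of the cl_emulate usage strings above, an emulation run might be started as follows. This is only a sketch: "nodeA" and "net_ether_01" are placeholder names, and, as noted above, Clinfo must be running and the command requires root authority.

    # Emulate a node_up event for a hypothetical node named nodeA
    cl_emulate -e node_up -n nodeA

    # Emulate a network_down event for a hypothetical network net_ether_01
    cl_emulate -e network_down -w net_ether_01 -n nodeA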
The local node is undefined: Check for a configuration error or an inactive interface
cl_emul:odm_initialize() failed
Permission denied, must be root to run this command
Error in cc_init
Error in odm_terminate
%s: Error running script
Error opening configuration file
Error getting cluster id
******************START OF EMULATION FOR NODE %s***************
***************END OF EMULATION FOR NODE %s*******************
WARNING: Running Emulation will change your default configuration on all the nodes of your cluster. Please ensure you have a snapshot of your current cluster before you proceed. See documentation for more details on recovery.
Event Emulation will take a few minutes. Please wait.
Event Emulation output will be in %s
HACMPlogs ODM is missing or corrupted.
The cluster log entry for %s could not be found in the HACMPlogs ODM. Defaulting to log directory %s for log file %s.
NODE BY NODE
WARNING: Event %s has been suppressed by %s.
Usage: %s [-i nodeName] [-o odmdir] eventname [argument ...]
Usage: %s [-i nodeName] [-o odmdir] [-a] eventname ...
SystemMirror monitors the cluster for errors and recovers from those errors by running a series of event scripts. To see information about a specific event script, select it from the list.
Event name:
Synopsis:
Description:
Probable cause:
Related topics:
Recommended actions:
Category:
Warning
Critical
Error
Informational
Debug
event preamble
resource groups
service IP addresses
persistent addresses
options when starting cluster services
options when stopping cluster services
replicated resources
configuring pre and post events
problem determination using event summary
keeping applications highly available
application monitoring
writing effective application controller scripts
the hacmp.out log file
smart assists
problem determination
Recover From PowerHA SystemMirror Script Failure
customizing cluster events
making configuration changes when cluster services are active (DARE)
configuring sites
configuring split and merge options
network interface monitoring
cluster verification
service and persistent IP addresses
managing resource groups
using SNMP traps to monitor for storage failures
Customizing Inter-Site Resource Group Recovery
starting and stopping cluster services
defining custom resources
split and merge
configuring a tie breaker
Cluster Aware AIX (CAA)
repository disks
configuring location dependencies between service and persistent addresses
configuring the "first alias" option
global network events
node_up and node_up_complete Script run when cluster services are started on a node. Script run after a node_up event has successfully completed. These events occur when cluster services are started on a node and initiate the process of managing any resources and resource groups on the node. The node that is starting is used as a parameter to the event. The hacmp.out log file will contain an 'Event preamble' which indicates which resource groups will be affected by this event. After the node_up event, there may be a series of rg_move events which manipulate the individual resource groups. The node_up_complete event indicates the conclusion of the resource group actions taken for the corresponding node_up event. cluster services were started on the node. review the resource group state and ensure all resources and applications are working properly.
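To follow node_up processing and see the 'Event preamble' mentioned above, the hacmp.out log can be inspected directly. A minimal sketch, assuming the common default log location /var/hacmp/log/hacmp.out (the path may differ on your installation):

    # Follow event processing while cluster services start
    tail -f /var/hacmp/log/hacmp.out

    # Show the event preamble lines recorded for recent events
    grep -i "preamble" /var/hacmp/log/hacmp.out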
node_down and node_down_completeScript run when cluster services are stopped or a node has failed.Script run after a node_down event has successfully completed.These events occur when a node is no longer running cluster services or is in the process of stopping cluster services. If these events occur because a node has failed, the remaining, active nodes will initiate takeover of any resources and resource groups, according to the policies for those groups. The node that is being processed is passed as a parameter to the event. If you stop cluster services manually, the local node participates in the events and will indicate the options you specified, for example, stopping cluster services with the "unmanage resource groups" option. The hacmp.out log file will contain an "Event preamble" which indicates which resource groups will be affected by this event. After the node_down event, there may be a series of rg_move events which manipulate the individual resource groups. The node_down_complete event indicates the conclusion of the resource group actions taken for the corresponding node_down event. cluster services were stopped or a node failed.identify the cause of the event and make sure all resources and applications are working properly. site_down and site_down_complete eventsScript run when all nodes in a site are down.A site is a collection of one or more nodes. If cluster services are stopped on all nodes or all nodes fail in a site, a site_down event is run in addition to individual node_down events. Site events occur before the regular node_down events and enable special processing for replicated resource types that require actions at the site level. Additional site_down_local and site_down_remote events are run, depending on whether the local node is a member of the site going down or a remote site. The site that has failed is passed as a parameter to the event. The site_down_complete event will also run site_down_remote_complete or site_down_local_complete. These events take no specific actions and are intended as mechanisms for configuring notifications or pre and post events. cluster services were stopped or all nodes failed in a site.identify why all nodes in the site are down and make sure all resources and applications are working properly. site_up and site_up_complete eventsScript run when cluster services are started on the first node in a site.Script run after a site_up event has successfully completed.A site is a collection of one or more nodes. When cluster services are started for the first time on any node in the site, a site_up event is run in addition to the node_up event. Site events occur before the regular node_up events and enable special processing for replicated resource types that require actions at the site level. Additional site_up_local and site_up_remote events are run, depending on whether the local node is a member of the joining site or a remote site. The site that has joined is passed as a parameter to the event. The site_up_complete event will also run site_up_remote_complete or site_up_local_complete. These events take no specific actions and are intended as mechanisms for configuring notifications or pre and post events. 
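Since the site events above take no specific actions and exist mainly so notifications or pre and post events can be attached, here is a minimal sketch of a notify script that could be attached to site_down. It assumes the event name and its arguments are passed on the command line; the log path, mail recipient, and wording are placeholders, not part of the product.

    #!/bin/ksh
    # Hypothetical notify method for the site_down event.
    EVENT=$1
    shift
    print "$(date): cluster event $EVENT $*" >> /tmp/site_event_notify.log   # placeholder log
    # Send a simple alert; replace the address with a real recipient.
    echo "Cluster event $EVENT occurred with arguments: $*" | mail -s "PowerHA event $EVENT" admin@example.com
    exit 0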
cluster services were startedmake sure all resources and applications are working properly.acquire_service_addr, acquire_takeover_addr, release_service_addr and release_takeover_addrScript run when a service address is acquired on a node.Script run when a service address is released from a node.These events are called from higher level events like node_up or rg_move and are the first step in acquiring or releasing a service IP address on a node. The upper level events will determine what led up to these actions. Events like these are described only for debugging purposes: notifications and other event customizations are generally configured for the higher level events. If you follow the event execution in the hacmp.out log file you will see these events make some basic determinations about which service IP addresses will be acquired or released, then call helper functions to perform the actual commands. If any failures occur, the errors will be logged by the lower level functions, but these events are a good starting point when analyzing the log files. The service address being acquired or released is passed as an argument to the event. a node is acquiring or releasing a service IP address as part of a cluster eventif a failure has occurred while acquiring or releasing a service IP address, use the reference strings in the event summary to locate the problem in the log file start_server and stop_server eventsScript run to start an application.Script run to stop an application.These events are called from higher level events like node_up or rg_move and are used to start or stop an application. The upper level events will determine what led up to these actions. Applications are managed using the scripts you defined to the application controller. When a resource group is acquired on a node, the application start script is run to start the application and when a resource group is being released, the application stop script is run to stop the application. By default both the start and stop scripts are run as background processes such that any delay or hang does not fail the execution of the higher level event. If your application startup process requires that the start script complete execution before proceeding with the rest of the bring-up process, you can change the start script execution mode to run in the foreground instead. By default, the stdout and stderr from the scripts you provided will be logged into the hacmp.out file. If your start and stop scripts are complex, you may want to consider alternate approaches to logging. If you have defined an application monitor, the monitor will be used to determine the state of the application. a node is starting or stopping an application as part of a cluster eventverify that the application has been properly started or stoppedconfig_too_long notification eventScript run when a cluster event has been running for a long time.SystemMirror uses event scripts and utilities to manage the cluster resources and keep them highly available. These scripts can call underlying AIX commands (e.g. lsvg) or perform remote operations (e.g. sending commands to an HMC) which may hang or take a long time to complete. There is no programmatic way to monitor the actual progress of such commands so SystemMirror sets a timer at the beginning of every cluster event, and if the event execution time exceeds the timer setting, the config_too_long event is run to alert the administrator to check the cluster and see if manual intervention is required. 
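As a reference for the start and stop scripts described above, here is a minimal sketch of an application controller start script. The daemon path /opt/myapp/bin/myappd is a placeholder for your own application; only the general shape (short messages, non-zero exit on failure) reflects the behavior described in the text.

    #!/bin/ksh
    # Hypothetical start script for an application controller.
    # stdout/stderr is captured in hacmp.out by default, so keep messages short.
    print "Starting myapp at $(date)"
    /opt/myapp/bin/myappd            # placeholder application daemon
    if [ $? -ne 0 ]; then
        print "myapp failed to start"
        exit 1                       # non-zero exit indicates a start failure
    fi
    exit 0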
The config_too_long event may also be run if there is an event script failure, which will also require manual intervention to recover. By default, the timer is set to 6 minutes. You can change this value if the config_too_long event occurs frequently during normal execution of cluster events. The parameters to the event are the timer value and the name of the event that has been running too long. a cluster event has been running for more than 6 minutes and may need manual intervention examine the hacmp.out log file to see if there is an event failure or if a command is taking too long to complete. If there is an event failure you will need to correct the cause of the failure then resume event processing using the Recover From PowerHA SystemMirror Script Failure SMIT option. event_error eventScript run when there is an unrecoverable error while running a cluster event.SystemMirror uses event scripts and utilities to manage the cluster resources and keep them highly available. These scripts can call underlying AIX commands (e.g. lsvg) or perform remote operations (e.g. sending commands to an HMC) which may sometimes fail. SystemMirror can recover from certain failures and continue processing, however, other failures are unrecoverable and will stop execution of the cluster event. The hacmp.out log file will show the cause of the event error. The config_too_long event may also be run if the error is not recovered within 6 minutes. Once you determine the cause of the failure you will need to correct the problem, then resume event processing using the Recover From PowerHA SystemMirror Script Failure SMIT option. Keep in mind that the event processing will resume at the next major event step, which means that any remaining actions in the failed event script will be skipped. You will need to manually ensure the correct state and operation of the cluster resources after recovering from an event error. It is highly recommended that you configure a notification method for this event as it indicates that no further recovery will occur and manual intervention is required to correct the problem. The parameter to the event is the name of the event that has failed. an unrecoverable error has occurred during a cluster event examine the hacmp.out log file to determine the cause of the error and what recovery actions are needed. Once corrected, you will need to resume event processing using the Recover From PowerHA SystemMirror Script Failure SMIT option. reconfig_topology and reconfig_resource eventsScript run while processing a dynamic configuration change.SystemMirror allows for many configuration changes to be made while cluster services are active. The process of incorporating the changes into the active cluster is known as a Dynamic Automatic Reconfiguration Event or DARE. Several sub-events are used during processing, for example, the reconfig_topology events process changes in the cluster topology (e.g. node add or delete) and the reconfig_resource events process changes to cluster resources and resource groups. The process of updating the active configuration is automatic, however there may be problems encountered while dealing with individual resources, e.g. adding a new volume group to the cluster. Any errors will be logged into the hacmp.out file. The reconfig_configuration_complete event is the last event run in the DARE process.
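After the cause of an event failure has been corrected, processing is resumed through the Recover From PowerHA SystemMirror Script Failure SMIT option described above. On many installations the same recovery can be driven from the command line with the clruncmd utility; the path and exact invocation below are an assumption and should be verified against the documentation for your release.

    # Resume cluster event processing after correcting the failure
    # (assumed command-line equivalent of the SMIT recovery option)
    /usr/es/sbin/cluster/utilities/clruncmd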
user initiated a configuration change while cluster services are active verify that the changes in configuration have been correctly processed, examine hacmp.out for any errors site_isolation, site_isolation_complete, site_merge and site_merge_completeScripts run when SystemMirror detects a site isolation or merge condition.A site is a collection of one or more nodes. When the nodes at one site lose contact with nodes at another site but all nodes remain active, the condition is referred to as a "split brain" or "site isolation". When communication is restored between sites, the event is referred to as a merge. SystemMirror responds to site isolation and site merge conditions with the corresponding events. While site isolation and possible merge conditions should be avoided, SystemMirror provides options for how to respond when they do occur. Refer to the product documentation for more information on the use of tie breakers and manual recovery options during split and merge conditions. a communication problem occurred between nodes at different sites and the nodes at each site remained active investigate the cause of the network error and respond to any prompts for recovery if you configured the "manual" option for split or merge fail_interface and join_interface eventsScripts run when SystemMirror detects a network interface state change. SystemMirror uses the Cluster Aware AIX infrastructure to monitor network interfaces defined to the cluster. If a single interface fails on a node, a fail_interface event is run. When multiple interfaces on the same network fail at the same time, you may see a series of fail_interface events along with a network_down event. If the failed interface was hosting any service or persistent addresses, SystemMirror will attempt to recover them if there are active interfaces elsewhere on the same network. When an interface that was previously down starts working again, a join_interface event will run, and if multiple interfaces start working you may see a series of join_interface events along with a network_up event. If any service or persistent IP addresses were offline because there were no active interfaces capable of hosting them, SystemMirror will recover those addresses on the newly active interface. The name of the interface is passed as an argument to the event. a communication problem occurred with a network interface check the network interface (either real or virtual) and other devices in the network. cluster_notify eventScript run when SystemMirror detects a cluster configuration problemSystemMirror has a built-in configuration verification utility which can detect problems with the cluster configuration. The verification process is run each time a configuration change is synchronized across the cluster, and by default will run once every 24 hours afterwards. If the verification process detects a problem during the automatic verification run and there are active cluster nodes, a cluster_notify event is run. The cluster_notify event takes no action to correct the problem - it is run so that you can configure event notifications for it which can alert the systems administrator to check on the cluster. cluster verification detected a problemcheck the verification logs for errors. Run verification manually to verify the problems have been corrected.
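Following a cluster_notify event, verification can be re-run by hand once the reported problem has been fixed. A sketch using clmgr, assuming a release where the "verify cluster" action is available (the equivalent SMIT panels provide the same function):

    # Re-run cluster verification to confirm the reported problem is resolved
    clmgr verify cluster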
swap_aconn_protocols, get_aconn_rs and release_aconn_rs eventsScripts run to support AIX Fast ConnectAIX Fast Connect is server software that allows AIX servers and workstations to share files and printers with personal computer clients running the Windows operating systems. These utility events are called whenever a service or persistent IP address is manipulated (acquired or released) in order to update the Fast Connect subsystems with the new configuration. If you are not using AIX Fast Connect, these scripts return without taking any actions. a service or persistent IP address has changed if you are not using AIX Fast Connect there is no action required. If you are using AIX Fast Connect, verify that the clients are working properly with the new address configuration. resource_state_change and resource_state_change_complete eventsScripts run when a cluster resource encounters a recoverable error or the user moves resource groups. SystemMirror provides monitoring and recovery of resources in a high availability environment. In addition to the automated recovery you can manage the resources manually to bring them online or offline, or move them between nodes. When SystemMirror detects a recoverable error or when you manage resource groups manually, the resource_state_change script is run. The hacmp.out log file includes an Event Preamble which lists the resource groups that will be acted on as part of the event. There will be a series of rg_move sub-events run as part of the resource_state_change event. a cluster resource failed or the user initiated an action on a resource group if the action was not initiated by the user, review the logs to identify the source of the problem and what actions SystemMirror took to recover external_resource_state_change and external_resource_state_change_complete eventsScripts run when a cluster resource encounters a recoverable errorSystemMirror provides monitoring and recovery of resources in a high availability environment. SystemMirror can also respond to external events and failure indications and perform the same recovery actions. For example, an SNMP trap may indicate the failure of a storage replication mechanism, and SystemMirror will respond by attempting to restart the subsystem. When SystemMirror responds to an external error indication, it runs the external_resource_state_change event script. The hacmp.out log file includes an Event Preamble which lists the resource groups that will be acted on as part of the event. There will be a series of rg_move sub-events run as part of the external_resource_state_change event. an external source indicated that a cluster resource failed review the logs to identify the source of the problem and what actions SystemMirror took to recover intersite_fallover_prevented eventScript run when a cluster resource is not moved across a site boundarySystemMirror provides monitoring and recovery of resources in a high availability environment. If you have sites defined, you may want certain types of resources to be kept highly available within the nodes in a site but never allow them to move to the backup site. For example, if you have a service IP address configured at your primary site location, the networking infrastructure and routing will likely be substantially different at the remote site, such that the same IP address is not usable at the backup site. To prevent the service IP from moving between sites, you can change the Inter-Site Resource Group Recovery policy from "fallover" to "notify".
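For reference, a manual resource group move of the kind that triggers resource_state_change and rg_move processing might look like the following with clmgr. This is a sketch: "RG1" and "nodeB" are placeholder names, and the exact attribute spelling should be checked against the clmgr level installed on your cluster.

    # Move resource group RG1 to node nodeB; this drives rg_move events
    clmgr move resource_group RG1 NODE=nodeB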
When SystemMirror processes an event that would normally move the group to the backup site, but you have changed the Inter-Site Recovery policy to "notify", SystemMirror will run the intersite_fallover_prevented event instead of an rg_move event. You can customize the intersite_fallover_prevented event to notify the administrator of the action taken. a failure occurred that would normally result in moving a resource group to a different site, but you configured the inter-site recovery policy to be "notify" instead review the logs to identify the source of the problem and what actions are needed to recover forced_down_too_long eventScript run when a cluster node has been in the unmanaged state for some timeSystemMirror allows you to stop cluster services while leaving your applications and other cluster resources active. From the SMIT panel you can change the "Select an Action on Resource Groups" option to "Unmanage Resource Groups", or use the MANAGE option with the clmgr "offline node" command. Using this option leaves the cluster resources active and stops any monitoring or recovery, therefore it is not recommended to leave the node in this state for an extended period of time. If you leave a node in the unmanaged state for more than an hour, the forced_down_too_long event is run to remind you to restart cluster services. cluster services were stopped with the "unmanage" option for more than an hour start cluster services as soon as possiblestart_udresource and stop_udresource eventsScript run when a user defined resource is started or stoppedSystemMirror lets you define your own cluster resources and will manage those resources along with the built-in resource types. When you define a custom resource type, you provide scripts to start, stop and monitor the resource. The start_udresource and stop_udresource utility scripts are called during a larger event like node_up or rg_move and invoke the appropriate user supplied scripts to start or stop the resource. If you have a problem with the scripts you provided, you can search on the start_udresource or stop_udresource tracing in the hacmp.out file to see just where and how your scripts were invoked. you defined a custom resource type which was started or stopped during a cluster event. review the hacmp.out file to ensure your scripts worked properlysplit_merge_prompt eventScript run when a split or merge situation occurs and a user response is requiredSystemMirror supports sites which are a collection of one or more nodes. Site definitions are intended to mirror the physical layout of a cluster used for disaster recovery. When the nodes at one site lose contact with nodes at another site but all nodes remain active, the condition is referred to as a "split brain" or "site isolation". When communication is restored between sites, the event is referred to as a merge. You can customize how SystemMirror responds to site isolation and site merge conditions including an option to have the administrator direct the recovery action for each site. When you have specified a manual option for split or merge, the split_merge_prompt event is run to notify the administrator to take action. you configured a manual response for a split or merge scenario and you now need to provide that response. review the problem that led to the split or merge situation then provide the appropriate response rep_disk_notify eventScript run when a repository disk has failedSystemMirror uses the Cluster Aware AIX (CAA) infrastructure for cluster status information.
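The "offline node" operation with the MANAGE option mentioned above can be issued as follows. This is a sketch: "nodeA" is a placeholder node name, and the attribute value should be confirmed for the clmgr level you have installed.

    # Stop cluster services on nodeA but leave resources running (unmanaged)
    clmgr offline node nodeA MANAGE=unmanage

    # Later, restart cluster services so monitoring and recovery resume
    clmgr online node nodeA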
CAA requires a repository disk connected to all cluster nodes to store its configuration information. When a repository disk fails, the rep_disk_notify event is run. The rep_disk_notify takes no recovery action: it is intended for customization such that the administrator is alerted and can resolve the problem with the disk. CAA has lost access to a repository disk restore access to the disk or replace itswap_adapter and swap_adapter_complete eventsScript run to swap IP addresses between two network adapters.Script run after a swap_adapter event has successfully completed.These events occur when an interface that hosts a service or persistent IP address fails and SystemMirror is able to recover the IP address on another interface on the same node. The swap_adapter event moves the service IP and persistent labels according to any location preferences then reconstructs the routing table. You can also configure the ordering of the alias IP addresses as they are moved between interfaces. The hacmp.out file records the running of the events. You should check the network connections and interface status to determine what caused the initial failure and ensure that the service IP and persistent labels are functioning correctly on the new interface. Arguments to the event are: nodename network ip_address1 ip_address2 ip_address1 - the boot address this script swaps the service IP to. ip_address2 - the failed address The swap_adapter_complete follows the swap_adapter event and takes additional actions to ensure the swap operation is effective. network adapter or other network failure.check the physical and or virtual network adapters.network_up and network_up_complete eventsScript run after a network has become active.Script run after a network_up event has successfully completed.These events occur when one or more interfaces on a network become active. If any service or persistent addresses were offline, SystemMirror attempts to bring those addresses back online, along with any resource groups containing service IP labels on that network. The event has the following format: network_up network_up_complete If the network is coming up after a cluster wide outage, the node name parameter will be "-1" to indicate a global network up. You should check the network connections and interface status to ensure that the network is now working normally. The network_up_complete event follows the network_up event and takes additional actions to re-activate NFS mounts and persistent labels. one or more interfaces on a network became active when it was previously down. check the physical and or virtual network adapters. Check that persistent labels are working as well as any NFS mounts. network_down and network_down_complete eventsScript run after a network has failed.Script run after a network_down event has successfully completed.These events occur when all interfaces on a node and network have failed. The event has the following format: network_down network_down_complete If the network has failed on some nodes but not all nodes in the cluster, there will be a network_down event run for each node. If any service addresses were online, SystemMirror attempts to bring those addresses back online, along with any resource groups containing the service addresses, on a backup node where the network is still active. If the network has failed on all nodes in the cluster, the node name parameter will be "-1" to indicate a global network failure. 
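After a swap_adapter event such as the one described above, the service and persistent aliases can be checked on the new interface with standard AIX commands; a quick sketch, where "app_service_ip" is a placeholder for your own service IP label:

    # List interfaces and the addresses (including aliases) currently configured
    netstat -in

    # Confirm the service address still answers; replace with your service IP label
    ping -c 3 app_service_ip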
You should check the network connections and interface status to determine the cause of the failure. The network_down_complete event follows the network_down event and refreshes statd if there are any NFS mounts. one or more interfaces on a network failed. check the physical and or virtual network adapters.server_restart and server_restart_complete eventsScript run to restart an application.SystemMirror monitors the health of applications using application monitors. When you configure an application monitor, you can specify if you want SystemMirror to try to restart the application when the monitor indicates it has failed. You can also specify how many times you want to try to restart the application. SystemMirror uses the server_restart event to run the restart and notification methods you specified for the application monitor. You may see multiple restart attempts if you have specified a restart count for the monitor. The parameter to the event is the name of the application controller. an application monitor has indicated an application has failed, and a restart is being attempted. investigate what caused the monitor to indicate the application failed server_down and server_down_complete eventsScript run when an application has failed.SystemMirror monitors the health of applications using application monitors. When you configure an application monitor, you can specify if you want SystemMirror to try to restart the application when the monitor indicates it has failed. You can also specify how many times you want to try to restart the application. If the application cannot be restarted, the server_down and server_down_complete events are run. The server_down and server_down_complete events take no recovery action: they are provided as an indication the application has failed and allow for event customizations like adding a notification method. If you set the application monitor "failure action" to "notify", the server_down events will be the only indication the application failed. If you set the application monitor "failure action" to "failover" (the default) then SystemMirror will attempt to recover the application and resource group on another available node. You may see multiple server_restart attempts prior to this event. The parameter to the event is the name of the application controller. an application monitor has indicated an application has failed and no more restart attempts remain. investigate what caused the monitor to indicate the application failed. If you specified a failure action of "notify", you will want to restart the application yourself. rg_move and rg_move_complete eventsScripts run when SystemMirror moves resources and resource groups.When you start cluster services, SystemMirror will activate the resources and resource groups on the active cluster nodes. When a failure occurs, SystemMirror will try to keep the resources highly available by moving them to a backup node. You can also move resource groups between active cluster nodes. SystemMirror uses a series of event scripts during the different phases of acquiring or releasing the resource groups. You will see additional events like rg_move_release, rg_move_acquire and rg_move_fence run to perform specific operations. The rg_move events are triggered by a higher level event like a node_up or node_down. The hacmp.out log file will have an "Event Preamble" which lists the rg_move events that have been queued in response to the failure.
a higher level event such as a node failure has triggered the recovery of one or more resource groups if the event was not in response to manually moving the resource group, then investigate the cause of the event that triggered the recovery of the resource groups. Verify the resources have been recovered. Script run when a resource group goes to error stateSystemMirror will try to keep resources active by moving resource groups between active nodes in the cluster. If the acquisition of the group fails on one node, SystemMirror will attempt to acquire it on the next candidate node. If the acquisition fails on all nodes and the resource group cannot be brought online, the resource group will be put into the error state, and cluster services will return to the stable state. This script is run when a resource group goes to the error state and is intended to be customized with a notification script such that you can be alerted when a resource group is not online. a resource group has failed and all attempts to recover it have also failed. Examine the hacmp.out log file to find the root cause of the failure and what recovery actions have already been attempted. Once you have corrected the problem, you can bring the resource group online using smit or the clmgr command. WARNING: One or more resource groups are in the ERROR or ERROR_SECONDARY state. Use the clRGinfo or clmgr commands to identify which group(s) need attention. Once you have corrected the problem you can bring the group(s) back online using smit or the clmgr command. Script run to collect first failure data captureWhen PowerHA SystemMirror detects a problem on the system, it is critical to capture information about that problem for later analysis. This script captures log files and other information that can be used for diagnosing problems in the cluster. PowerHA SystemMirror detected a problem on the system. If the problem is reported to IBM software support, the log files collected by this script will automatically be included with the other information sent to IBM. Script run when a network is continuously changing state.When PowerHA SystemMirror detects that one or more interfaces on a network are continuously changing state, the network_unstable event will run instead of continuing to run individual join and fail interface events. The event will continue to run and log messages to the hacmp.out log file until stability is restored. PowerHA SystemMirror is seeing continuous state change notifications for one or more interfaces on a network. Check the interfaces on the network for obvious problems and ensure basic functions like ping and ssh are working. If the problem persists, consult the AIX and PowerHA documentation for the installed software levels for instructions on turning off interface traffic monitoring. If you have a later version of AIX, you may be able to use the following command: clmgr -f modify cluster MONITOR_INTERFACES=disable If the problem continues, contact IBM support for further assistance. Script run when a network that was previously unstable becomes stable.When PowerHA SystemMirror detects that one or more interfaces on a network are continuously changing state, the network_unstable event will run. When these state changes cease, the network_stable event will run. If the network is up at the time, a network_up event will also run. PowerHA SystemMirror is no longer receiving continuous state change notifications for the network.
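Relating to the ERROR-state warning above, the resource group state can be checked and the group brought back online once the underlying problem has been corrected. A sketch with the placeholder group name "RG1"; the equivalent SMIT panels perform the same operations.

    # Show the state of all resource groups on all nodes
    clRGinfo

    # Bring the repaired group back online
    clmgr online resource_group RG1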
Check the interfaces on the network for obvious problems and ensure basic functions like ping and ssh are working.
network_unstable and network_stable events
Script run when an administrator initiates a cluster operation. The administrator can initiate cluster operations using smit, clmgr, or the command line. For example, the user can use "smitty clstart" to start cluster services, which will initiate a "node_up" cluster event. The "admin_op" event records information about the administrator request. You can configure a notification event to be informed when administrator-initiated events are run. Administrator initiated a cluster operation. Verify that the administrator-initiated operation was appropriate and that the operation was successful.
Cluster event %1$s not found. You may have misspelled the event name or the event may no longer exist. Check the spelling and try again.
# SystemMirror uses a series of events
# to manage cluster resources and
# recover from failures. You can
# configure these events to provide
# notifications and call other scripts
# before and after they run. To learn
# about the function of an individual event,
# select it from the list below.
The following administrative operation is not valid.
Cluster services are in the wrong state for this operation.
Application monitoring is in the wrong state for this operation.
The specified Application Monitor could not be found.
Another administrative operation is already being processed.
An internal error occurred while processing the operation. Contact IBM support for further assistance.
The specified mode is not one of the supported modes.
No events will be run in response.
on ALL nodes
on the following nodes:
and continue current event processing.
and cancel all pending events.
Set clstrmgr daemon debug level to %1$d
Dynamic Automatic Reconfiguration Event (DARE)
Administrator applied snapshot %1$s
The "admin_op" event that was run in response to applying this snapshot failed. Check "hacmp.out" for details.