# IBM_PROLOG_BEGIN_TAG 
# This is an automatically generated prolog. 
#  
#  
#  
# Licensed Materials - Property of IBM 
#  
# (C) COPYRIGHT International Business Machines Corp. 1999,2019 
# All Rights Reserved 
#  
# US Government Users Restricted Rights - Use, duplication or 
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp. 
#  
# IBM_PROLOG_END_TAG 
 
 
 
************************************************************************
 
                                rsct.basic
 
************************************************************************
@(#)99   1.12.4.1   src/rsct/rsct.basic.READMEsrc, availability, rsct_rady, rady2035a 2/18/04 10:40:46


 
DESCRIPTION

The rsct.basic filesets include the availability infrastructure provided
by RSCT.  The infrastructure includes Group Services and Topology Services.

This infrastructure is documented in the RSCT Administration Guide.

 
INSTALLATION INFORMATION

 
ADVISORIES

RSCT Peer Domain Functional Level and PTF Rejection:

  RSCT Peer Domains (RPDs) operate at a functional level that
  corresponds to the lowest fileset level, as specified by the fileset's
  VRMF value, installed on the nodes of the domain. Once all the nodes
  in the RPD are migrated to a newer level, the RPD can start operating
  at the newer level via the runact command, as described in the RSCT
  Administration Guide.

  If an RPD is created after all of its nodes are running RSCT 2.3.2.0
  or later, then that will be the level at which the RPD operates. If
  the PTF that brought RSCT to this level is rejected on any of the
  nodes in the RPD (bringing the VRMF below 2.3.2.0), that RPD will be
  unable to come online, since the level of the RPD will be higher than
  the level at which the nodes are able to operate. In this case, the
  options are either to upgrade the node back to RSCT 2.3.2.0 or to
  remove the RPD.
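  As a sketch of the migration flow described above (command names per
  the RSCT Administration Guide; verify the exact invocation for your
  release before use):

```shell
# Check the installed fileset level (VRMF) on each node.
lslpp -l rsct.basic.rte

# Check the domain's active version and mixed-version state.
lsrpdomain

# Once every node is at the new level, complete the migration so the
# domain operates at the new functional level.
runact -c IBM.PeerDomain CompleteMigration Options=0
```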

RSCT Peer Domain Coexistence:

  Although it was possible to create an RSCT Peer Domain with RSCT
  2.2.1.10, it was not officially supported.  Therefore, nodes with RSCT
  2.2.1.10 installed should not be added to a Peer Domain created with
  RSCT 2.2.1.20 or later.

Removing and Adding a Node Back into an RSCT Peer Domain:

  When a node is removed from a Peer Domain via the rmrpnode command,
  there must be at least a two minute delay before adding the same node
  back into the same Peer Domain via the addrpnode command.
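  The sequence can be sketched as follows ("nodeB" is a placeholder
  hostname; run these from a node that remains online in the domain):

```shell
# Remove the node from the peer domain.
rmrpnode nodeB

# Wait at least two minutes before re-adding the same node.
sleep 120

# Add the node back into the same peer domain.
addrpnode nodeB
```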

DOCUMENTATION UPDATES AND INFORMATION

User Authentication Policy to use Group Services (cthags)

  The Group Services daemon runs as root. However, applications can use
  Group Services if one of the following conditions is met.

    a) If the user group 'hagsuser' exists in /etc/group at
       the time the hagsd daemon starts, applications
       with an effective user id (euid) of root,
       an effective group id (egid) of hagsuser, or
       a euid belonging to one of the supplementary
       members of the hagsuser group are allowed.

    b) If the user group 'hagsuser' does not exist in /etc/group, only
       applications with an effective user id (euid) of root are
       allowed.
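  The rule above can be sketched as a small shell function (illustrative
  only, not the daemon's actual code; the ids and group lists are passed
  in explicitly):

```shell
# gs_client_allowed EUID EGID "SUPPLEMENTARY_GIDS" HAGSUSER_GID
# HAGSUSER_GID is empty if 'hagsuser' did not exist when hagsd started.
# Returns 0 (allowed) or 1 (denied).
gs_client_allowed() {
    euid=$1; egid=$2; supp_gids=$3; hagsuser_gid=$4
    # root is always allowed
    if [ "$euid" -eq 0 ]; then return 0; fi
    # without a hagsuser group, only root is allowed
    if [ -z "$hagsuser_gid" ]; then return 1; fi
    # egid of hagsuser is allowed
    if [ "$egid" -eq "$hagsuser_gid" ]; then return 0; fi
    # supplementary membership in hagsuser is allowed
    for g in $supp_gids; do
        if [ "$g" -eq "$hagsuser_gid" ]; then return 0; fi
    done
    return 1
}
```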

Manual control of the Topology Services (cthats) subsystem:

  Manual control of Topology Services is provided by the cthatsctrl
  script. This provides the means of manually starting, stopping, and
  refreshing Topology Services, among other operations. See the
  description of cthatsctrl in the RSCT Technical Reference book for
  more details. In the case of starting and stopping the daemon this
  is the preferred method, although startsrc and stopsrc can also be 
  used.
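  As a sketch of the common operations (flag letters per the cthatsctrl
  description in the RSCT Technical Reference; check the documentation
  for your installed level before use):

```shell
cthatsctrl -s    # start the Topology Services subsystem
cthatsctrl -r    # refresh the subsystem after a configuration change
cthatsctrl -k    # stop the subsystem
```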

  A description of the Topology Services subsystem is given in the
  "The Topology Services Subsystem" chapter of the
  RSCT Administration Guide book.

  A chapter on diagnosing Topology Services problems has been included
  in the RSCT Administration Guide publication. This chapter lists
  information about the AIX error log templates created by Topology
  Services, provides procedures to determine whether the subsystem is
  working correctly, and shows the actions that need to be taken to
  diagnose and recover from common Topology Services problems.

Topology Services Tuning:

  Information on the tunables being used by Topology Services and
  procedures for changing these tunables are provided in the "The
  Topology Services Subsystem" chapter of the
  RSCT Administration Guide book.

  If the nodes in the system tend to operate in an environment of very
  heavy paging or extreme I/O usage (which is often the case for systems
  running the GPFS software), then the system will be prone to false
  failure indications. The default tunable values are too "aggressive"
  for such environments and need to be changed. Consult the
  RSCT Administration Guide book for instructions regarding changing the
  tunables.

  SP partitions with more than 128 nodes also require running with more
  "relaxed" tunable values.

Topology Services, Resource Starvation, and the 
    AIX Workload Manager (WLM):

  The Topology Services daemon is a real time program and requires
  timely access to the CPU and other system resources. It has been
  observed that memory contention has often caused the Topology Services
  daemon to be blocked for significant periods of time, resulting in
  "false node downs" and in the triggering of the "Dead Man Switch"
  timer in HACMP/ES.  An AIX error log entry with label "TS_LATEHB_PE"
  may appear when running RSCT 1.2 or later (the message "Late in
  sending Heartbeat by ..."  will appear in the daemon log file in any
  release of RSCT), indicating that the Topology Services daemon was
  blocked. Another AIX error log entry that could be created is
  "TS_DMS_WARNING_ST".

  In many cases, such as when the system is undergoing very heavy disk
  I/O, it is possible for the Topology Services daemon to be blocked in
  paging operations even though it looks like the system has enough
  memory. Three of the possible causes for this phenomenon are:

    - in steady state, when there are no node and adapter events on the
      system, the Topology Services daemon uses a "working set" of pages
      that is substantially smaller than its entire addressing space.
      When node or adapter events happen, the daemon may find that the
      additional pages it needs to process the events are not present
      in memory.

    - when heavy file I/O is taking place, the operating system may
      reserve a larger percentage of memory pages for file buffers,
      making fewer pages available to processes.

    - when heavy file I/O is taking place, paging I/O operations may be
      slowed down by contention for the disk.

  The probability that the Topology Services daemon gets blocked for
  paging I/O may be reduced by making use of the AIX Workload Manager.
  WLM is an operating system feature introduced in AIX Version 4.3.3. It
  is designed to give the system administrator greater control over how
  the scheduler and Virtual Memory Manager (VMM) allocate CPU and
  physical memory resources to processes. WLM gives the system
  administrator the ability to create different classes of service, and
  specify attributes for those classes.

  Details on how to use WLM to reduce memory contention problems in
  Topology Services are given in the RSCT Administration Guide, "The
  Topology Services subsystem" chapter, "Diagnosing Topology Services
  problems" section. ("Preventing Memory Contention Problems with the
  Workload Manager" subsection).

Managing Topology Services log and core files:

  Topology Services will create log files in the /var file system.
  Core files, if any, are also created in the same file system.
  For a description of how the log and core files are managed in
  Topology Services, consult the RSCT Administration Guide book,
  "The Topology Services Subsystem" chapter.

Requirement regarding the source routing options of the no command:

  For successful operation of the availability infrastructure, it is
  required that the following network options not be set to 0:
           nonlocsrcroute = 1
           ipsrcroutesend = 1
           ipsrcrouterecv = 1
        ipsrcrouteforward = 1
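  The settings above can be applied with the AIX "no" command, as in
  this sketch (run as root; "no -o" changes the running value, and
  persistence across reboots is handled differently by AIX level, e.g.
  the -p flag on newer levels or an /etc/rc.net entry on older ones):

```shell
for opt in nonlocsrcroute ipsrcroutesend ipsrcrouterecv ipsrcrouteforward; do
    no -o ${opt}=1
done
no -a | grep srcroute    # verify the current settings
```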


Topology Services and Baud Rates in RS232 Links (HACMP/ES)

  When heartbeating over RS232 links, the Topology Services daemon
  initially sets the computer-modem baud rate (DTE) to 9600.
  After identifying that the remote side of the connection is running
  RSCT 1.1.1 or later, the daemon sets the baud rate to 38400.

  Some devices currently in use to implement the RS232 links only
  support speeds up to 9600 baud, which causes data to be lost and the
  connection to be broken after the speed is set to 38400. After the
  connection is broken the speed is set to 9600, allowing a new
  connection to be formed. This results in repeated RS232 network events
  in HACMP.
  
  A mechanism is therefore provided to allow control over the baud rate
  used by Topology Services, so that slower devices can be used to
  implement RS232 links.
  
  Customers can now set the RS232 computer-modem (DTE) baud rate using
  the "para" descriptor of the "rs232" entry in the "HACMPnim" Global
  ODM class. If the descriptor is not set then Topology Services will
  still behave as it currently does, and will set the baud rate to
  38400 when the remote side is running RSCT 1.1.1 or later. If the
  descriptor is set, the given baud rate is used when the connection
  to the remote side is established.
  
  The following steps are needed to set up a baud rate:
  
    1) Execute command
  
  /usr/es/sbin/cluster/utilities/claddnim -o'rs232' -l b-rate
  
      where "b-rate" is either 9600, 19200, or 38400 (the only baud
      rates supported).
  
    2) Synchronize the cluster topology by using the following SMIT
       sequence:
  
       smit hacmp
           Cluster Configuration
               Cluster Topology
                   Synchronize Cluster Topology
  
  After the HACMP APAR IY10248 is installed, step 1) above can be
  replaced by
  
     smit hacmp
         Cluster Configuration
             Cluster Topology
                 Configure Network Modules
                     Change a Network Module using Custom Values
                         (select rs232)
                             Parameters
                                 (specify baud rate)
  
  After the steps above are taken, Topology Services will use the given
  baud rate to communicate over the RS232 connection.


Considerations for Network Information Service (NIS)

  This section only pertains to machines (SP nodes or SP control
  workstations) running the Network Information Service (NIS).  See the
  "Network Information Service" chapter of "AIX System
  Management Guide: Communications and Networks", SC23-4127, for
  detailed information about NIS administrative procedures.

  If you are using NIS on your SP nodes, or on the SP system's Control
  Workstation, you may have to perform NIS related procedures after
  performing some operations with the rsct.basic control scripts.  These
  scripts are hatsctrl, hagsctrl, and haemctrl.  The scripts can be run
  directly, or can be run indirectly by running the syspar_ctrl command.
  The script functions that pertain to this discussion are "Adding the
  Subsystem", "Cleaning Up the Subsystems", and "Unconfiguring the
  Subsystems", represented by the -a, -c, and -u flags, respectively.
  For this discussion, the interesting feature of these functions is
  that they modify the /etc/services file.  This is significant on
  machines running NIS, because NIS manages a map, the NIS services map,
  that is generated from the /etc/services file on the NIS master
  server.  Furthermore, this map is referenced by clients of the NIS
  domain served by the NIS master server.

  The control scripts only modify the local /etc/services file on the
  machine on which they are being run.  They do not modify the NIS
  services map.  To understand the implications of this, it is helpful
  to review how the role of the local /etc/services file changes when
  running NIS.

  On a machine that is not a NIS client, the contents of the local
  /etc/services file unquestionably affects the results obtained by
  calls to the library routines getservbyname() and getservbyport().
  These library routines simply look in the local file for the first
  entry that matches the criteria specified on the call.

  On an AIX machine that is a NIS client, the contents of the local
  /etc/services file may still affect the results obtained by calls to
  getservbyname() and getservbyport().  However, the contents of the NIS
  services map play a larger role.  In this case, the library routines
  first send a request to the NIS server process running on a NIS server
  (which may or may not be the same machine as the NIS client).  If the
  NIS server finds a match in the services map, that information is
  returned to the routine's caller.  If the NIS server does not find a
  match, the library routine searches the local /etc/services file for a
  match.
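  That lookup order can be sketched as a toy shell function, with two
  ordinary files standing in for the NIS services map and the local
  /etc/services (illustrative only; the real resolution happens inside
  the C library's getservbyname()):

```shell
# lookup_service NAME NIS_MAP_FILE LOCAL_SERVICES_FILE
# Prints the first matching entry, consulting the "NIS map" first and
# falling back to the "local file" only when the NIS map has no match.
lookup_service() {
    grep "^$1 " "$2" 2>/dev/null || grep "^$1 " "$3" 2>/dev/null
}
```

  Note that an entry in the NIS map masks a conflicting entry in the
  local file, which is exactly the failure mode described below for the
  rsct.basic daemons.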

  The rsct.basic daemons depend on the information placed in the
  /etc/services file by the control scripts for correct operation.
  These daemons call getservbyname() to obtain information about ports
  to be used for communications.  On a NIS client, the correct
  information will be obtained by a daemon if the NIS services map does
  not contain an entry for the service name queried by the daemon, or if
  the NIS services map contains an identical entry to what the daemon's
  control script has put in the local /etc/services file.  If the
  information in the NIS map conflicts with the information in the local
  /etc/services file, the daemon may not function correctly.

  What should a prudent NIS system administrator do?  After running
  rsct.basic control scripts with the -a flag, it would be a good idea
  to update the NIS services map on the NIS master server and update any
  NIS slave servers with the new map.  If your NIS master server is not
  the control workstation, this may involve manually copying entries
  placed in the control workstation's /etc/services file by the
  rsct.basic control scripts into the NIS master server's /etc/services
  file before updating the NIS services map.

  After running the rsct.basic control script with the -c or -u flags,
  it would be a good idea to update the NIS services map on the NIS
  master server and update any NIS slave servers with the new map.  If
  your NIS master server is not the control workstation, this may
  involve manually removing entries from the NIS master server's
  /etc/services file before updating the NIS services map.

  The entries placed in the /etc/services file by the rsct.basic control
  scripts have the service names hats.<syspar_name>, hags.<syspar_name>,
  haem.<syspar_name>, and haemd.

  If running an rsct.basic control script with the -a flag yields an
  error message indicating that a service name cannot be registered, and
  examination of the local /etc/services file does not explain the
  problem, the NIS services map may be out of sync with the file.  The
  NIS services map can be examined with the "ypcat services" command.
  If the contents of the NIS services map explains the problem, you may
  have to update the NIS services map before attempting to run the
  control script again.
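  For example, such a conflict can be spotted by comparing the two
  sources side by side (requires the NIS client tools):

```shell
# Compare the RSCT entries in the NIS services map with the local file.
ypcat services | grep -E 'hats|hags|haem'
grep -E 'hats|hags|haem' /etc/services
```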

Security Conversion Program

  If using RSCT in conjunction with PSSP running a DCE security
  environment, the Topology Services startup script will run a
  conversion program that will convert the DCE key into a cluster
  compatible key.  This will allow nodes running Topology Services using
  Cluster Security Services to coexist with nodes running Topology
  Services using DCE.

  In an HACMP environment, HACMP is supposed to harvest interface names
  for interfaces; however, it will let the cluster be defined, started,
  and so on, even if an interface name is not defined.

  The machines.lst configuration file for Topology Services is then built
  without an interface name. The Topology Services daemon needs to have the
  interface names present due to the way it builds its network data structures
  during startup and on configuration changes ("refresh").

  When the interface names are absent from the machines.lst, Topology Services
  still tries to proceed as if nothing happened, but the internal data 
  structures may become inconsistent, especially if the interface names are not 
  present during refresh. At some point later, and possibly during cluster 
  shutdown, the Topology Services daemon may crash, resulting in the node going
  down.

  The changes introduced for this APAR do two things:
  1. The verification code now checks for and reports missing interface names.
  2. The daemon also checks for missing interface names while reading the
     machines.lst and will not start if any interface name is undefined.

  If a customer is in this situation (missing interface name), the following 
  steps should rectify it:
  1. Clear the interface names by:
     smitty hacmp
         System Management (C-SPOC)
             HACMP Communication Interface Management
                 Update HACMP Communication Interface with AIX Settings
  2. Verify and synchronize the cluster by:
     smitty hacmp
         Initialization and Standard Configuration
             Verify and Synchronize HACMP Configuration
