If a system within the shared pubset network fails, all resources reserved by it must be released or recovery measures must be initiated. All systems involved in the shared pubset network are monitored by the MSCF subsystem.
Two checking mechanisms are used for system monitoring:
The watchdog file $TSOS.SYS.PVS.SHARER.CONTROL, to which all sharers periodically write time stamps (vital-sign reports). If a sharer fails, one of the other sharers can detect this from the missing vital-sign report and initiate appropriate measures.
A connection check, which is performed if the vital-sign report does not appear: a request is sent to the sharer concerned, which must acknowledge it within a specific time period.
A partner failure is only assumed if the missing vital-sign report is confirmed by an unsuccessful network/LAN check.
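The two-stage check described above can be sketched as follows. This is only an illustration of the decision rule, not the MSCF implementation: the class SharerStatus, its fields, the function names and the max_age threshold are all hypothetical, and the actual monitoring intervals are not given here.

```python
from dataclasses import dataclass


@dataclass
class SharerStatus:
    """Hypothetical snapshot of what a monitoring sharer knows about a partner."""
    name: str
    last_vital_sign: float          # time of the last time stamp seen in the watchdog file (seconds)
    answers_connection_check: bool  # True if the partner acknowledged the request in time


def vital_sign_missing(status: SharerStatus, now: float, max_age: float) -> bool:
    """Stage 1: the vital-sign report is missing if the last time stamp is too old."""
    return now - status.last_vital_sign > max_age


def partner_failed(status: SharerStatus, now: float, max_age: float) -> bool:
    """A failure is assumed only if BOTH checks point to a failure."""
    if not vital_sign_missing(status, now, max_age):
        return False  # time stamps are still arriving, nothing to do
    # Stage 2: the connection check is consulted only after a missing report.
    return not status.answers_connection_check
```

With an assumed max_age of 60 seconds, a partner whose last time stamp is 90 seconds old but which still acknowledges the connection check is not treated as failed; it is treated as failed only if the connection check is unsuccessful as well.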
If the owner system fails, a pubset-specific job variable is set on all dependent systems.
Failure of a pubset slave
If a pubset master detects the failure of a participating slave, all resources reserved by the failed pubset slave are released.
Failure of a pubset master
If the pubset master fails, the watchdog mechanism initiates a master change. A prerequisite for the master change is that an active pubset slave is entered in the shared pubset SVL as the backup master, which then takes over the master functions.
The backup master is entered in the SVL DMS record with the command SET-PUBSET-ATTRIBUTES BACKUP-MASTER=.... If no backup master is entered or the entered backup master is not active, the value of the ALTERNATE-BACKUP operand decides which of the following applies: the first active pubset slave in the SVL becomes the pubset master; the operator explicitly defines one of the active slaves as the new pubset master with the command IMPORT-PUBSET SHARER-TYPE=*MASTER(MASTER-CHANGE=*YES); or a master change with an alternate backup master is prohibited.
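The selection rule for the new pubset master can be sketched as a small decision function. This is a simplified model under stated assumptions: the enum members are illustrative labels for the three behaviors described above, not the literal values of the ALTERNATE-BACKUP operand, and the function name, parameters and result labels are hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence, Set, Tuple


class AlternateBackup(Enum):
    """Illustrative labels for the three behaviors, NOT real operand values."""
    FIRST_ACTIVE_SLAVE = "first active slave in the SVL"
    OPERATOR_CHOICE = "operator names the new master"
    PROHIBITED = "no alternate backup master"


def select_new_master(
    backup_master: Optional[str],
    svl_slaves: Sequence[str],
    active: Set[str],
    alternate: AlternateBackup,
) -> Tuple[str, Optional[str]]:
    """Return (decision, system): 'automatic', 'operator' or 'none'."""
    # The entered backup master takes over if it is active.
    if backup_master is not None and backup_master in active:
        return ("automatic", backup_master)
    # Otherwise the alternate-backup setting decides.
    if alternate is AlternateBackup.FIRST_ACTIVE_SLAVE:
        for slave in svl_slaves:  # slaves in the order of their SVL entries
            if slave in active:
                return ("automatic", slave)
        return ("none", None)     # no active slave left to take over
    if alternate is AlternateBackup.OPERATOR_CHOICE:
        return ("operator", None)  # operator must issue the import command
    return ("none", None)          # master change is prohibited
```

For example, with backup master SYSB inactive and slaves SYSC and SYSD active, the FIRST_ACTIVE_SLAVE setting yields ("automatic", "SYSC"), while PROHIBITED yields ("none", None). The system names are made up for the example.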
If no backup master is designated or the master change fails for another reason, one of the following actions is necessary:
All participating pubset slaves deactivate the shared pubset and rebuild the shared pubset network completely.
Permission for a subsequent master change is given with the command SET-PUBSET-ATTRIBUTES, and the master change is then initiated with the command IMPORT-PUBSET SHARER-TYPE=*MASTER(MASTER-CHANGE=*YES).
Possible reasons for a master change failure:
The entered backup master is not active.
The connection to a participating slave is interrupted.
One of the systems involved in the shared pubset network is using a version or revision level of HIPLEX MSCF that is not compatible with the network.
All participating pubset slaves can resume normal operation after a successfully concluded master change. The master change itself is almost completely transparent to the users.