The clear-down or loss of an MSCF connection (messages MCS0014 and MCS0015 are output) between the sharers of a shared pubset network affects the functionality of the network. The only immediate effect on the users of a shared pubset is the interruption of the MSCF connection between the master and slave processor. In this case, the MRSCAT entry of the pubset on the slave processor is switched to the QUIET state. DMS meta-operations on the slave processor cannot be executed until the connection to the master processor is reestablished. The loss of a connection between two slave processors is of no significance to the other sharers of the shared pubset network. The pubset remains fully available to the processors.
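The rule above can be sketched as a small model. This is only an illustration under stated assumptions: the function name, the tuple representation of a processor and the role strings are hypothetical and do not correspond to any system interface.

```python
def quiet_processors(end_a, end_b):
    """Return the processors whose MRSCAT entry for the pubset goes QUIET.

    Each end of the lost MSCF connection is a (name, role) tuple, where
    role is "master" or "slave". Hypothetical model, not a system API.
    """
    roles = {end_a[1], end_b[1]}
    if roles == {"master", "slave"}:
        # Master-slave connection lost: only the slave side becomes QUIET.
        return [name for name, role in (end_a, end_b) if role == "slave"]
    # Slave-slave connection lost: the pubset stays fully available.
    return []
```

For example, losing the connection between a master `P1` and a slave `P2` yields `["P2"]`, while losing the connection between two slaves yields an empty list.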
If the shared pubset is an SM pubset, a loss of connection during a configuration modification may result in inconsistencies between the image of the pubset configuration in the MRSCAT on the slave processors and the pubset configuration file updated by the master processor. The only functions affected by this are the information functions, which display outdated information. The RESUME-PUBSET-RECONFIGURATION command can be issued after the MSCF connection has been reestablished in order to update the pubset configuration of an SM pubset in the MRSCAT using the information in the pubset configuration file.
In addition to the interference in the availability of the pubset, the loss of a connection between the master and slave processor affects the reconfiguration capability and the monitoring of the shared pubset network. A distinction is made between the following cases:
Export of pubset without master change
If the MSCF connection between the master and slave processor is interrupted, the export of the pubset from the exporting processor is implemented without the cooperation of the partner processor.
If the pubset is exported from the slave processor, its locks are not released on the master processor. If the pubset is exported from the master processor, it remains imported on the slave processor.
At the end of the export procedure, the processor notes the status "EXPORT" in its assigned control blocks of the watchdog file and releases the pubset. The disk monitoring mechanism of the partner processor thus identifies that the status of the processor has changed and can implement the measures that have not been performed:
If the pubset is exported from a slave processor, the master processor resets the locks of the slave processor by means of a slave crash reconfiguration; if the pubset is exported from the master processor, the slave processor likewise exports the pubset.
If the EXPORT status cannot be written in the watchdog file (cause: write error or volume was removed), the partner processor cannot detect that the pubset has been exported until the MSCF connection is reestablished. Only then can the master processor reset the locks of the slave processor, or the slave processor export the pubset.
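The cases described for an export without master change can be summarized in a short decision sketch. The function name, its parameters and the returned strings are assumptions made for illustration only; the actual mechanism is internal to the system.

```python
def partner_reaction(exporting_role, export_status_written, mscf_restored=False):
    """Model what the partner processor does after the other side exported.

    exporting_role        -- "master" or "slave" (processor that exported)
    export_status_written -- True if "EXPORT" reached the watchdog file
    mscf_restored         -- True once the MSCF connection is back
    All names are hypothetical; this is not a system interface.
    """
    if exporting_role not in ("master", "slave"):
        raise ValueError("exporting_role must be 'master' or 'slave'")
    # Without the watchdog entry, the export stays invisible to the
    # partner until the MSCF connection is reestablished.
    if not export_status_written and not mscf_restored:
        return "undetected"
    if exporting_role == "slave":
        return "master resets slave locks"
    return "slave exports pubset"
```

A call such as `partner_reaction("slave", export_status_written=False)` returns `"undetected"`, which corresponds to the write error or removed volume case described above.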
Master change due to export or failure
The behavior for a master change initiated by an export with master change or by the failure of the master processor corresponds to the behavior in the event of an incompletely meshed network and is described in section “Failure of a processor in a shared pubset network” and in section “Clearing down a shared pubset network”.
Wait time for reestablishment of MSCF connection
If a task wants to access a pubset that is QUIET (see section “Loss of connection in an LCS/CCS network”), it first waits for the MSCF connection to be reestablished. The maximum wait time is defined or modified using the BATCH-WAIT-TIME and DIALOG-WAIT-TIME operands of the ADD-MASTER-CATALOG-ENTRY and MODIFY-MASTER-CATALOG-ENTRY commands. The default value is 28800 seconds (= 8 hours) for batch jobs and 30 seconds for dialog jobs.
If the task is still waiting after 600 seconds, the message DMS03A8 is issued at the console so that systems support knows the connection has failed and can terminate the wait status. In general, the task recognizes status changes itself, withdraws its query and continues: If the pubset is available again, the current operation is continued normally. If the pubset has been exported and the answer to the message DMS03A8 requires processing to be aborted or if the wait time has expired, the current operation is aborted and a message (usually DMS0502) is issued.
If the connection failure interrupts internal system processes at an adverse point, the task may not continue until the message has been answered, even if the connection has been reestablished in the meantime or the pubset has been exported. To stop the message DMS03A8 from being issued at all in unattended operation, we recommend that both wait times be set to 600 seconds in the MRSCAT entry.
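The relationship between the configured wait times and the console message can be checked with a short sketch. It assumes that DMS03A8 is only issued when the maximum wait time exceeds 600 seconds, which is how the recommendation above is interpreted here; the function and constant names are hypothetical.

```python
DMS03A8_DELAY_S = 600                            # delay stated in the text
DEFAULT_WAIT_S = {"batch": 28800, "dialog": 30}  # defaults stated in the text


def dms03a8_expected(max_wait_s):
    """True if a task can still be waiting after 600 seconds.

    Assumption: a wait time of exactly 600 seconds expires before the
    message would be issued.
    """
    return max_wait_s > DMS03A8_DELAY_S


def jobs_that_may_trigger_message(wait_times=None):
    """Return the job types whose wait time allows DMS03A8 to appear."""
    times = dict(DEFAULT_WAIT_S)
    times.update(wait_times or {})
    return sorted(job for job, t in times.items() if dms03a8_expected(t))
```

With the default values only batch jobs can reach the message; with both wait times set to 600, `jobs_that_may_trigger_message({"batch": 600, "dialog": 600})` returns an empty list.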