Operation during crash recovery

If a processor crashes, the remaining processors in the network must be able to continue working, and the crashed processor must be able to reenter the network, within as short a time as possible. It is therefore very important that crash recovery is completed quickly.
Crash recovery normally runs largely automatically, so systems support does not need to intervene. However, the automatic mechanisms offered by the system reach their limits in the following cases:

  • if the default settings are not correct

  • if required task resources are not released

  • if errors occur during processing.

If message MCS1100 (requiring a response) is output, you must check whether the displayed processor has merely lost its multiprocessor capability (MSCF has terminated) or has actually crashed. Only after this check can the message be answered with “MXCM-<order code of the console message>.MTERM” or with “MXCM-<order code of the console message>.CRASH”.
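
The rule for answering MCS1100 can be illustrated with a minimal Python sketch. It is not part of the system and does not talk to a console; the names ProcessorState and mcs1100_reply, as well as the order code used in the usage note, are assumptions made for this example. The sketch only assembles the reply text in the form quoted above and refuses to do so unless the result of the manual check is supplied.

```python
from enum import Enum


class ProcessorState(Enum):
    """Result of the manual check of the processor named in MCS1100."""
    MSCF_TERMINATED = "MTERM"  # processor still runs, only MSCF has terminated
    CRASHED = "CRASH"          # processor has actually crashed


def mcs1100_reply(order_code: str, state: ProcessorState) -> str:
    """Assemble the reply text for MCS1100 (hypothetical helper).

    order_code is the order code of the console message; state must be
    the outcome of the check described in the text above.
    """
    if not isinstance(state, ProcessorState):
        # No checked state supplied: the message must not be answered yet.
        raise ValueError("check the processor state before answering MCS1100")
    return f"MXCM-{order_code}.{state.value}"
```

For example, with a hypothetical order code 042, mcs1100_reply("042", ProcessorState.CRASHED) builds the text MXCM-042.CRASH. Passing anything other than a checked state raises an error, which mirrors the requirement that the check comes first.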

Often, resources occupied by the crashed processor cannot be released for the remaining processors in the network, or a function of the crashed processor that the network needs cannot be taken over by another processor in the network. This is the case, for example, if the automatic master change in the shared pubset network has failed. You can display the affected pubsets with the command SHOW-MASTER-CATALOG-ENTRY SELECT=*MASTER-CHANGE-ERROR.

If the master change is rejected, a corresponding message (MCA0103, MCA0104) is output. When analyzing this message, bear in mind that a master change must have been permitted in the first place via the pubset parameters (SHOW-PUBSET-ATTRIBUTES or SET-PUBSET-ATTRIBUTES command). There must also be an MSCF connection between the processor intended as the new master and every other sharer.

If the SVL contains a sharer that is not active (can be displayed with the command SHOW-DISK-STATUS <mn>,*ALL), this entry must be deleted with the UNLOCK-DISK <mn>,<sysid> command before the master change is initiated. The command can only be issued if the specified processor does not actually occupy the pubset. Immediately afterwards, the master change can be started again with the following command:
IMPORT-PUBSET <catid>, SHARER-TYPE=*MASTER(MASTER-CHANGE=*YES)
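
The order of the steps for repeating a failed master change can be sketched in Python as follows. This is an illustration under stated assumptions, not a system interface: the function name master_change_retry_commands and all values in the usage note are hypothetical, and the sketch only formats the command texts given above in the required order. It does not issue them, and it does not verify that a listed processor really no longer occupies the pubset.

```python
from typing import List, Sequence


def master_change_retry_commands(
    catid: str, mn: str, inactive_sysids: Sequence[str]
) -> List[str]:
    """Return the command texts for repeating a master change, in order.

    catid           catalog ID of the shared pubset
    mn              mnemonic device name of the disk holding the SVL
    inactive_sysids sysids of the sharers found to be non-active in the SVL
    """
    # First delete every non-active sharer entry from the SVL ...
    commands = [f"UNLOCK-DISK {mn},{sysid}" for sysid in inactive_sysids]
    # ... then start the master change again on the intended new master.
    commands.append(
        f"IMPORT-PUBSET {catid}, SHARER-TYPE=*MASTER(MASTER-CHANGE=*YES)"
    )
    return commands
```

With the hypothetical values catid A, mn 5100 and one non-active sharer with sysid 123, the function returns the UNLOCK-DISK command for that sharer followed by the IMPORT-PUBSET command. If there are no non-active sharers, only the IMPORT-PUBSET command is returned.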