If disk monitoring does not regain control within a specific period of time (FAIL-DETECTION-LIMIT / 11 seconds), it cannot write the vital-sign messages in time. This has the same effect as a write error on all shared pubsets. If a connection is also lost during this time, the partner may incorrectly diagnose a crash (for information on how to prevent erroneous crash detection, see section “Input/output errors when accessing the watchdog file”).
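The time limit above can be worked through with a small sketch. It assumes that the expression is read as FAIL-DETECTION-LIMIT divided by 11, with the result in seconds. The function names and the value 176 are hypothetical and chosen only for illustration; they are not part of MSCF.

```python
def monitoring_deadline_seconds(fail_detection_limit: int) -> float:
    """Period (in seconds) within which disk monitoring must regain control.

    Assumption: the limit is FAIL-DETECTION-LIMIT divided by 11.
    """
    if fail_detection_limit <= 0:
        raise ValueError("FAIL-DETECTION-LIMIT must be positive")
    return fail_detection_limit / 11


def is_monitoring_blocked(stall_seconds: float, fail_detection_limit: int) -> bool:
    """Return True if a stall of the given length exceeds the deadline,
    i.e. the vital-sign messages could not be written in time."""
    return stall_seconds > monitoring_deadline_seconds(fail_detection_limit)


if __name__ == "__main__":
    limit = 176  # illustrative value only, not taken from the text above
    print(monitoring_deadline_seconds(limit))   # deadline in seconds
    print(is_monitoring_blocked(20.0, limit))   # stall longer than the deadline
    print(is_monitoring_blocked(10.0, limit))   # stall shorter than the deadline
```

Under these assumptions, a stall of 20 seconds with a limit of 176 would count as a disk monitoring block, whereas a stall of 10 seconds would not.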
A temporary write error is rectified when new vital-sign messages are written successfully.
Disk monitoring blocks are indicated by message DMS03B7. If a connection is lost while a disk monitoring block exists, MSCF may initiate an abnormal system termination (SETS); message MCS1300 is then output (see section “System termination by MSCF”).
Disk monitoring blocks may be caused by excessively long wait states, for example due to:
extreme overloading (a paging bottleneck does not have any effect, however, because vital-sign messages are written by a resident routine)
performing a CPU reconfiguration
If these situations cannot be ruled out, it is advisable to use the general or partner-specific setting RECOVERY-START=*CONSISTENT-BY-OPERATOR, possibly only temporarily (see “Inhibiting the automatic start of fail reconfiguration” (Global control parameters)).