Fault tolerance in this context means that a UTM application remains operational even when errors in individual program units force openUTM to abort a transaction. openUTM then ensures that the application program is terminated and reloaded, so that the error cannot spread and adversely affect other users of the application or their data.
With regard to the error behavior of openUTM, a distinction is made between:
Internal UTM errors and errors in the system environment
These errors result in an abnormal termination of the application, in the same way as the administration command KDCSHUT KILL or a KDCADMI call with operation code KC_SHUTDOWN and subcode KC_KILL.
openUTM creates a UTM dump for each process of the application. The UTM dump is edited using the UTM tool KDCDUMP. A description of this procedure can be found in the openUTM manual “Messages, Debugging and Diagnostics on Unix, Linux and Windows Systems”.
In the event of serious errors in the dialog terminal process, the dialog terminal process terminates and writes a core dump under the current directory. During this sign-on run, it is not possible to sign on again from the assigned terminal. With minor errors, the dialog terminal process signs off properly from the application.
A printer process behaves similarly to a dialog terminal process when errors occur. The printer process can, if necessary, be restarted using an administration command.
If errors occur in the timer process, the application is terminated abnormally as soon as a job is sent to the timer process from the work processes.
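The reactions described above can be summarized as a small lookup. The following Python sketch is purely illustrative and is not part of openUTM: the names `Source` and `expected_reaction` are hypothetical, the `serious` flag is a simplification of the "serious" versus "minor" distinction in the text, and the printer process is assumed to react exactly like the dialog terminal process because the text only says it behaves "similarly".

```python
from enum import Enum


class Source(Enum):
    """Where the error occurred (hypothetical names, for illustration only)."""
    UTM_OR_SYSTEM = "internal UTM error or error in the system environment"
    DIALOG_TERMINAL = "dialog terminal process"
    PRINTER = "printer process"
    TIMER = "timer process"


def expected_reaction(source: Source, serious: bool = True) -> str:
    """Return the reaction the text describes for an error from `source`.

    `serious` is only evaluated for dialog terminal and printer processes.
    """
    if source is Source.UTM_OR_SYSTEM:
        return "application terminates abnormally; one UTM dump per process"
    if source is Source.TIMER:
        return ("application terminates abnormally as soon as a work process "
                "sends a job to the timer process")
    # Dialog terminal process; the printer process is assumed to match it.
    if serious:
        return "process terminates and writes a core dump"
    return "process signs off properly from the application"


if __name__ == "__main__":
    for src in Source:
        print(f"{src.value}: {expected_reaction(src)}")
```

The sketch only restates the text in tabular form. It says nothing about how openUTM detects or classifies errors internally.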
Errors in the application program
These are errors in program units. They can be divided into two groups:
errors that lead to the reloading of the application
errors that may permit the program to continue.