A node application is assumed to have failed if two conditions are met. First, the monitored application does not respond to the messages within the configured reply time, taking the configured number of retries into account. Second, the KDCFILE of the monitored application then shows that the application is no longer running but was not terminated normally.
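The failure condition can be read as a simple predicate. The following is a minimal sketch only: the `MonitoringResult` structure and its field names are hypothetical and do not correspond to any openUTM interface.

```python
from dataclasses import dataclass


@dataclass
class MonitoringResult:
    # Hypothetical fields for illustration; openUTM exposes no such structure.
    replied_in_time: bool      # a reply arrived within the reply time, retries included
    running_per_kdcfile: bool  # the KDCFILE shows the application as still running
    terminated_normally: bool  # the KDCFILE shows a normal termination


def is_assumed_failed(result: MonitoringResult) -> bool:
    """Return True only if both conditions for a failure are met."""
    if result.replied_in_time:
        return False
    return not result.running_per_kdcfile and not result.terminated_normally
```

Under this reading, an application that stops replying because it was terminated normally is not treated as failed.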
If failure or abnormal termination of the monitored node application is detected, openUTM proceeds as follows:
The node application is flagged as failed in the cluster configuration file and removed from the monitoring relationships.
If you have specified a so-called failure script during UTM generation, the monitoring node application starts this script on the computer of the monitoring node application. The following data of the failed application is passed to the failure script:
the application name
the base name of the node application
the host name
the virtual host name or blanks
the reference name of the node application
the error code of the UTM dump (Term Application Reason)
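As an illustration, a failure script could collect the six values listed above into a record and print them, much like the supplied samples do. This is a minimal sketch in Python, not one of the supplied scripts. It assumes that the values arrive as six positional command-line arguments in the order listed, which should be checked against the sample scripts. The names `FailedNode`, `parse_args` and `main` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class FailedNode:
    """Values passed for the failed node application (field names are hypothetical)."""
    application_name: str
    base_name: str
    host_name: str
    virtual_host_name: str        # blanks if there is no virtual host name
    reference_name: str
    term_application_reason: str  # error code of the UTM dump


def parse_args(argv):
    # Assumption: six positional arguments, in the order the manual lists them.
    expected = len(fields(FailedNode))
    if len(argv) != expected:
        raise ValueError(f"expected {expected} arguments, got {len(argv)}")
    return FailedNode(*argv)


def main(argv):
    # Print every value that was passed, one per line.
    node = parse_args(argv)
    for field in fields(node):
        print(f"{field.name}={getattr(node, field.name)!r}")
    return 0

# In a real script, the entry point would be: sys.exit(main(sys.argv[1:]))
```

Because the virtual host name may consist of blanks, the command string would have to quote that argument for it to survive as a separate parameter. Whether this is necessary depends on how the command is invoked on the platform in question.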
openUTM manual “Generating Applications”, CLUSTER statement
To configure the failure script, specify the operand FAILURE-CMD. This operand passes a command string containing a command to be executed and any arguments.
The monitoring node application starts a restart monitoring timer if you have configured this:
openUTM manual “Generating Applications”, CLUSTER statement
To configure the restart monitoring timer, specify the operand RESTART-TIMER-SEC. This specifies the maximum time in seconds that a node application requires for a warm start after a failure.
If you have specified an emergency script during UTM generation, the monitoring node application starts this script if the failed node application does not become available again after the restart monitoring timer has expired. The following data of the failed application is passed to the emergency script:
the application name
the base name of the node application
the host name
the virtual host name or blanks
the reference name of the node application
the error code of the UTM dump (Term Application Reason)
openUTM manual “Generating Applications”, CLUSTER statement
To configure the emergency script, specify the operand EMERGENCY-CMD.
This operand passes a command string containing a command to be executed and any arguments.
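The order of the steps described above (failure script, restart monitoring timer, emergency script) can be modelled as follows. This is a simplified sketch of the described sequence, not openUTM code. All function and parameter names are hypothetical, and the handling of a missing timer is an assumption.

```python
import time


def handle_node_failure(run_failure_cmd, run_emergency_cmd, node_is_available,
                        restart_timer_sec, sleep=time.sleep):
    """Model the steps taken after a node failure has been detected.

    run_failure_cmd / run_emergency_cmd stand in for the scripts configured
    with FAILURE-CMD and EMERGENCY-CMD (None if not generated).
    node_is_available reports whether the failed node application is back.
    restart_timer_sec mirrors RESTART-TIMER-SEC (0 or None if not configured).
    Returns the list of steps that were carried out.
    """
    steps = []
    if run_failure_cmd is not None:
        run_failure_cmd()
        steps.append("failure")
    if not restart_timer_sec:
        # Assumption: without a timer there is no expiry event,
        # so the emergency script is not started.
        return steps
    sleep(restart_timer_sec)  # allow time for a warm start
    steps.append("timer")
    if not node_is_available() and run_emergency_cmd is not None:
        run_emergency_cmd()
        steps.append("emergency")
    return steps
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting for the timer to expire.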
Sample script on detection of a failure
Sample failure and emergency scripts are supplied with openUTM. These examples output the parameters passed when they are called. If you wish to use the samples in a live environment, you must adapt them to suit the requirements of the relevant cluster.
Unix and Linux systems
The following sample scripts are supplied in the library utmpath/shsc:
utm-c.emergency
utm-c.failure
Windows systems
The following sample scripts are supplied in the directory utmpath\shsc:
utm-c.emergency.cmd
utm-c.failure.cmd