
Start parameters for failover with Oracle® Real Application Clusters


A UTM application communicates with Oracle Real Application Clusters over the XA interface. If an XA call cannot be executed correctly by Oracle in the event of a failover, Oracle returns the value "XAER_RMFAIL". 

Normally, i.e. when failover support is not activated, openUTM interprets this return code to mean that it is no longer possible to work with this database and terminates the application abnormally.

To prevent the application from being aborted in this case, you must also specify RAC=Y in the .RMXA statement. You can control the behavior in the event of a failover with the optional parameters RAC_retry and RAC_recover_down:

.RMXA RM="Oracle_XA",OS="openstring",RAC=Y[,RAC_retry=nnn]
      [,RAC_recover_down={Y|N}]

RAC=Y

Enables failover support when connecting the UTM application to Oracle® Real Application Clusters.

RAC=N

Disables failover support.

Default: N

RAC_retry=nnn

nnn specifies the number of times that openUTM attempts to reconnect to the database and execute a recovery request.
If the Commit request for a transaction in the state "Prepare-to-Commit" cannot be executed because of a failover, openUTM reconnects to the database and executes a recovery request. If the current XID is contained in the list of XIDs returned by the database, openUTM executes a Commit request for that XID, i.e. for the current transaction. If the XID is not contained in the list, openUTM calls xa_close and then tries again to connect to the database and execute a recovery request, up to the number of attempts specified by nnn.

Default: 1
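The retry loop described for RAC_retry can be modeled in a few lines. This is a minimal Python sketch, not openUTM code: FakeRacDatabase and complete_prepared_transaction are hypothetical names, and the methods named after the XA calls xa_open, xa_recover, xa_commit and xa_close only record that the call was made. Only the control flow follows the description above.

```python
class FakeRacDatabase:
    """Hypothetical stand-in for an Oracle RAC instance reached via XA.

    The in-doubt XID only shows up in the recovery list from attempt
    `visible_from_attempt` onwards (None: never).
    """

    def __init__(self, in_doubt_xid, visible_from_attempt=None):
        self.in_doubt_xid = in_doubt_xid
        self.visible_from_attempt = visible_from_attempt
        self.attempt = 0
        self.committed = []
        self.calls = []

    def xa_open(self):
        self.attempt += 1
        self.calls.append("xa_open")

    def xa_recover(self):
        # Returns the list of in-doubt XIDs known to this (fake) instance.
        self.calls.append("xa_recover")
        visible = (self.visible_from_attempt is not None
                   and self.attempt >= self.visible_from_attempt)
        return [self.in_doubt_xid] if visible else []

    def xa_commit(self, xid):
        self.calls.append("xa_commit")
        self.committed.append(xid)

    def xa_close(self):
        self.calls.append("xa_close")


def complete_prepared_transaction(db, xid, rac_retry=1):
    """Reconnect and recover up to rac_retry times (default 1, as for RAC_retry).

    Returns True if the current XID was found and committed, else False.
    """
    for _ in range(rac_retry):
        db.xa_open()                    # reconnect to the database
        if xid in db.xa_recover():      # is the current XID in the list?
            db.xa_commit(xid)           # commit the current transaction
            return True
        db.xa_close()                   # not in the list: close, then retry
    return False
```

With visible_from_attempt=2 and rac_retry=3, the first attempt ends with xa_close and the second one commits. With the default rac_retry=1, the same database makes the function return False, and RAC_recover_down then determines what happens next.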

RAC_recover_down=

Specifies the behavior of openUTM if the transaction could not be completed after the number of attempts specified by RAC_retry=, i.e. if the status of the transaction could not be set to "Commit".

N

openUTM assumes that the transaction is no longer known to Oracle Real Application Clusters. The transaction is assumed to have the status "Commit" and openUTM continues execution of the application.

Default: N

Y

openUTM terminates execution of the application abnormally and thus forces a warm start in order to ensure that the data is consistent.
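The decision taken once the RAC_retry attempts are used up can be sketched as a small Python function. The names resolve_after_retries and ApplicationAbort are hypothetical; raising the exception merely stands in for the abnormal termination that forces a warm start.

```python
class ApplicationAbort(Exception):
    """Hypothetical signal: terminate abnormally so that a warm start follows."""


def resolve_after_retries(committed, rac_recover_down="N"):
    """Decide the outcome after the RAC_retry attempts (simplified model).

    committed: True if one of the attempts committed the transaction.
    rac_recover_down: "N" (default) or "Y", as for RAC_recover_down.
    """
    if committed:
        return "commit"
    if rac_recover_down == "Y":
        # Abnormal termination forces a warm start to keep the data consistent.
        raise ApplicationAbort("transaction not completed; warm start required")
    if rac_recover_down == "N":
        # Transaction assumed unknown to RAC and treated as committed.
        return "assumed-commit"
    raise ValueError("RAC_recover_down must be 'Y' or 'N'")
```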

Behavior of openUTM in the event of failover

If you have enabled failover support, openUTM and the database system behave as follows:

  • The application is not aborted if failover to a node of the Oracle® Real Application Cluster is possible.

  • If the connection is lost between "Prepare" and "Commit" at the end of a transaction, a "Reconnect" with recovery is performed and if this is successful, the "Commit" operation is repeated over this new connection.

  • If transactions are still open when the failover occurs, this can still lead to problems and corresponding error messages even if failover support is enabled (e.g. ORA-25402: transaction must roll back). The reason is that Oracle® Real Application Clusters cannot migrate open transactions in the event of a failover. These transactions must be rolled back by the UTM application program; see also "Interrupted transactions".

    Any open multi-step transactions (i.e. transactions that are still open after PEND KP) are rolled back by the database system in the event of a failover; openUTM has no influence over this. The connection to the database system is re-established automatically after the rollback, after which new transactions can be started.

  • If the failover occurs during a warm start of the application or while the UTM process is being terminated, error processing is carried out as usual and no attempt is made to reconnect.

  • The "prepared statements" database function can lead to errors in the event of a failover.

  • You can monitor the progress of the reconnection to the database system by means of the following messages:

    • xa_close in the event of reconnection:
      In the &RMSTAT insert of message K202, the string "RAC closed" is output in place of "closed" for the Oracle® Real Application Clusters instance.

    • xa_open in the event of reconnection:
      In the &XACALL insert of message K224, the string "RAC: xa_open" is output.
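The reactions listed above can be summarized as a simple mapping from the situation at failover time to openUTM's reaction. This Python sketch is a simplified model: the phase labels and the function name failover_reaction are hypothetical, and it assumes that every reconnection produces the two messages described, which the text states for reconnection in general but not separately for each case.

```python
# Messages that accompany a reconnection (see the message descriptions above).
RECONNECT_MESSAGES = [
    ("K202", "&RMSTAT", "RAC closed"),    # xa_close during reconnection
    ("K224", "&XACALL", "RAC: xa_open"),  # xa_open during reconnection
]


def failover_reaction(phase):
    """Return (actions, messages) for a failover in the given phase.

    phase is one of the hypothetical labels "warm_start", "termination",
    "between_prepare_and_commit" or "open_transaction".
    """
    if phase in ("warm_start", "termination"):
        # Usual error processing, no reconnection attempt.
        return ["normal error processing"], []
    if phase == "between_prepare_and_commit":
        # Reconnect with recovery, then repeat the Commit over the new connection.
        return ["reconnect with recovery", "repeat commit"], list(RECONNECT_MESSAGES)
    if phase == "open_transaction":
        # Open transactions cannot be migrated and are rolled back; the
        # connection is then re-established so that new transactions can start.
        return ["roll back", "reconnect"], list(RECONNECT_MESSAGES)
    raise ValueError(f"unknown phase: {phase!r}")
```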

Debug messages

The debug messages indicate whether a message refers to an instance of Oracle® Real Application Clusters.

How you obtain XA DEBUG information for the connection to the database is described in section "Debug parameters".

Interrupted transactions

Interrupted transactions can only be continued by the node that started the transaction. For this reason, all UTM processes must always be connected to the same node of the Oracle® Real Application Cluster. It is therefore simplest to proceed as follows:

  • terminate the UTM application after failover of the Oracle® Real Application Cluster and before the failed node is restarted,

  • restart the UTM application after the failed node has been restarted.

This ensures that all UTM processes are connected to the same node of the Oracle® Real Application Cluster and that all transactions of the application are processed by the restarted node of the Oracle® Real Application Cluster.

If it is not possible to terminate and restart the UTM application, i.e. if the nodes of the Oracle® Real Application Cluster are switched over while the openUTM application is running, this can result in the following situation in which not all UTM processes are connected to the same node:

  • One transaction is interrupted by the failover; at this time, the UTM process is still connected to the old node.

  • After the process is restarted or after a PEND ER in the UTM application program, the interrupted transaction is continued by a different UTM process. This process is now connected to the new node.

  • The database instance rejects the request to resume the interrupted transaction (xa_start with TMRESUME) and reports that the transaction is unknown.

  • openUTM reconnects to the database instance and attempts to resume the transaction over the new connection (i.e. on the new node).

  • The database system again rejects this request, since the database transaction was started on the old node of the Oracle® Real Application Cluster and cannot be continued on the new node.

  • openUTM rolls back the global transaction and issues a K160 message; "NOTA" is output in the insert of the internal return code KCRCDC.
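The sequence above can be simulated to show why the resume fails twice. In this minimal Python model, RacNode and continue_interrupted are hypothetical names; a node only knows the transaction branches started on it, and resume() stands in for xa_start with TMRESUME, returning "NOTA" for an unknown XID.

```python
class RacNode:
    """Hypothetical RAC node: only knows transaction branches started on it."""

    def __init__(self, name):
        self.name = name
        self.branches = set()

    def start(self, xid):
        self.branches.add(xid)

    def resume(self, xid):
        # Models xa_start with TMRESUME: an unknown XID yields "NOTA".
        return "OK" if xid in self.branches else "NOTA"


def continue_interrupted(xid, connect):
    """Try to resume `xid` on the node that `connect()` returns.

    connect: hypothetical callable returning the node a process is attached to.
    Returns (outcome, messages); messages are (number, insert, value) tuples.
    """
    node = connect()
    if node.resume(xid) == "OK":
        return "resumed", []
    node = connect()  # reconnect; in the scenario above this is the new node again
    if node.resume(xid) == "OK":
        return "resumed", []
    # Second rejection: roll back the global transaction and report K160.
    return "rolled back", [("K160", "KCRCDC", "NOTA")]
```

A transaction started on the old node and continued by a process attached to the new node ends as "rolled back" with the K160/NOTA message; the same call made by a process still attached to the old node resumes normally.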

A situation such as this can be handled as described below using a MSGTAC program.

Control using a MSGTAC program

The MSGTAC event service is defined as the message destination for message K160.
MSGTAC evaluates the message insert and initiates a restart via the administration programming interface (KC_CHANGE_APPLICATION). This replaces all processes, restarts them and then connects them to the new node.

This method minimizes the period of time for which the UTM processes are connected to different nodes. The number of transactions that are rolled back is limited to those that were started on the old node and could not be continued on the new node. The transactions that were started on the new node before the restart can be continued.
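The decision made by the MSGTAC program can be sketched as follows. A real MSGTAC program is written against openUTM's program and administration interfaces; this Python function only models the logic. The name msgtac_handler, the dictionary of inserts and the change_application callback (standing in for the KC_CHANGE_APPLICATION call) are all hypothetical.

```python
def msgtac_handler(msg_number, inserts, change_application):
    """Request a restart of all processes when K160 reports NOTA in KCRCDC.

    msg_number: message number, e.g. "K160".
    inserts: hypothetical dict of message inserts, e.g. {"KCRCDC": "NOTA"}.
    change_application: callback standing in for KC_CHANGE_APPLICATION.
    Returns True if a restart was requested.
    """
    if msg_number == "K160" and inserts.get("KCRCDC", "").strip() == "NOTA":
        change_application()  # replace and restart all UTM processes
        return True
    return False              # any other message: no action
```

Restricting the reaction to K160 with NOTA keeps the restart limited to the case described above, in which processes are attached to different nodes.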