Start parameters for failover with Oracle® Real Application Clusters

A UTM application communicates with Oracle® Real Application Clusters over the XA interface. If an XA call cannot be executed correctly by Oracle in the event of a failover, Oracle returns the value "XAER_RMFAIL".
By default, i.e. when failover support is not activated, openUTM interprets this return code to mean that the database can no longer be used and aborts the application.

To prevent the application from being aborted in this case, specify RAC=Y in the .RMXA start parameter. You can control the behavior in the event of a failover with the optional parameters RAC_retry and RAC_recover_down:

.RMXA RM="Oracle_XA",OS="openstring",RAC=Y[,RAC_retry=nnn]
                                          [,RAC_recover_down={Y|N}]
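The syntax above can be sketched as a small Python helper that composes the .RMXA statement, for example if you generate start parameters from a script. This is an illustration only: the function name `build_rmxa` is hypothetical, an operand passed as `None` is simply omitted so that openUTM applies its default, and the rule that RAC_retry and RAC_recover_down are only used together with RAC=Y is an assumption read from the syntax diagram.

```python
from typing import Optional


def build_rmxa(open_string: str,
               rac: bool = True,
               rac_retry: Optional[int] = None,
               rac_recover_down: Optional[bool] = None) -> str:
    """Compose a .RMXA start parameter line (hypothetical helper).

    Operands left as None are omitted, so openUTM uses its defaults.
    """
    # Assumption: the RAC_* operands are only meaningful together with RAC=Y.
    if not rac and (rac_retry is not None or rac_recover_down is not None):
        raise ValueError("RAC_retry/RAC_recover_down require RAC=Y")

    parts = ['.RMXA RM="Oracle_XA"',
             f'OS="{open_string}"',
             "RAC=Y" if rac else "RAC=N"]

    if rac_retry is not None:
        if rac_retry < 0:
            raise ValueError("RAC_retry must not be negative")
        parts.append(f"RAC_retry={rac_retry}")

    if rac_recover_down is not None:
        parts.append("RAC_recover_down=" + ("Y" if rac_recover_down else "N"))

    # Operands are separated by commas without blanks.
    return ",".join(parts)


print(build_rmxa("openstring", rac_retry=3, rac_recover_down=True))
# .RMXA RM="Oracle_XA",OS="openstring",RAC=Y,RAC_retry=3,RAC_recover_down=Y
```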

RAC=

Y   Enables failover support when connecting the UTM application to Oracle® Real Application Clusters.

N   Disables failover support.

Default: N

RAC_retry=nnn

nnn specifies how many times openUTM attempts to reconnect to the database and execute a recovery job.

If, as a result of a failover, the Commit job could not be executed for a transaction in the "Prepare-to-Commit" state, openUTM reconnects to the database and executes a recovery job. If the current XID is contained in the list of XIDs supplied by the database, openUTM executes a Commit job for that XID, i.e. for the current transaction. If the XID is not contained in the list, openUTM performs an xa_close and then tries again to connect to the database and execute a recovery job.

Default: 1
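The retry procedure described above can be modeled as a simple loop. This is a sketch of the control flow only, not openUTM code: the callables `reconnect`, `recovery_job`, `commit` and `close` are hypothetical stand-ins for the XA calls, and treating the recovery job as the XA call that lists prepared XIDs (xa_recover) is an assumption.

```python
from typing import Callable, Iterable


def recover_prepared(current_xid: str,
                     rac_retry: int,
                     reconnect: Callable[[], None],
                     recovery_job: Callable[[], Iterable[str]],
                     commit: Callable[[str], None],
                     close: Callable[[], None]) -> bool:
    """Model of the RAC_retry loop for a transaction in "Prepare-to-Commit".

    Returns True if the transaction was committed, False if all
    rac_retry attempts were used up without finding the XID.
    """
    for _attempt in range(rac_retry):
        reconnect()                          # e.g. xa_open on a new connection
        # Recovery job: assumed to return the XIDs known to the database.
        if current_xid in set(recovery_job()):
            commit(current_xid)              # Commit job for the current transaction
            return True
        close()                              # XID not in the list: xa_close, then retry
    return False


# Usage with fake XA calls: the XID only shows up on the second attempt.
log = []
xid_lists = iter([["other"], ["other", "xid-42"]])
ok = recover_prepared("xid-42", 2,
                      reconnect=lambda: log.append("open"),
                      recovery_job=lambda: next(xid_lists),
                      commit=lambda x: log.append("commit " + x),
                      close=lambda: log.append("close"))
print(ok, log)  # True ['open', 'close', 'open', 'commit xid-42']
```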

RAC_recover_down=

Specifies the behavior of openUTM if the transaction could not be completed after the number of attempts specified in RAC_retry=, i.e. if the status of the transaction could not be set to "Commit".

N   openUTM assumes that the transaction is no longer known to Oracle® Real Application Clusters. The transaction is assumed to have the status "Commit" and openUTM continues execution of the application.

Y   openUTM terminates the application abnormally and thus forces a warm start in order to ensure that the data is consistent.

Default: N
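Once the RAC_retry attempts are finished, the outcome depends only on whether the Commit succeeded and on RAC_recover_down. The following sketch condenses that decision; the function name `resolve_in_doubt` and the returned strings are hypothetical labels, not openUTM messages.

```python
def resolve_in_doubt(committed: bool, rac_recover_down: str = "N") -> str:
    """Hypothetical summary of openUTM's reaction after the RAC_retry attempts."""
    if committed:
        return "committed"                 # Commit succeeded during a retry
    flag = rac_recover_down.upper()
    if flag == "N":
        # Transaction assumed unknown to RAC and treated as committed;
        # the application keeps running.
        return "assume committed, continue"
    if flag == "Y":
        # Abnormal termination forces a warm start to keep the data consistent.
        return "abort, warm start"
    raise ValueError("RAC_recover_down must be Y or N")


for setting in ("N", "Y"):
    print(setting, "->", resolve_in_doubt(False, setting))
```

The default argument mirrors the documented default RAC_recover_down=N.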

 

Behavior of openUTM in the event of failover

If you have enabled failover support, openUTM and the database system behave as follows:

  • The application is not aborted if failover to a node of the Oracle® Real Application Cluster is possible.

  • If the connection is lost between "Prepare" and "Commit" at the end of a transaction, a "Reconnect" with recovery is performed and if this is successful, the "Commit" operation is repeated over this new connection.

  • Transactions that are still open when the failover occurs can cause problems and corresponding error messages even if failover support is enabled (e.g. return code ORA-25402 - transaction must roll back). The reason is that Oracle® Real Application Clusters cannot migrate open transactions in the event of a failover. These transactions must be rolled back by the UTM application program, see also "Interrupted transactions".

    Any open multi-step transactions (i.e. following PEND KP) are rolled back by the database system in the event of a failover. openUTM has no influence over this.

    The connection to the database system is re-established automatically after the rollback. New transactions can then be started.

  • If the failover occurs during a warm start of the application or while the UTM process is being terminated, error processing is carried out as usual and no attempt is made to reconnect.

  • The "prepared statements" database function can lead to errors in the event of a failover.

  • The following messages allow you to monitor the progress of the reconnection to the database system:

    • xa_close in the event of reconnection:

      In the &RMSTAT insert in message K202, the string "RAC closed" is output for the Oracle® Real Application Clusters instance in place of "closed".

    • xa_open in the event of reconnection:

      In the &XACALL insert of message K224, the string "RAC: xa_open" is output.

    Debug messages

    The debug messages indicate whether they refer to an instance of Oracle® Real Application Clusters. The section "Debug parameters" describes how to obtain the XA-DEBUG information for the database connection.
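The failover behavior listed above can be summarized as a lookup from the moment the failover occurs to openUTM's reaction. This is a simplified sketch: the enum `FailoverPoint`, the function `failover_reaction` and the result strings are hypothetical, and the case without failover support is reduced to "abort" as described for XAER_RMFAIL at the start of this section.

```python
from enum import Enum, auto


class FailoverPoint(Enum):
    """Hypothetical labels for when a failover hits a UTM process."""
    OPEN_TRANSACTION = auto()            # transaction still open, not yet prepared
    BETWEEN_PREPARE_AND_COMMIT = auto()
    WARM_START = auto()
    PROCESS_TERMINATION = auto()


def failover_reaction(point: FailoverPoint, rac: bool) -> str:
    if not rac:
        # Without RAC=Y, XAER_RMFAIL is treated as fatal.
        return "abort application"
    if point in (FailoverPoint.WARM_START, FailoverPoint.PROCESS_TERMINATION):
        return "normal error processing, no reconnect"
    if point is FailoverPoint.BETWEEN_PREPARE_AND_COMMIT:
        return "reconnect with recovery, repeat Commit"
    # Open transactions cannot be migrated: they are rolled back, then the
    # connection is re-established and new transactions can start.
    return "roll back, reconnect"


for p in FailoverPoint:
    print(p.name, "->", failover_reaction(p, rac=True))
```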

Interrupted transactions

Interrupted transactions can only be continued by the node that started the transaction. For this reason, all UTM processes must always be connected to the same node of the Oracle® Real Application Cluster. It is therefore simplest to proceed as follows:

  • terminate the UTM application after failover of the Oracle® Real Application Cluster and before the failed node is restarted,

  • restart the UTM application after the failed node has been restarted.

This ensures that all UTM processes are connected to the same node of the Oracle® Real Application Cluster and that all transactions of the application are processed by the restarted node of the Oracle® Real Application Cluster.

If it is not possible to terminate and restart the UTM application, i.e. if the nodes of the Oracle® Real Application Cluster are switched over while the UTM application is running, this can result in the following situation in which not all UTM processes are connected to the same node:

  • One transaction is interrupted by the failover; at this time, the UTM process is still connected to the old node.

  • After the process is restarted or after a PEND ER in the UTM application program, the interrupted transaction is continued by a different UTM process. This process is now connected to the new node.

  • The database instance rejects the request to resume the interrupted transaction (xa_start with RESUME) and reports that the transaction is unknown.

  • openUTM reconnects to the database instance and attempts to resume the transaction over the new connection (i.e. on the new node).

  • The database system again rejects this request, since the database transaction was started on the old node of the Oracle® Real Application Cluster and cannot be continued on the new node.

  • openUTM rolls back the global transaction and issues a K160 message; "NOTA" is output in the insert of the internal return code KCRCDC.
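The sequence above can be reproduced with a toy model in which each node only knows the transactions started on it. All names here (`RacInstance`, `continue_interrupted`, the status strings "OK" and "NOTA") are hypothetical; the model only illustrates why both resume attempts on the new node fail.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class RacInstance:
    """Toy model of one RAC node: it only knows transactions started on it."""
    name: str
    started_xids: Set[str] = field(default_factory=set)

    def resume(self, xid: str) -> str:
        # Stand-in for xa_start with RESUME; "NOTA" means "transaction unknown".
        return "OK" if xid in self.started_xids else "NOTA"


def continue_interrupted(xid: str,
                         connected: RacInstance,
                         reconnect: Callable[[], RacInstance]) -> str:
    if connected.resume(xid) == "OK":
        return "resumed"
    node = reconnect()                   # openUTM reconnects to the instance
    if node.resume(xid) == "OK":
        return "resumed"
    # Still unknown: roll back the global transaction, K160 with NOTA in KCRCDC.
    return "rolled back (K160, KCRCDC=NOTA)"


old_node = RacInstance("racutm1", {"xid-7"})
new_node = RacInstance("racutm2")
# The continuing process is connected to the new node, and so is the reconnect.
print(continue_interrupted("xid-7", new_node, lambda: new_node))
```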

A situation such as this can be handled as described below using a MSGTAC program.

Control using a MSGTAC program

The MSGTAC event service is defined as an additional message destination for the K160 message. MSGTAC evaluates the message insert and initiates a restart of the application via the administration programming interface (KC_CHANGE_APPLICATION). The application program is reloaded, and the UTM processes then connect to the new node.

This method minimizes the period of time for which the UTM processes are connected to different nodes. The number of transactions that are rolled back is limited to those that were started on the old node and could not be continued on the new node. The transactions that were started on the new node before the restart can be continued.
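The decision a MSGTAC program makes here can be sketched in a few lines. Real MSGTAC programs are written as UTM program units (e.g. in C or COBOL); this Python version only models the filter logic. `msgtac_on_message` and the callback `request_application_restart` are hypothetical; the callback stands in for the KC_CHANGE_APPLICATION call of the administration programming interface.

```python
from typing import Callable


def msgtac_on_message(msg_id: str, kcrcdc_insert: str,
                      request_application_restart: Callable[[], None]) -> bool:
    """Toy version of the MSGTAC decision: restart on K160 with NOTA.

    request_application_restart is a hypothetical callback standing in for
    KC_CHANGE_APPLICATION in the administration programming interface.
    """
    if msg_id == "K160" and kcrcdc_insert.strip() == "NOTA":
        request_application_restart()
        return True
    return False      # all other messages are ignored here


restarts = []
msgtac_on_message("K160", "NOTA", lambda: restarts.append("restart"))
msgtac_on_message("K160", "OTHER", lambda: restarts.append("restart"))
print(restarts)
```

Since several K160 messages may arrive before the restart completes, a production version would probably need to avoid triggering repeated restarts; that is a design assumption not covered by this sketch.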

Oracle® connection

Connection to an Oracle® database is established using a "service". You can also set up "DTP services" in an Oracle® Real Application Clusters environment.

This offers the following options for live operation:

  • automatic error detection

  • automatic failover:
    If an instance fails, new transactions are redirected to another instance of the service. No administrator intervention is required.

  • load distribution as soon as the connection is established

     

Creating a DTP service (Oracle®)

Use the command "srvctl add service" to add a new service for the database and assign it to an instance of the database.

Example:

Two "DTP services" are to be created with the following options for the RAC database dbracutm with the instances racutm1 and racutm2:

-d   Name of the database

-s   Name of the (DTP) service

-r   Name of the first (preferred) instance

-a   Name of the second (available) instance, used in the event of a failover

-P   Failover method (TAF policy)

srvctl add service -d dbracutm -s racutmS12 -r racutm1 \
                                             -a racutm2 \
                                             -P BASIC

and

srvctl add service -d dbracutm -s racutmS21 -r racutm2 \
                                             -a racutm1 \
                                             -P BASIC
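If you create such service pairs for several databases, the two commands above can be generated from one helper. This sketch only builds the argument vectors with the same single-letter options as the example; the function names are hypothetical, and the commands are printed rather than executed.

```python
import shlex
from typing import List


def srvctl_add_service(db: str, service: str, preferred: str,
                       available: str, taf_policy: str = "BASIC") -> List[str]:
    """Argument vector for one "srvctl add service" call (options as above)."""
    return ["srvctl", "add", "service",
            "-d", db, "-s", service,
            "-r", preferred, "-a", available,
            "-P", taf_policy]


def crosswise_services(db: str, inst1: str, inst2: str,
                       prefix: str) -> List[List[str]]:
    """Two services whose preferred and available instances are swapped."""
    return [srvctl_add_service(db, prefix + "12", inst1, inst2),
            srvctl_add_service(db, prefix + "21", inst2, inst1)]


for argv in crosswise_services("dbracutm", "racutm1", "racutm2", "racutmS"):
    print(shlex.join(argv))
    # On a cluster node you could run it with subprocess.run(argv, check=True).
```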

The service racutmS12 connects to the instance racutm1 and, in the event of a failover, to the instance racutm2. Correspondingly, the service racutmS21 connects to the instance racutm2 and, in the event of a failover, to the instance racutm1.
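The placement rule just described (preferred instance first, available instance on failover) can be expressed as a small function. This is a simplified model with a hypothetical function name: it only covers where a service runs given the set of running instances and ignores how and when Oracle Clusterware relocates a service back after the failed instance restarts.

```python
from typing import Optional, Set


def serving_instance(preferred: str, available: str,
                     running: Set[str]) -> Optional[str]:
    """Instance a service is placed on, given the set of running instances."""
    if preferred in running:
        return preferred          # normal operation
    if available in running:
        return available          # failover to the available instance
    return None                   # neither instance is up


print(serving_instance("racutm1", "racutm2", {"racutm1", "racutm2"}))  # racutm1
print(serving_instance("racutm1", "racutm2", {"racutm2"}))             # racutm2
```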

Convert the services to "DTP services" using SQL*Plus:

SQL> connect .... 
SQL> execute dbms_service.modify_service(service_name => 'racutmS12', dtp => true);
SQL> execute dbms_service.modify_service(service_name => 'racutmS21', dtp => true);
SQL> exit 

You can start, stop and administer the (DTP) services with srvctl commands. See also the Oracle® "Administration and Deployment Guide".

A DTP service must be started on the node on which its primarily assigned (preferred) instance of the RAC database system is running. For example, the DTP service racutmS21, which is primarily assigned to the instance racutm2, must be started on the node on which racutm2 is running.

 


  1. Enter the service in the file tnsnames.ora with a net_service_name:

    Example

    RACUTMS1 = 
      (DESCRIPTION = 
         (ADDRESS_LIST = 
            (ADDRESS = (PROTOCOL = TCP) (HOST=server1) (PORT=1521)) 
            (ADDRESS = (PROTOCOL = TCP) (HOST=server2) (PORT=1521)) 
         ) 
         (CONNECT_DATA = 
            (SERVICE_NAME = racutmS12.domain_name ) 
         ) 
         (FAIL_OVER = ON) 
      ) 
    
  2. In the open string of the start parameters (operand OS= of .RMXA), assign this net_service_name (in this case RACUTMS1) to the field "SqlNet", i.e. SqlNet=RACUTMS1.
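For illustration, the following sketch assembles an Oracle XA open string that contains the SqlNet field. It assumes the Oracle XA open-string conventions (fields joined with "+", starting with Oracle_XA, with Acc= for the account and SesTm= for the inactivity timeout); the user name, password and timeout are placeholders, and the function name is hypothetical. Check the Oracle XA documentation for the fields required in your release.

```python
from typing import Optional


def oracle_xa_open_string(user: str, password: str, session_timeout: int,
                          sqlnet: Optional[str] = None) -> str:
    """Build an Oracle XA open string; fields are joined with '+'."""
    fields = ["Oracle_XA",
              f"Acc=P/{user}/{password}",     # database account (placeholder)
              f"SesTm={session_timeout}"]     # inactivity timeout in seconds
    if sqlnet:
        fields.append(f"SqlNet={sqlnet}")     # net_service_name from tnsnames.ora
    return "+".join(fields)


print(oracle_xa_open_string("utmuser", "secret", 60, sqlnet="RACUTMS1"))
```

The resulting string is what goes into OS="..." of the .RMXA statement.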