Failure | Condition | Reported with | Operation interrupted? | Impact | Response | Data inconsistency on later switch to targets? |
Source or target unit | Protected | | No | - | Customer support | No |
| Unprotected, | | No | Performance, if source unit is affected | Customer support | Yes, if target unit is affected |
| Unprotected, | | | Applications wait | Measure A or continue with remaining unit | No |
Single remote link | | | No | Write performance | Customer support | No |
Last remote link | ON-ERR=*CONT | | No | - | Customer support | Yes |
| ON-ERR=*HOLD | | | Applications wait for response | Measure A or continue with remaining unit | No |
Storage system with source units | PGER | Yes | Measure A | Possible 2 | ||
Local system | - | .. | Yes | - | Restart | No |
Complete failure 3 | .. | Yes | Measure A | Possible 2 | ||
Failback to local storage system | - | Yes | Measure B |
1 | NJD0012 messages are not supported for x86 servers. |
2 | Data inconsistency on a later switch to the targets is possible if neither synchronous nor asynchronous (SRDF/A) processing mode is set, or if errors have already occurred on remote links or target units. |
3 | Failure of the local storage system with source units and failure of the local system |
Failure recovery measures
Measure | Description | Condition | Action | Command |
A | Switch to target unit, local system affected | | Start standby host, attach target units | |
| | Source and target units were synchronized | Make target units available | |
| | Source and target units were not synchronized, inconsistencies acceptable (or reset to last synchronization point) | | |
B | Failback to the local storage system, operation on standby system | | Terminate use of target units | |
| | | Make target units unavailable | |
| | | Disable all channels and remote links on the local storage system | (Service) |
| | | Start local storage system | (Service) |
| | Local storage system OK | Attach and enable remote links | (Service) |
| | Comparison OK / automatic synchronization begun? | Attach channels | (Service) |
| | | Start local system | |
Special information on failure scenarios with SRDF/A
SRDF/A always builds on an existing SRDF replication (see "SRDF/A configurations"). Restart of SRDF/A after a failure is therefore performed in two steps. SRDF replication must be restarted first (as described in the above sections) and then the SRDF/A session can be reactivated.
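The two-step order described above can be pictured as a small state model. The following Python sketch is purely illustrative: the state names and the function `next_restart_step` are hypothetical and are not part of any SRDF or storage management interface.

```python
from enum import Enum


class ReplicationState(Enum):
    """Hypothetical states, introduced only for this illustration."""
    FAILED = "failed"              # SRDF replication is interrupted
    SRDF_ACTIVE = "srdf_active"    # base SRDF replication is running again
    SRDFA_ACTIVE = "srdfa_active"  # SRDF/A session is active on top of SRDF


def next_restart_step(state: ReplicationState) -> str:
    """Return the next restart step for the given state.

    SRDF/A builds on SRDF, so SRDF replication is always restarted
    before the SRDF/A session is reactivated.
    """
    if state is ReplicationState.FAILED:
        return "restart SRDF replication"
    if state is ReplicationState.SRDF_ACTIVE:
        return "reactivate SRDF/A session"
    return "nothing to do"
```

Calling `next_restart_step(ReplicationState.FAILED)` yields the SRDF restart step; the SRDF/A step is only returned once the base replication is active again.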
If a failure occurs, the following should be noted with regard to SRDF/A replication.
SRDF link failure
Temporary failure:
SRDF/A can compensate for temporary failures of SRDF links. A tolerance interval of 0 to 10 seconds can be configured in the storage system; SRDF/A tolerates an SRDF link failure for this length of time. If the links are reestablished within this interval, there is no impact on the application. After the interval expires, the failure is treated as a permanent failure.
Permanent failure:
The SRDF/A session is automatically terminated in the event of a permanent failure. The data on the target side is consistent. Once the links are reestablished, SRDF operation can be resumed using normal SRDF recovery procedures and a new SRDF/A session can be activated.
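The distinction between a temporary and a permanent link failure can be expressed as a simple comparison against the configured tolerance interval. The sketch below is an illustration only: the function name and parameters are hypothetical, and treating an outage of exactly the tolerance length as temporary is an assumption.

```python
def classify_link_failure(outage_seconds: float, tolerance_seconds: float) -> str:
    """Classify an SRDF link outage as 'temporary' or 'permanent'.

    tolerance_seconds models the interval configured in the storage
    system (0 to 10 seconds according to the text above).
    """
    if not 0 <= tolerance_seconds <= 10:
        raise ValueError("tolerance interval must be between 0 and 10 seconds")
    if outage_seconds < 0:
        raise ValueError("outage duration cannot be negative")
    # Assumption: an outage that ends exactly at the limit still counts
    # as reestablished "within" the interval.
    if outage_seconds <= tolerance_seconds:
        return "temporary"   # no impact on the application
    return "permanent"       # SRDF/A session is terminated automatically
```

With a tolerance of 5 seconds, a 3-second outage is classified as temporary and a 6-second outage as permanent. With a tolerance of 0 seconds, any measurable outage is permanent.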
Available cache for SRDF/A in the local storage system is full
If the I/O load for the local storage system, the available bandwidth for SRDF/A replication and the cache size of the storage system are not (or no longer) correctly configured, the entire write cache for SRDF/A in the local storage system may be used up.
In this case two alternative procedures can be set by customer support:
The application is slowed down to the transmission speed of the SRDF links. This means that during this period performance is poorer than with synchronous SRDF mode in the same configuration.
The SRDF/A session is terminated immediately and automatically. Termination can be delayed by a configurable time interval (the default setting is 0 seconds). The application is slowed during this interval. If the bottleneck is cleared within this time interval, the SRDF/A session is continued; otherwise it is terminated.
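The two alternatives can be compared in a short model. This Python sketch is illustrative only: the policy names, the function `cache_full_outcome` and its return values are hypothetical, and the strict comparison at the boundary is an assumption chosen so that a delay of 0 seconds always leads to immediate termination.

```python
def cache_full_outcome(policy: str,
                       bottleneck_seconds: float,
                       termination_delay_seconds: float = 0.0) -> str:
    """Model the result of a full SRDF/A write cache.

    policy: 'throttle'  - application is slowed to SRDF link speed
            'terminate' - session ends, optionally after a delay
    bottleneck_seconds: how long the cache-full condition lasts
    termination_delay_seconds: configurable delay (default 0 seconds)
    """
    if policy == "throttle":
        # The session stays active; the application runs at link speed.
        return "session_continues_slowed"
    if policy == "terminate":
        # The application is slowed during the delay. If the bottleneck
        # clears before the delay expires, the session continues.
        if bottleneck_seconds < termination_delay_seconds:
            return "session_continues"
        return "session_terminated"
    raise ValueError(f"unknown policy: {policy!r}")
```

For example, with the default delay of 0 seconds the `terminate` policy ends the session for any bottleneck, whereas a delay of 10 seconds lets the session survive a 4-second bottleneck.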
Disaster recovery, failback procedure on the target side
Data on the target side is consistent in the event of a failure. The failback procedure is the same as that for SRDF. After a failback, SRDF/A can be reactivated as soon as the application is available again on the local server.