High availability with hot spare CPUs (SU /390)

Hot spare CPUs are redundant, fully operable CPUs that are ready for operation. If a normal or extra CPU fails, VM2000 can attach a hot spare CPU automatically so that it takes the place of the failed CPU.

Hot spare CPUs thus increase the availability of CPUs as a resource and ensure the continued operation of the server even in the event of the failure of a real normal or extra CPU.

Hot spare CPUs are not available on SU x86.
Cold spare CPUs can be configured there instead. After a CPU has failed, a cold spare CPU is placed in operation when the Server Unit is rebooted, so the former CPU capacity is available again after the restart.

Selected SU /390 have one (standard) or several hot spare CPUs.

Spare CPUs are not assigned to a CPU pool. They can replace a failed CPU in any CPU pool.


Hot spare CPUs in normal operation (without CPU failure)

The real hot spare CPUs of a server are made ready for operation (as far as the hardware is concerned) at startup of the monitor system, but they remain logically disconnected. The number of spare CPUs that are available is indicated in message VMS5050. The real spare CPUs are displayed in VM2000 operation when /SHOW-VM-RESOURCES INFORMATION=*CPU/*CONFIGURATION is specified.

In addition to its virtual normal CPUs, each VM also receives virtual spare CPUs. The number of virtual spare CPUs is equal to the number of real spare CPUs.

If the total number of virtual normal CPUs and virtual spare CPUs is greater than 16 (i.e. greater than the maximum multiprocessor level of a VM), the VM’s number of virtual spare CPUs is reduced accordingly.
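The rule above can be illustrated with a small calculation. This is only a sketch, not VM2000 code: the function name is hypothetical, and it assumes that "reduced accordingly" means the number of virtual spare CPUs is capped so that the total does not exceed 16.

```python
MAX_VM_MP_LEVEL = 16  # maximum multiprocessor level of a VM, as stated in the text


def virtual_spare_cpus(virtual_normal: int, real_spare: int) -> int:
    """Return the number of virtual spare CPUs a VM receives (illustrative model)."""
    if not 1 <= virtual_normal <= MAX_VM_MP_LEVEL:
        raise ValueError("virtual_normal must be between 1 and 16")
    if real_spare < 0:
        raise ValueError("real_spare must not be negative")
    # Assumption: the reduction caps normal + spare CPUs at the maximum level.
    return min(real_spare, MAX_VM_MP_LEVEL - virtual_normal)


print(virtual_spare_cpus(4, 2))   # 2: total of 6 is within the limit
print(virtual_spare_cpus(15, 2))  # 1: reduced so that the total is 16
print(virtual_spare_cpus(16, 1))  # 0: no room left for a spare CPU
```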

Guest systems on the VM detect virtual spare CPUs at startup. Multiprocessor guest systems leave virtual spare CPUs detached (state OFF), since they have at least one additional normal CPU available for failure detection.

Monoprocessor guest systems connect a spare CPU (state SLEEP) so that the operating system can detect the failure of its normal CPU and respond to it. The number and state of virtual spare CPUs are displayed in the VM-specific part when /SHOW-VM-ATTRIBUTES INFORMATION=*STD/*CPU or /SHOW-VM-RESOURCES INFORMATION=*STD/*CPU is specified.
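The start-up behavior of guest systems described above amounts to a simple decision. The following sketch is illustrative only: the function name is hypothetical, and it assumes that a monoprocessor guest system is one with exactly one virtual normal CPU.

```python
def initial_spare_state(virtual_normal_cpus: int) -> str:
    """Return the state in which a guest system leaves its virtual spare CPUs at startup."""
    if virtual_normal_cpus < 1:
        raise ValueError("a guest system needs at least one virtual normal CPU")
    # Monoprocessor guests connect a spare CPU (SLEEP) so that a failure of the
    # only normal CPU can still be detected; multiprocessor guests leave it OFF.
    return "SLEEP" if virtual_normal_cpus == 1 else "OFF"


print(initial_spare_state(1))  # SLEEP
print(initial_spare_state(4))  # OFF
```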


Use of hot spare CPUs in the event of CPU failure

In the event of the hardware failure of a real normal or extra CPU (malfunction alert, machine check), the VM2000 hypervisor automatically detaches the defective CPU and attaches a hot spare CPU that is ready for operation. The multiprocessor grade of the Server Unit remains unchanged by this. The spare CPU is automatically added to the CPU pool to which the failed CPU belonged.

The defective CPU (state WFM) is removed from the CPU pool to which it was assigned. It remains detached until the problem is dealt with by a service engineer. It cannot be attached with /ATTACH-VM-RESOURCES, nor does it become available again when the Server Unit is rebooted. After it is repaired, the service engineer makes the CPU available again. It must then be rebooted.

Other failures of normal or extra CPUs (e.g. when a CPU gets hung) lead to the CPU being detached (state ERR). In this case no spare CPU is attached. The failed CPU can be attached with /ATTACH-VM-RESOURCES. It also becomes available again when the Server Unit is restarted.
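The two failure cases described above can be summarized in a small model. This is a sketch under stated assumptions, not the hypervisor's implementation: the class and method names are hypothetical, CPUs are identified by plain integers, and it is assumed that a CPU in state ERR keeps its pool assignment (the text does not say either way).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Failure(Enum):
    HARDWARE = "hardware"  # malfunction alert, machine check
    OTHER = "other"        # e.g. the CPU hangs


@dataclass
class ServerUnit:
    pools: dict[str, set[int]]   # pool name -> CPUs assigned to the pool
    spares: list[int]            # hot spare CPUs, not assigned to any pool
    detached: dict[int, str] = field(default_factory=dict)  # CPU -> "WFM" or "ERR"

    def fail(self, cpu: int, kind: Failure) -> Optional[int]:
        """Handle a CPU failure; return the attached spare CPU, if any."""
        if cpu in self.detached:
            raise ValueError(f"CPU {cpu} is already detached")
        pool = next((name for name, cpus in self.pools.items() if cpu in cpus), None)
        if pool is None:
            raise ValueError(f"CPU {cpu} is not assigned to any pool")
        if kind is Failure.OTHER:
            # Detached in state ERR; no spare CPU is attached.
            # Assumption: the CPU keeps its pool assignment.
            self.detached[cpu] = "ERR"
            return None
        # Hardware failure: state WFM, removed from its pool.
        self.detached[cpu] = "WFM"
        self.pools[pool].discard(cpu)
        if not self.spares:
            return None
        spare = self.spares.pop(0)
        self.pools[pool].add(spare)  # the spare joins the failed CPU's pool
        return spare

    def attach(self, cpu: int) -> None:
        """Model of re-attaching a CPU; only possible from state ERR."""
        if self.detached.get(cpu) != "ERR":
            raise ValueError(f"CPU {cpu} cannot be attached")
        del self.detached[cpu]


server = ServerUnit(pools={"POOL-A": {0, 1}, "POOL-B": {2}}, spares=[7])
print(server.fail(1, Failure.HARDWARE))  # 7: the spare replaces CPU 1 in POOL-A
print(server.fail(2, Failure.OTHER))     # None: state ERR, no spare attached
server.attach(2)                         # allowed, CPU 2 was in state ERR
```

In this model, calling attach for CPU 1 raises an error, which mirrors the statement that a CPU in state WFM cannot be attached with /ATTACH-VM-RESOURCES.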

A guest system is notified of a CPU error by the VM2000 hypervisor if a virtual CPU of the guest system was active on the failed real normal or extra CPU at the time of the failure. The guest system then detaches this virtual CPU and attaches a virtual spare CPU. The multiprocessor grade and power consumption of a VM are not changed by the failure.

Monoprocessor guest systems process the CPU error on the virtual spare CPU that was in state SLEEP. They can thus continue to function even after being affected by a CPU failure.

The principle of the spare CPU also applies to the monitor system. If there are no more spare CPUs available when the only operable virtual CPU in the monitor system fails, the VM2000 hypervisor initiates a restart of the monitor system, provided the restart option is set,