Recovery of DMS files

The recovery function is available for all DMS files located on an SM pubset. Recovery is based on the reconfigurability of SM pubsets and on the extended HSMS request management within an SM pubset.

Manual recovery

If a host on the computer network is no longer available, the HSMS administrator must ensure that the DMS can access the SM pubset:

  • If the SM pubset was imported as an exclusive pubset, the HSMS administrator must import it to the substitute host. The control file, which is on the SM pubset, is read. All metadata (archives, declarations for the SM pubset, ...) are added to the local metadata. The SM pubset is then available again for new HSMS activities.

  • If the pubset is a shared SM pubset, HSMS activities are still possible for the remaining sharers, provided that the SM pubset can actually be accessed. If the master crashes, the sharers must be reconfigured. The MSCF subsystem provides a control task for the automatic reconfiguration by a backup master.

Once the SM pubset can be accessed again, the HSMS administrator should decide whether the requests that were being processed by the crashed host need to be available again for the restart. The requests of the SM pubset can be output with the HSMS statement SHOW-REQUESTS. The “inhibiting” host is displayed for each request. For more detailed information on request inhibits in SM pubsets, see section "Request management".

Finally, the HSMS administrator must issue the HSMS statement RECOVER-REQUESTS to normalize the status of all requests that are linked to a host and selected for recovery. The description of the RECOVER-REQUESTS statement explains the results of a recovery operation.
For shared pubsets, the recovery can only be issued on the master sharer.
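
The selection step described above can be pictured with a small model. The following Python sketch is not HSMS code and does not call any HSMS interface. The Request record, its field names and the host names are hypothetical and serve only to illustrate two points from the text: each request shows the host that inhibits it, and on a shared pubset only the master sharer may issue the recovery.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for one line of SHOW-REQUESTS output."""
    name: str
    inhibiting_host: Optional[str]  # None: request is not inhibited by any host


def requests_to_recover(requests: List[Request], crashed_host: str) -> List[Request]:
    """Select the requests that are inhibited by the crashed host."""
    return [r for r in requests if r.inhibiting_host == crashed_host]


def may_issue_recovery(local_host: str, master_host: Optional[str], shared: bool) -> bool:
    """On a shared pubset only the master sharer may issue the recovery.

    For an exclusively imported pubset the importing host may always do so.
    """
    return (not shared) or local_host == master_host
```

In this model, the administrator on the master sharer would first check may_issue_recovery and then review the result of requests_to_recover before issuing RECOVER-REQUESTS for the crashed host.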

Note on MSCF reconfiguration

If an error occurs during the MSCF reconfiguration, the pubset will probably be switched to QUIET mode. From the HSMS point of view, however, the pubset is still in operation (because it has not been exported), although it can no longer be accessed. The system administrator should then always remedy the situation by either calling a new master for the SM pubset or exporting it to each slave.

Automatic recovery

The automatic recovery of HSMS requests is only possible for configurations with shared SM pubsets. Backup sharers must be in operation to be able to detect the crash of one of the other sharers. HSMS requests can only be recovered after a successful MSCF recovery.

HSMS requests are recovered automatically with the aid of the product PROP-XT. PROP-XT can be used to define events that are based on console messages. These events are defined in an S procedure that runs on each host. When MSCF messages indicating a successful MSCF recovery are registered, the recovery of HSMS requests is initiated.
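
The event-driven pattern can be sketched in a few lines of Python. This is only an illustration of the control flow and not the PROP-XT interface: the queue stands in for the console message stream, the event names DMS03B0 and STOPMON are taken from the example procedure in this section, and the handler and the event data layout are assumptions.

```python
import queue
from typing import Any, Callable, Tuple


def watch(events: "queue.Queue[Tuple[str, Any]]",
          on_recovery_message: Callable[[Any], None]) -> int:
    """Consume (name, data) events until a STOPMON event arrives.

    The handler is called once for every DMS03B0 event. All other
    events are ignored. Returns the number of handled events.
    """
    handled = 0
    while True:
        name, data = events.get()      # blocks until the next event is available
        if name == "STOPMON":          # stop request, e.g. set via a job variable
            break
        if name == "DMS03B0":          # message that triggers the recovery check
            on_recovery_message(data)
            handled += 1
    return handled
```

The loop blocks while no event is pending, which corresponds to waiting without a time limit, and it leaves the loop only when the stop event is delivered.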

For further information on PROP-XT, see the “PROP-XT” manual [19]. 

Example of an S procedure for automatic recovery

The following S procedure describes an HSMS watchdog task which automatically recovers HSMS requests after a slave crash is detected. For further information on watchdogs, see the “HIPLEX MSCF” manual [20].

Structure of the S procedure

Start PROP-XT processing;

Request operator role ROLEALL, which must first have been defined with SYSPRIV;

Create PROP-XT event with the name DMS03B0, which is based on the console message DMS03B0;

Observation loop; termination is controlled by a job variable:

    Wait for event DMS03B0;

    Redirect the output of command /SHOW-SHARED-PUBSET <pubset-ID> to variable PUBSET-LIST;

    Search each sharer registered by /SHOW-SHARED-PUBSET to establish whether the local host has master access to the pubset and to determine the name of the crashed partner;

    If the local host has master access and the slave has crashed:

        Output HSMS statement RECOVER-REQUESTS;

    Delete variable PUBSET-LIST;

Disable PROP-XT;

Exit
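
The decision taken inside the observation loop can be modelled as follows. This Python sketch is not part of the S procedure. The dictionary keys mirror the S variable elements used in the procedure (PARTNER-NAME, SYS-ID, SHARER-TYPE, SHARER-STA), and the values *MASTER, *CRASH and *CHECK are taken from it. The function name and the layout of the sharer list are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def needs_recovery(sharers: List[Dict[str, str]],
                   local_host: str,
                   crashed_sys_id: str) -> Tuple[bool, Optional[str]]:
    """Return (recover, crash_host) for one pubset.

    recover is True only if the local host is the master sharer and the
    sharer with the reported system ID is in the state *CRASH or *CHECK.
    """
    master = False
    crashed = False
    crash_host = None
    for s in sharers:
        # Is the local host the master sharer of this pubset?
        if s["partner_name"] == local_host and s["sharer_type"] == "*MASTER":
            master = True
        # Has the partner named in the console message crashed?
        if s["sys_id"] == crashed_sys_id:
            crash_host = s["partner_name"]
            if s["sharer_state"] in ("*CRASH", "*CHECK"):
                crashed = True
    return master and crashed, crash_host
```

Only when the first element of the result is true would RECOVER-REQUESTS be issued for the host returned as the second element.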

 

S procedure

The procedure run is explained in the comments within the procedure.

/SET-PROCEDURE-OPTIONS DATA-ESCAPE-CHAR=*STD
/
/ "-------------------------------------------------------------------------"
/ "-- Example of an HSMS watchdog task                                    --"
/ "--                                                                     --"
/ "-- This SDF-P procedure sets up a watchdog task which was designed     --"
/ "-- for the automatic recovery of HSMS requests after the crash of      --"
/ "-- a slave sharer.                                                     --"
/ "--                                                                     --"
/ "-- The following environment is necessary:                             --"
/ "-- * Product PROP-XT must be installed and the subsystem created;      --"
/ "-- * Operator role ROLEALL must be defined with SYSPRIV:               --"
/ "--   /CREATE-OPER-ROLE ROLEALL, ROUT=*ALL                              --"
/ "--   /MODIFY-OPER-ATTR USER=TSOS,ADD=ROLEALL                           --"
/ "-- * The job variables must be available, in particular $SYSJV.HOST    --"
/ "--                                                                     --"
/ "-- The batch task uses:                                                --"
/ "-- * the job variable HSMS.WATCHDOG.MON to monitor the task            --"
/ "-- * the file HSMS.WATCHDOG.LST for the output of the task             --"
/ "--                                                                     --"
/ "-- Start the watchdog task with the SYSHSMS privilege:                 --"
/ "-- /ENTER-PROC <filename-containing-this-proc>, JOB-NAME=HSMSWDOG      --"
/ "--                                                                     --"
/ "-- Stop the watchdog task with:                                        --"
/ "-- /SETJV HSMS.WATCHDOG.MON,'STOP'                                     --"
/ "--                                                                     --"
/ "-- The watchdog task should be created on every host that has          --"
/ "-- master access to at least 1 SM pubset.                              --"
/ "-- During the operation, the watchdog task waits for console message   --"
/ "-- DMS03B0 and outputs the HSMS statement RECOVER-REQUESTS when the    --"
/ "-- local master access and slave crash conditions occur.               --"
/ "--                                                                     --"
/ "-- Note:                                                               --"
/ "-- This example only supports the recovery of requests for master      --"
/ "-- sharers after slave crashes. It does not support the recovery of    --"
/ "-- requests after a master crash.                                      --"
/ "-------------------------------------------------------------------------"
/
/
/ "-------------------------------------------------------------------------"
/ "-- Declaration of the local variables                                  --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/ " Create variable SYSPOP for the dialog with PROP-XT "
/DECLARE-VARIABLE VARIABLE-NAME=SYSPOP(TYPE=*STRUCTURE)
/
/ " Variable PUBSET-LIST to obtain the output of /SHOW-SHARED-PUBSET "
/DECLARE-VARIABLE VARIABLE-NAME=PUBSET-LIST(TYPE=*STRUCTURE(DEFINITION=-
/*DYNAMIC)),MULTIPLE-ELEMENTS=*LIST
/
/ " Variables PUBSET and SHARER to be able to access the elements of "
/ " PUBSET-LIST " 
/DECLARE-VARIABLE VARIABLE-NAME=PUBSET(TYPE=*STRUCTURE(DEFINITION=*DYNAMIC))
/DECLARE-VARIABLE VARIABLE-NAME=SHARER(TYPE=*STRUCTURE(DEFINITION=*DYNAMIC))
/
/ " Variable LOCAL-HOST, initialized with the local host name "
/DECLARE-VARIABLE VARIABLE-NAME=LOCAL-HOST(TYPE=*STRING)
/
/ " Variable CRASH-HOST for the name of the crashed host "
/DECLARE-VARIABLE VARIABLE-NAME=CRASH-HOST(TYPE=*STRING)
/
/ " 2 Boolean indicators to determine master and crash events "
/DECLARE-VARIABLE VARIABLE-NAME=MASTER-IND(TYPE=*BOOLEAN)
/DECLARE-VARIABLE VARIABLE-NAME=CRASH-IND(TYPE=*BOOLEAN)
/
/ "-------------------------------------------------------------------------"
/ "-- Assign SYSOUT to file                                               --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/ASSIGN-SYSOUT  TO=HSMS.WATCHDOG.LST
/
/ "-------------------------------------------------------------------------"
/ "-- Resolve local host name with job variable $SYSJV.HOST               --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/LOCAL-HOST = (JV('$SYSJV.HOST'))
/INFORM-OPERATOR MSG=-
/     '*** HSMS watchdog started on host &(LOCAL-HOST) ***'
/
/ "-------------------------------------------------------------------------"
/ "-- Prepare PROP-XT environment                                         --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/BEGIN-BLOCK
/
/  " Start PROP-XT preparation "
/  BEGIN-PROP-PROCESS            -
/       PROCESS-NAME=SMCHECK
/  START-PROP-OBJECT-MONITORING  -
/       OBJECT-NAME=OBJ-MONITOR  -
/      ,OBJECT=*OPERATING(OPERATOR-ROLE=ROLEALL)
/  IF-CMD-ERROR
/    GOTO LABEL=PROPERR
/  END-IF
/
/  " Create event DMS03B0 to wait for console message DMS03B0 "
/  START-PROP-EVENT-MONITORING   -
/        EVENT-NAME=DMS03B0      -
/       ,SELECT-EVENT=FROM-OBJECT(EVENT-DATA=*SYSTEM-MSG(MSG-ID=DMS03B0))
/
/  "------------------------------------------------------------------------"
/  "-- Watchdog loop to trace console message DMS03B0                     --"
/  "--                                                                    --"
/  "------------------------------------------------------------------------"
/
/  CREATE-JV HSMS.WATCHDOG.MON
/  SET-JOB-STEP
/  MODIFY-JV  JV-CONTENTS=(JV-NAME=HSMS.WATCHDOG.MON),SET-VALUE='CHECK'
/
/  START-PROP-EVENT-MONITORING -
/      EVENT-NAME=STOPMON -
/     ,SELECT-EVENT=*JV-MODIFICATION( -
/          JV-NAME=HSMS.WATCHDOG.MON -
/         ,STRING='STOP' -
/         ,CONDITION=*EQUAL -
/      )
/
/  INFORM-OPERATOR MSG=-
/       '*** HSMS watchdog enters observation loop ***'
/
/  WHILE-LOOP:  WHILE ( TRUE )
/
/    INFORM-OPERATOR MSG=-
/         '*** HSMS watchdog waits for message DMS03B0 ***'
/
/    WAIT-FOR-PROP-EVENTS -
/        EVENT-NAME=(DMS03B0,STOPMON) -
/       ,TIME-LIMIT=*NO -
/       ,JV-CHECK-METHOD=*CJC
/
/    IF CONDITION=(SYSPOP.MAINCODE <> '0000')
/      GOTO LABEL=PROPERR
/    END-IF
/
/    IF ( SYSPOP.EVENT-NAME == 'STOPMON' )
/      EXIT-BLOCK BLOCK=WHILE-LOOP
/    END-IF
/
/    INFORM-OPERATOR MSG=-
/    '*** HSMS watchdog checks system &(SYSPOP.I0) on pubset ***'//-
/    '*** &(SYSPOP.I1) ***'
/
/    "--------------------------------------------------------------------"
/    "-- Output command /SHOW-SHARED-PUBSET redirected to               --"
/    "-- OPS variable                                                   --"
/    "--------------------------------------------------------------------"
/
/    ASSIGN-STREAM STREAM-NAME=SYSINF,TO=*VARIABLE(VARIABLE-NAME=PUBSET-LIST)
/    SHOW-SHARED-PUBSET &(SYSPOP.I1)
/    SET-JOB-STEP
/    ASSIGN-STREAM STREAM-NAME=SYSINF,TO=*DUMMY
/
/    "--------------------------------------------------------------------"
/    "-- Check status of pubset for partner                             --"
/    "--                                                                --"
/    "--------------------------------------------------------------------"
/
/    FOR PUBSET=*LIST(PUBSET-LIST)
/
/      " Search sharer list to determine master and crashed hosts " 
/      MASTER-IND = FALSE
/      CRASH-IND  = FALSE
/      FOR SHARER=*LIST(PUBSET.LIST)
/        " Check whether the local host is the master host "
/        IF (    (SHARER.PARTNER-NAME=LOCAL-HOST)  -
/            AND (SHARER.SHARER-TYPE='*MASTER') )
/          MASTER-IND = TRUE
/        END-IF
/        " Check whether partner has crashed "
/        IF (SHARER.SYS-ID='&(SYSPOP.I0)') 
/          CRASH-HOST = SHARER.PARTNER-NAME
/          IF ((SHARER.SHARER-STA='*CRASH') OR (SHARER.SHARER-STA='*CHECK'))
/            CRASH-IND = TRUE
/          END-IF
/        END-IF
/      END-FOR
/
/      " If not master, output console message "
/      IF NOT MASTER-IND
/        INFORM-OPERATOR MSG=-
/        '*** HSMS watchdog - local host is not master sharer ***'//-
/        '*** for pubset &(SYSPOP.I1) ***'
/      END-IF
/
/      " If no crash detected, send console message "
/      IF NOT CRASH-IND
/        INFORM-OPERATOR MSG=-
/        '*** HSMS watchdog - no crash for partner &(CRASH-HOST) ***'//-
/        '*** detected ***'
/      END-IF
/
/      " If host=master and crash detected, recovery is "
/      " required "
/      IF (MASTER-IND) AND (CRASH-IND)
/        " Recover crashed slave sharer "
/        INFORM-OPERATOR MSG=-
/        '*** HSMS watchdog - recovery for partner &(CRASH-HOST) ***'//-
/        '*** on pubset &(SYSPOP.I1) called ***'
/        START-HSMS
//         RECOVER-REQUESTS ENV=*SYS-MAN(&(SYSPOP.I1)) -
//                         ,HOST-NAME=&(CRASH-HOST)
//         STEP
//       END
/      END-IF
/
/    END-FOR
/    SET-JOB-STEP
/
/    FREE-VAR PUBSET-LIST
/    SET-JOB-STEP
/
/
/  END-WHILE
/
/  INFORM-OPERATOR MSG=-
/       '*** HSMS watchdog - observation loop is exited ***'
/
/END-BLOCK
/
/IF-BLOCK-ERROR
/  GOTO LABEL=PROPERR
/ELSE
/  GOTO LABEL=PROPEND 
/END-IF 
/
/ "-------------------------------------------------------------------------"
/ "-- Handling of PROP-XT errors                                          --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/PROPERR:
/  MAINCODE=MAINCODE(); SUBCODE1=SUBCODE1(); SUBCODE2=SUBCODE2()
/  MESSAGE=MSG(MSG-ID='&MAINCODE')
/  SH-VARIABLE (MAINCODE,SUBCODE1,SUBCODE2,MESSAGE)
/  SH-VARIABLE (SYSPOP)
/
/ "-------------------------------------------------------------------------"
/ "-- Exit                                                                --"
/ "--                                                                     --"
/ "-------------------------------------------------------------------------"
/
/PROPEND:
/  END-PROP-PROCESS
/  INFORM-OPERATOR MSG=-
/       '*** HSMS watchdog deinitialized ***'
/  ASSIGN-SYSOUT  TO=*PRIMARY
/
/EXIT-PROCEDURE