Language elements of the DATA DIVISION

COBOL data structure for an XML document

In structure-oriented processing, the hierarchical structure of an XML document is emulated by the level concept of the record description entries in COBOL. Here, a data description entry with an IDENTIFIED clause corresponds to a node from the XML tree, i.e. an attribute or the (start) tag of an element in the XML document. End tags play no part here because they are derived implicitly from the hierarchy.
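As a rough illustration of this correspondence, the following Python sketch walks a small XML document and produces one outline line per start tag or attribute, using the nesting depth as a stand-in for COBOL level numbers. The sample document, the function name `outline` and the two-digit depth numbers are assumptions made for this illustration only; the output is not COBOL syntax and the sketch is not part of the compiler.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
SAMPLE = '<book isbn="123"><title>COBOL</title><author>Ann</author></book>'

def outline(elem, depth=1):
    """Return one line per node: the element, its attributes, then its child elements."""
    indent = "  " * (depth - 1)
    lines = [f"{indent}{depth:02d} ELEMENT {elem.tag}"]
    # Attributes are nodes directly below the element that carries them.
    for attr_name in elem.attrib:
        lines.append(f"{indent}  {depth + 1:02d} ATTRIBUTE {attr_name}")
    # Child elements are also one level below their parent.
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

End tags never appear in the outline; the nesting alone shows where each element ends.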

Optionally the following can be defined in the data structure for each node:

  • the name of the node,

  • the namespace of the node,

  • the value of the node,

  • if necessary, further subordinate elements in the XML document, also with an IDENTIFIED clause.

In the data structure, all such additional data items must be directly subordinate to the data item which corresponds to the node. At most one of these additional data items may have no IDENTIFIED clause and also not be referenced by an IDENTIFIED clause: the data item for the content. If this data item is the only data item subordinate to the node description entry, it can be omitted and its PICTURE clause specified directly in the node description entry instead.

The data structures of the FD need only describe those nodes from the XML tree which the program wants to process. However, the parent of each described node must also be described.

Assignment and Element Position Vector

The basis for processing an XML document is the assignment of nodes from the XML tree to the data description entries in the record description entries of the FD. This assignment is stored in a logical information unit called the Element Position Vector (EPV). For each data item which specifies an IDENTIFIED clause, the EPV records either which node from the XML tree is assigned to the data item (i.e. the data item has 'a valid position') or the fact that no node is assigned (i.e. the 'position is invalid'). For a valid position, the EPV also notes which statement generated it.

The IDENTIFIED clause in a data description entry determines which nodes from the XML tree can in principle be assigned to this data description entry.

The assignment of a node is possible if all the following conditions are satisfied (see "IDENTIFIED clause" for the various phrases):

  • The type of node (element or attribute node) in the XML tree matches the phrase (ELEMENT or ATTRIBUTE) from the IDENTIFIED clause.

  • The local name of the node in the XML tree matches the local name specified in the IDENTIFIED clause:

    • If the name is specified by means of BY, both names must be identical except for trailing blanks.

    • If the name is specified by means of USING, any name from the XML tree is regarded as matching.

  • The namespace of the node in the XML tree matches the namespace (NAMESPACE) specified in the IDENTIFIED clause. This applies both for an explicit NAMESPACE phrase and for an implicit phrase, i.e. one which was inherited from superordinate data items:

    • If the namespace is specified by means of BY, both namespace names must be identical except for trailing blanks.

    • If the namespace is specified by means of USING, any namespace from the XML tree is regarded as matching (in particular also the empty namespace).

    • If the namespace is specified as NULL, the node must also have an empty namespace in the XML tree.
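The three conditions above can be modelled as a single predicate. The following Python sketch is a conceptual model only, not the compiler's implementation: the class names `Node` and `Identified`, the function `can_assign`, and the representation of the empty namespace as an empty string are all assumptions made for this illustration. The `Identified` object is taken to hold the effective phrases, i.e. after any implicit NAMESPACE phrase has been resolved.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """A node from the XML tree (hypothetical model)."""
    kind: str             # "ELEMENT" or "ATTRIBUTE"
    name: str             # local name
    namespace: str = ""   # assumption: "" stands for the empty namespace

@dataclass
class Identified:
    """Effective phrases of an IDENTIFIED clause (hypothetical model)."""
    kind: str               # "ELEMENT" or "ATTRIBUTE"
    name_mode: str          # "BY" or "USING"
    name: str = ""          # compared only when name_mode is "BY"
    ns_mode: str = "USING"  # "BY", "USING" or "NULL"
    namespace: str = ""     # compared only when ns_mode is "BY"

def same_except_trailing_blanks(a, b):
    """Compare two names, ignoring trailing blanks only."""
    return a.rstrip(" ") == b.rstrip(" ")

def can_assign(node, clause):
    """Return True if all three matching conditions are satisfied."""
    # 1. Node type must match the ELEMENT / ATTRIBUTE phrase.
    if node.kind != clause.kind:
        return False
    # 2. Local name: BY requires equality, USING accepts any name.
    if clause.name_mode == "BY" and not same_except_trailing_blanks(node.name, clause.name):
        return False
    # 3. Namespace: BY requires equality, NULL requires the empty
    #    namespace, USING accepts any namespace.
    if clause.ns_mode == "BY":
        return same_except_trailing_blanks(node.namespace, clause.namespace)
    if clause.ns_mode == "NULL":
        return node.namespace == ""
    return True

title = Node("ELEMENT", "title")
print(can_assign(title, Identified("ELEMENT", "BY", "title   ", "NULL")))
print(can_assign(title, Identified("ATTRIBUTE", "BY", "title", "NULL")))
```

In this model the first call succeeds because the names differ only in trailing blanks, while the second fails on the node type.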

Assignment procedure

The statements for processing an XML document execute assignments. In each case, the assignment procedure runs in the following steps:

  1. At most one node from the XML tree is assigned to precisely one data item in the FD. Which nodes and which data items need to be taken into consideration for this first step depends on the statement concerned.

  2. For each data item to which a node from the XML tree was assigned, all data items directly subordinate to it are examined on the basis of their IDENTIFIED clauses, and an attempt is made to assign each of them one child of the assigned node, starting with the oldest unassigned child.

  3. The 'invalid' position is noted for each data item to which it was not possible to assign a node in the previous steps and also for all data items in the EPV which are subordinate to such a data item.

When assignment takes place, no data is transferred between the XML document and the data description entries in the program.
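Steps 2 and 3 of the assignment procedure can be sketched as a recursive walk. The following Python model is a simplification under stated assumptions: matching is reduced to comparing element names, "oldest unassigned child" is interpreted as the first child in document order that is not yet assigned, and the EPV is a plain dictionary in which `None` stands for the invalid position. The names `Node`, `Item`, `assign` and the sample data are hypothetical. In line with the remark above, only positions are recorded; no node values are copied.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """An element node from the XML tree (hypothetical model)."""
    name: str
    children: list = field(default_factory=list)

@dataclass
class Item:
    """A data item with an IDENTIFIED clause (hypothetical model)."""
    data_name: str
    xml_name: str   # simplification: match by element name only
    subitems: list = field(default_factory=list)

def assign(item, node, epv):
    """Record the position of item, then handle its subordinate items."""
    epv[item.data_name] = node   # None means 'position is invalid'
    taken = set()                # indices of children already assigned
    for sub in item.subitems:
        child = None
        if node is not None:
            # Step 2: take the first matching child not yet assigned.
            for index, candidate in enumerate(node.children):
                if index not in taken and candidate.name == sub.xml_name:
                    taken.add(index)
                    child = candidate
                    break
        # Step 3: with child None, sub and everything below it
        # receive the invalid position.
        assign(sub, child, epv)

doc = Node("book", [Node("title"), Node("author"), Node("author")])
rec = Item("BOOK-REC", "book", [
    Item("TITLE-1", "title"),
    Item("AUTHOR-1", "author"),
    Item("AUTHOR-2", "author"),
    Item("ISBN-1", "isbn", [Item("ISBN-TYPE", "type")]),
])
epv = {}
assign(rec, doc, epv)   # step 1 is assumed done: BOOK-REC gets the root node
```

Here the two `author` children are assigned to `AUTHOR-1` and `AUTHOR-2` in document order, while `ISBN-1` and its subordinate `ISBN-TYPE` end up with the invalid position because the document contains no `isbn` element.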