COBOL language elements for describing an XML document

The level concept for structuring records which already exists in COBOL is used to represent the hierarchical structure of an XML document.

For the sake of clarity, the new COBOL language elements will be presented in the first step without taking namespaces into account. Please refer to section "Namespace" for information on the special aspects of describing and processing namespaces.

A data item in the COBOL data structure corresponds to a node of the tree. The COBOL structure can also contain further data items to which no node in the tree corresponds.

  • The data item corresponding to a root has the level number 01.

  • The data items for children of a node all have the same level number, which is greater than that of their parent.

  • In the COBOL structure the new IDENTIFIED clause identifies the data items to which a node in the tree corresponds and specifies the type of node: element or attribute.

  • The IDENTIFIED clause also specifies the name of the element or attribute as it appears in the tags of the XML document. This name is case-sensitive.

  • In the structure the data item which is to contain the value of an element or attribute is directly subordinate to the data item with the IDENTIFIED clause for the element or attribute. If this data item is the only data item which is directly subordinate to an IDENTIFIED clause, it can be omitted, and the PICTURE clause for the value can be specified directly together with the IDENTIFIED clause.

  • If an IDENTIFIED clause is specified for a data item, all superordinate group items in the structure must also have an IDENTIFIED clause. This means that the COBOL structure must reflect the hierarchical structure of the tree without gaps. Additional intermediate levels on the COBOL side, even if they are only intended to improve the structuring, are not permitted.
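
The last two rules (omitting the value item and reflecting the hierarchy without gaps) can be illustrated with a minimal sketch. The element names "invoice" and "amount" and all data-names are hypothetical and chosen only for illustration; the comments assume the floating comment indicator `*>`.

```cobol
*> Long form: the value item is directly subordinate to the
*> data item which carries the IDENTIFIED clause.
01 invoice-1            IDENTIFIED BY "invoice".
   02 amount-1          IDENTIFIED BY "amount".
      03 amount-value   PIC 9(5).

*> Short form: the value item would be the only subordinate
*> item, so it is omitted and the PICTURE clause is written
*> together with the IDENTIFIED clause.
01 invoice-2            IDENTIFIED BY "invoice".
   02 amount-2          IDENTIFIED BY "amount" PIC 9(5).

*> Not permitted (shown as a comment only): group-x has no
*> IDENTIFIED clause although its subordinate item has one,
*> which leaves a gap in the hierarchy.
*> 01 invoice-3         IDENTIFIED BY "invoice".
*>    02 group-x.
*>       03 amount-3    IDENTIFIED BY "amount" PIC 9(5).
```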

Example 12-33 COBOL description of the entire XML document

XML document

<a
   att="123">
   xxxx
   <B>zz
      <c>9876</c>
   </B>
</a>

COBOL data structure

01 root                IDENTIFIED BY "a" ELEMENT.
   02 root-att         IDENTIFIED BY "att" ATTRIBUTE.
      03 att-value     PIC 999.
   02 root-value       PIC X(10).
   02 child            IDENTIFIED BY "B".
      03 child-value   PIC X.
      03 grandchild    IDENTIFIED BY "c" ELEMENT
                       PIC 9(8) BINARY.

Comments:

  • The data item grandchild has no further substructure and can therefore both specify the element name (’c’) and accommodate the value.

  • The values from the XML document are made available in accordance with the COBOL description:

    • numeric values aligned on the decimal point, if necessary converted (e.g. data item grandchild)

    • alphanumeric values, truncated if necessary (e.g. data item child-value), or filled with blanks (e.g. data item root-value)

  • A data item for the value of a node need not be defined in the COBOL data structure, e.g. if the node never has a value in the tree or the program does not need to process the value.

  • The end tags from the XML document do not appear in the COBOL representation. They are implied in the hierarchical structure.

  • ELEMENT is the default value for the node type and can also be omitted in the IDENTIFIED clause (e.g. for the group item child).

  • The COBOL description also contains data items which do not correspond to any node in the tree (e.g. att-value, root-value and child-value).

  • Do not confuse the literal specified in the IDENTIFIED clause with the initial value of the data item. The latter is only specified by a VALUE clause.

Specifying an element or attribute name in the IDENTIFIED clause

Several options are available for specifying the name of an element or attribute in the IDENTIFIED clause; see the list below.

For the sake of clarity, the special features of namespaces, which are also part of the name, are described together in section "Namespace".

  • You predefine the name by means of IDENTIFIED BY, either

    • constantly, as a literal

    • variably, as the name of a data item which contains the current element or attribute name

  • The name is not predefined; instead, the name found in the document is returned by means of IDENTIFIED USING.
    This can also be regarded as a particular form of 'predefined' in which not exactly one name is predefined, but all possible names, in other words a type of wildcard notation.

  • The data items specified for the element and attribute names in an IDENTIFIED clause may be defined 'almost everywhere' in the program. However, if their data description entry is contained in a data structure for an XML document, this data description entry must be directly subordinate to the IDENTIFIED clause which references it.

  • The use of BY or USING in data structures is subject to the following restrictions:

    • If multiple data items with an ELEMENT phrase in the IDENTIFIED clause are directly subordinate to a data item, these must all specify the BY phrase.

    • If multiple data items with an ATTRIBUTE phrase in the IDENTIFIED clause are directly subordinate to a data item, these must all specify the BY phrase.

    • If only one data item with an ELEMENT phrase in the IDENTIFIED clause is directly subordinate to a data item, it may specify either the BY or the USING phrase.

    • If only one data item with an ATTRIBUTE phrase in the IDENTIFIED clause is directly subordinate to a data item, it may specify either the BY or the USING phrase.
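
The restrictions on BY and USING can be sketched as follows. This is an illustration only: the element names "doc", "head" and "body" and all data-names are hypothetical, and the comments assume the floating comment indicator `*>`.

```cobol
*> Permitted: several element children, all with the BY phrase.
01 doc-1                IDENTIFIED BY "doc".
   02 head-1            IDENTIFIED BY "head" PIC X(20).
   02 body-1            IDENTIFIED BY "body" PIC X(80).

*> Permitted: exactly one element child, so USING may be
*> specified. The name found in the document is returned in
*> any-name, which is directly subordinate to the IDENTIFIED
*> clause that references it.
01 doc-2                IDENTIFIED BY "doc".
   02 any-child         IDENTIFIED USING any-name.
      03 any-name       PIC X(8).
      03 any-value      PIC X(80).

*> Not permitted (shown as a comment only): two element
*> children, one of which specifies USING instead of BY.
*> 01 doc-3             IDENTIFIED BY "doc".
*>    02 head-3         IDENTIFIED BY "head" PIC X(20).
*>    02 other-3        IDENTIFIED USING other-name.
*>       03 other-name  PIC X(8).
```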

Example 12-34 Same description for different documents

XML document 1

<a
   att="123">
   xxxx
   <b>zz
      <c>9876</c>
   </b>
</a>

XML document 2

<a_2
   att="345">
   abc
   <b>@#!??*
      <d>100</d>
   </b>
</a_2>

COBOL data structure

01 root                      IDENTIFIED BY root-name. 
   02 root-name              PIC XXXX VALUE "a".
   02 root-att               IDENTIFIED BY "att" ATTRIBUTE.
      03 att-value           PIC 999.
   02 root-value             PIC X(10).
   02 child                  IDENTIFIED BY "b".
      03 child-value         PIC X.  
      03 grandchild          IDENTIFIED USING grandchild-name.
         04 grandchild-value PIC 9(8) BINARY.
         04 grandchild-name  PIC X.

Comments:

  • The IDENTIFIED clause of the root group item uses an indirection: instead of the literal "a", it names a data item which contains the value "a". This enables the same COBOL data structure to be used to process XML documents which have the same structure but a different root element name. For XML document 2, for example, the elementary item root-name would need to be supplied with the value "a_2".

  • The IDENTIFIED USING phrase in the grandchild data item means there is no default for the element name, i.e. any names are suitable (e.g. the ’c’ from XML document 1 or the ’d’ from XML document 2) and are returned in the elementary item grandchild-name.

  • The order of the data items for name and value in the structure is arbitrary (e.g. root-value before root-name or grandchild-name before grandchild-value would be possible).

Scope of the COBOL description

The XML document need not be reflected 100% in COBOL.

It is sufficient if the COBOL program describes only those parts of the XML document which it also wishes to process. However, one requirement must be satisfied here: the parent of each node from the XML document which is described in the COBOL program must also be described in the COBOL program. The root of the XML document must therefore always be described in the COBOL program.

The COBOL data structure may describe either more or fewer nodes than are currently present in the XML document.

Nodes from the XML document can be omitted from the COBOL description for the following two reasons:

  • Subordinate nodes need not be described in the COBOL structure if the application does not wish to process a node – and consequently also all nodes in the subtree whose root the node represents.

  • Repetitions of nodes with the same name in a sequence of element nodes may not be described in the COBOL structure. This cannot occur with attributes as they must have unique names with respect to their element.

    Analogy to files which do not have organization XML: the file description entry (FD) does not contain a separate description for every single record in the file; only the different record structures are described. Similarly, repetitions of elements with the same name in the XML document are irrelevant for the description in COBOL; a single description of the element is sufficient. Just as the READ statement supplies further records which are always accessed via the same record description entry, the XML-specific statements supply the repetitions of an element, which are always accessed via the one description of the element in the COBOL structure.

     

Example 12-35 Partial description of an XML document

XML document

<a>xxx
   <b>yyy</b>
   <d>123</d>
   <b>zz
      <c>9876</c>
   </b>
</a>

COBOL data structure 1

01 a              IDENTIFIED BY "a".
   02 a-w         PIC X(10).
   02 b           IDENTIFIED BY "b".
      03 b-w      PIC X(10).
      03 c        IDENTIFIED BY "c".
         04 c-w   PIC X(10).

COBOL data structure 2

01 a           IDENTIFIED BY "a".
   02 a-w      PIC X(10).
   02 b        IDENTIFIED BY "b".
      03 b-w   PIC X(10).
   02 d        IDENTIFIED BY "d".
      03 d-w   PIC X(10).

Comments:

  • COBOL data structure 1 permits the root of the XML document to be processed, and below this only those subtrees whose root has the name ’b’ – but not the subtree with the root ’d’.

  • When processing the XML document with COBOL data structure 1, the first b node in the document has no children, but a child is described for it in the COBOL structure. What this means in detail is explained in the example "Principle of assigning nodes" in section "Statements for XML processing".

  • COBOL data structure 2 permits the root of the XML document and its children (’b’ and ’d’) to be processed, but no further nodes which are subordinate to these (’c’).

  • When processing the XML document with COBOL data structure 2, the second b node in the document has children, but no child is described in the COBOL structure for these. What this means in detail is explained in the example "Sequential reading" in section "READ".

COUNT clause

If nodes which do not currently exist in the tree are described in the COBOL structure, it can be important for further processing to know which data items of the structure are concerned.

The COUNT clause is used for this purpose: it defines an integer numeric elementary item which can contain only the value 0 or 1.

Example 12-36 COUNT phrase

...
 08 x-node      IDENTIFIED BY "x" COUNT IN x-number.
   09 x-value   PIC 9(8).
...



Comments:

  • The name 'x-number' of the elementary item is freely selectable, but it must be unique throughout the entire program without qualification. You may not define the data item with this name yourself.

  • The definition is implied by specifying the data name in the COUNT clause.

  • The COUNT clause may only be specified for data items which also have an IDENTIFIED clause.

  • The COUNT elementary item does not specify the number of repetitions of nodes with the name 'x' in the tree.

  • The COUNT elementary item has nothing to do with the value of a node.
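
A minimal sketch of how a program might evaluate the COUNT item after the structure has been filled. It reuses the names from Example 12-36. The interpretation that 0 means "no node 'x' currently exists in the tree" and 1 means "a node exists" is an assumption derived from the purpose stated above, and the DISPLAY texts are purely illustrative.

```cobol
*> Data description as in Example 12-36. x-number is defined
*> implicitly by the COUNT clause and is not declared again.
 08 x-node      IDENTIFIED BY "x" COUNT IN x-number.
   09 x-value   PIC 9(8).

*> Procedure division fragment, executed after an XML-specific
*> statement has supplied the structure.
*> Assumption: 0 = node "x" not present, 1 = node present.
    IF x-number = 0
       DISPLAY "no node x in the tree"
    ELSE
       DISPLAY "value of x: " x-value
    END-IF
```

Testing the COUNT item rather than x-value avoids mistaking leftover or default content of x-value for data from the document.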