Unicode is a standardized character set that combines all known text characters in the world in a single character set. PERCON enables whole records or parts of records (see the statement "SET-RECORD-MAPPING") which are not encoded in Unicode to be converted to a Unicode format, and vice versa. To permit this, the input file, the output file or both must be assigned a Unicode format. The input and output files must both be SAM files. If a different access method (ISAM, BTAM or PAM) is used, the message PER0115 is issued and conversion is aborted. Automatic conversion takes place whenever the input and output files are assigned different Coded Character Set Names (CCSNs). The two CCSNs must be compatible for conversion to take place.
Note
In the case of an output file with the CCSN UTF-16, a check is made at the start of conversion to see whether the length of the output field is even. If the length of the output field is not even, conversion is aborted and the message PER0116 is issued.
Because Unicode data can exist in non-normalized form, PERCON also offers an option for normalizing this data, i.e. converting it to composite character representation (see the chapter “Normalization”).
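The effect of normalization can be illustrated outside PERCON with a minimal Python sketch. It assumes that "composite character representation" corresponds to Unicode normalization form NFC; the source does not name the form, so this is an illustration and not a description of PERCON's implementation.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: a decomposed (non-normalized) form.
decomposed = "e\u0301"

# NFC combines the base character and the combining mark into one code point
# where a precomposed character exists (assumption: this matches the
# "composite character representation" mentioned in the text).
composed = unicodedata.normalize("NFC", decomposed)

# Number of code points before and after normalization.
print(len(decomposed), len(composed))

# Number of bytes in UTF-16 before and after normalization.
print(len(decomposed.encode("utf-16-be")), len(composed.encode("utf-16-be")))
```

In this sketch, normalization shortens the data, which is one more reason why field and record lengths can change during processing.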
In PERCON, only files on disk or tape (catalogued files) can be used as input or output files with a Unicode CCS. Input files on SYSDTA and output files on SYSOUT or SYSLST are rejected with the error message PER0112. The SET-GROUP-ATTRIBUTES and SET-PAGE-LAYOUT statements cannot be used for output files with a Unicode CCSN. These statements are rejected with the error message PER0118.
PERCON supports the Unicode variants UTF-16, UTF-8 and UTFE which are offered by XHCS (see the manuals [2] "XHCS" and [14] "Unicode in BS2000/OSD").
Converting files
The record length can change when conversion takes place from a non-Unicode format to a Unicode format or vice versa.
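The following Python sketch shows why the record length changes. It encodes the same text in one single-byte code and two Unicode formats. The codec cp037 is only a stand-in for a non-Unicode EBCDIC code and is an assumption, not a CCSN taken from this manual; UTFE is not available in Python and is therefore not shown.

```python
def encoded_lengths(text):
    """Return the byte length of `text` in each encoding, keyed by codec name."""
    # cp037 (an EBCDIC code page) stands in for a non-Unicode character set.
    codecs = ("cp037", "utf-8", "utf-16-be")
    return {codec: len(text.encode(codec)) for codec in codecs}


for codec, size in encoded_lengths("Größe").items():
    print(f"{codec:10} {size} bytes")
```

The five characters occupy 5 bytes in the single-byte code, 7 bytes in UTF-8 and 10 bytes in UTF-16.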
In the case of variable-length records, the length of the output record is automatically adjusted (split or shortened). This can differ from record to record. The user can specify a maximum record length for the output file. When this length is exceeded, the warning PER0009 is issued once, and the output records are truncated on the right and processed further.
In the case of fixed-length records, the user has the following options for adjusting the record length:
The output record is assigned a variable record length (RECORD-FORMAT=*VARIABLE(...) in the ADD-FILE-LINK command). The user can then specify a maximum record length “externally”. When this length is exceeded, the warning PER0009 is issued once, and the output record is truncated on the right and processed further. When RECORD-SIZE=0, the maximum value for the output record length specified in the table in the chapter "File attributes" is used. This value is generally high enough to accommodate the converted record completely.
The output record is assigned a fixed record length (RECORD-FORMAT=*FIXED(...) in the ADD-FILE-LINK command). The length is defined by the user. If the output record is too long, unused bytes are padded with the Unicode fill character when conversion to a Unicode format takes place. When conversion takes place to a non-Unicode format, the fill character in the output record’s code is used for padding purposes. In both cases the default is a blank. If the output record is not long enough for conversion, the warning PER0009 is issued once, and the output record is truncated on the right and processed further.
When output takes place in Unicode format, the user can use the blank or the *NIL character as the Unicode filler (see UNICODE-FILLER in the ASSIGN-OUTPUT-FILE statement). However, if a sort is to follow, it is strongly recommended that blanks be used for padding, because the *NIL character is ignored by SORT.
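The padding and truncation rules for fixed-length output can be sketched as follows. The function name `to_fixed_utf16` and its parameters are hypothetical and serve only as an illustration; the sketch does not reproduce PERCON's internal behavior, and it uses only the blank as filler because the encoding of *NIL is not stated here.

```python
from typing import Tuple


def to_fixed_utf16(text: str, record_length: int, filler: str = " ") -> Tuple[bytes, bool]:
    """Return (record, truncated) for a fixed-length UTF-16 output record."""
    if record_length % 2 != 0:
        # Mirrors the even-length check described for UTF-16 output (PER0116).
        raise ValueError("UTF-16 output length must be even")

    data = text.encode("utf-16-be")
    if len(data) > record_length:
        # Too short for the converted data: truncate on the right (cf. PER0009).
        # Note: a byte-level cut like this could split a surrogate pair.
        return data[:record_length], True

    # Too long for the converted data: pad the unused bytes with the filler.
    pad = filler.encode("utf-16-be")
    missing = (record_length - len(data)) // len(pad)
    return data + pad * missing, False


record, truncated = to_fixed_utf16("AB", 8)
print(record, truncated)
```

A caller that wants to imitate the "issued once" behavior of the warning would have to remember the first `truncated` result itself; the sketch leaves that out.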
Note
In the case of record selection using SELECT-INPUT-RECORDS, characters in Unicode format cannot be specified as the selection criterion. The hexadecimal encoding of the Unicode characters can be used with the comparison operators "equal to" and "not equal to". However, the comparison operators "greater than" and "less than" cannot be used in a meaningful way because they always relate to the characters’ hexadecimal encoding. This encoding order must not be confused with the actual sorting sequence of the characters.
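The difference between encoding order and sorting sequence can be seen in a short Python sketch. The word list and the helper `utf16_hex` are chosen purely for illustration.

```python
def utf16_hex(text):
    """Return the UTF-16 (big-endian) encoding of `text` as a hexadecimal string."""
    return text.encode("utf-16-be").hex().upper()


words = ["Zebra", "apple", "Äpfel"]

# Ordering by encoded bytes corresponds to a "greater than / less than"
# comparison on the hexadecimal encoding.
by_encoding = sorted(words, key=lambda w: w.encode("utf-16-be"))

for word in by_encoding:
    print(utf16_hex(word[0]), word)
```

In this comparison "Zebra" (005A) comes before "apple" (0061), and "Äpfel" (00C4) comes last, although a dictionary would list the words in a different order. Comparisons for equality are not affected by this.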
Converting partial records
The PERCON statement SET-RECORD-MAPPING enables individual sections of a record to be converted and/or (if the Unicode variant UTF-16 is being used) normalized. To permit this, the OUTPUT-FORMAT operand must be assigned the value *UNICODE-TRANSLATION in the SET-RECORD-MAPPING statement. If this specification is missing, no conversion or normalization of the section concerned takes place.
When a non-Unicode format is converted to a Unicode format or vice versa, the length of the section to be output can change. This length change must be borne in mind when the length of the output field is specified (OUTPUT-LENGTH). If the output field is too long, the field will be padded with the Unicode fill character when conversion takes place to a Unicode format. When conversion is to a non-Unicode format, the fill character in the code of the output record is used for padding. In both cases the default is a blank. If the output field is too short, the warning PER0009 is issued once, and the field is truncated on the right and processed further.
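The effect on a single section of a record can be sketched in Python. The function `map_section`, its parameters and the sample record are hypothetical; cp037 again stands in for a non-Unicode code, and the sketch only imitates the described behavior (convert one section, pad or truncate it to the output length, leave the rest unchanged).

```python
def map_section(record: bytes, start: int, length: int, output_length: int) -> bytes:
    """Convert one section of `record` from cp037 to UTF-16; copy the rest unchanged."""
    if output_length % 2 != 0:
        raise ValueError("UTF-16 output length must be even")

    section = record[start:start + length].decode("cp037")

    # Truncate on the right if the output field is too short.
    converted = section.encode("utf-16-be")[:output_length]

    # Pad with the UTF-16 blank if the output field is too long.
    blank = " ".encode("utf-16-be")
    converted += blank * ((output_length - len(converted)) // len(blank))

    return record[:start] + converted + record[start + length:]


# A 4-byte key followed by a 6-character name, both in the single-byte code.
record = "0001".encode("cp037") + "Müller".encode("cp037")
mapped = map_section(record, start=4, length=6, output_length=16)
print(len(record), len(mapped))
```

The 6-byte name needs 12 bytes in UTF-16, so an output length of 16 leaves room for two fill characters, and the record grows from 10 to 20 bytes.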