Normalization

In Unicode, a base character combined with a diacritic can be encoded in more than one way. A diacritic is an additional mark (e.g. an accent) that indicates how a letter is pronounced or stressed. Several encodings can therefore exist for one character. The character “Ä”, for example, can be encoded as a single character or as a sequence consisting of “A” followed by the combining diaeresis “¨”. Under certain circumstances this property of Unicode can be a hindrance in programming.

To allow identical characters with different encodings to be brought into a uniform format, PER-CON offers the normalization function COMPOSED. COMPOSED combines a base character with its associated diacritic to form a single character. However, normalization can take place only if the Unicode variant UTF-16 is assigned to the input file and/or the output file.
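The effect of composing a base character with its diacritic can be illustrated outside PER-CON. The following is a minimal sketch in Python that uses the standard library's Unicode normalization form NFC; treating NFC as equivalent to PER-CON's COMPOSED is an assumption made for illustration only, and the helper name `compose` is hypothetical.

```python
import unicodedata

# Two encodings of the same visible character "Ä".
precomposed = "\u00C4"   # single code point: LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308"   # base character "A" followed by COMBINING DIAERESIS


def compose(text):
    """Combine base characters and diacritics into single characters (NFC).

    Assumption: NFC is used here as a stand-in for the COMPOSED function.
    """
    return unicodedata.normalize("NFC", text)


# Without normalization the two strings compare as different.
print(precomposed == decomposed)                   # False

# After composition both encodings are identical.
print(compose(decomposed) == precomposed)          # True
print(len(decomposed), len(compose(decomposed)))   # 2 1
```

The comparison in the first `print` shows why differing encodings can be a hindrance: the two strings look the same when displayed but are not equal until one uniform format has been applied.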

The following format combinations are possible:

  • The Unicode variant UTF-16 is assigned only to the input file.
    When normalization is requested, the data is normalized first and then converted.

  • The Unicode variant UTF-16 is assigned only to the output file.
    When normalization is requested, the data is converted first and then normalized.

  • The Unicode variant UTF-16 is assigned to both the input file and the output file.
    In this case conversion serves solely to normalize the data.

  • The Unicode variant UTF-16 is assigned to neither the input file nor the output file. The requested normalization is ignored.
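
The four combinations above can be summarized as a small decision function. This is only a sketch of the described rules, not PER-CON's actual implementation: the function `plan_steps`, its parameters, and the step names are hypothetical, and the behavior when no normalization is requested (plain conversion) is an assumption.

```python
def plan_steps(input_is_utf16, output_is_utf16, normalize_requested):
    """Return the processing steps, in order, for one format combination.

    Hypothetical helper: names and return values are illustrative only.
    """
    if not normalize_requested:
        # Assumption: without a request, only conversion takes place.
        return ["convert"]
    if input_is_utf16 and output_is_utf16:
        # Conversion serves only for the purpose of normalization.
        return ["normalize"]
    if input_is_utf16:
        # UTF-16 on the input side only: normalize first, then convert.
        return ["normalize", "convert"]
    if output_is_utf16:
        # UTF-16 on the output side only: convert first, then normalize.
        return ["convert", "normalize"]
    # UTF-16 on neither side: the normalization request is ignored.
    return ["convert"]


print(plan_steps(True, False, True))    # ['normalize', 'convert']
print(plan_steps(False, True, True))    # ['convert', 'normalize']
print(plan_steps(True, True, True))     # ['normalize']
print(plan_steps(False, False, True))   # ['convert']
```

In each case normalization operates on the UTF-16 side of the conversion, which is why the order of the two steps depends on which file has UTF-16 assigned.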

Note

Normalization never takes place automatically; the user must always request it explicitly (see UNICODE-NORMALIZE in the ASSIGN-OUTPUT-FILE statement). Because the normalization procedure is very time-consuming, it should be requested only when it is really necessary.