
Wide characters and multi-byte characters


Wide characters and multi-byte characters were defined to extend the "character" concept of computer languages in which one character was allocated one byte of storage space. This allocation is insufficient for languages such as Japanese, for example, since the representation of a character in these languages requires more than one byte of storage. For this reason multi-byte characters and wide characters were added to the character concept. Multi-byte characters represent the characters in the extended character set using two, three or more bytes.
Multi-byte strings can contain "shift sequences" that change the meaning of the multi-byte codes that follow. Shift sequences can switch between different interpretation modes. For example, the one-byte shift sequence 0200 can specify that the following two bytes are to be interpreted as Japanese characters, and the shift sequence 0201 can specify that the following two bytes are to be interpreted as characters in the ISO-Latin-1 character set.
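
Because a shift sequence makes the meaning of a byte depend on what came before it, a program cannot count characters simply by counting bytes. The following is a minimal sketch, assuming the standard restartable conversion function mbrtowc from <wchar.h> is available in the library being used. The helper name count_mb_chars is hypothetical and not part of any library. The mbstate_t object carries the current shift state from one call to the next.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not the bytes) in a
 * multi-byte string, interpreted according to the current locale.
 * Returns (size_t)-1 if an invalid or incomplete sequence is found. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);   /* number of bytes still to scan */
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    while (remaining > 0) {
        /* Passing NULL as the first argument discards the wide character;
         * only the length of the multi-byte sequence is needed here. */
        size_t n = mbrtowc(NULL, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* null character reached */

        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In a locale where every character occupies exactly one byte, the result equals strlen(s). The two values differ only when the locale uses a genuine multi-byte encoding.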

Programming model

With the Amendment 1 functions, programs that work with multi-byte characters can be implemented just as easily as programs that use the traditional character concept.

When these functions are used, multi-byte characters or strings read from an external file are stored internally in a wchar_t object or in an array of type wchar_t. The multi-byte characters are converted to the corresponding wide characters during the read operation.
The wchar_t objects can then be processed using the iswxxx functions or functions such as wcstod and wmemcmp. The resulting wchar_t objects are then written using output functions such as putwchar and fputws.
The wide characters are converted to the corresponding multi-byte characters when output.
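
The read, process and write steps of this model can be sketched as a small filter. This is an illustration only, assuming the standard functions fgetwc, fputwc, iswlower and towupper are available. The function name copy_upper is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: copy all characters from 'in' to 'out',
 * converting lowercase letters to uppercase on the way.
 * Returns the number of wide characters copied, or -1 on a write error. */
long copy_upper(FILE *in, FILE *out)
{
    long count = 0;
    wint_t wc;

    /* Read: each multi-byte character in the file arrives as one wide
     * character. WEOF signals end of file or a read error. */
    while ((wc = fgetwc(in)) != WEOF) {
        /* Process: classify and convert the wide character. */
        if (iswlower(wc))
            wc = towupper(wc);

        /* Write: the wide character is converted back to its
         * multi-byte representation on output. */
        if (fputwc((wchar_t)wc, out) == WEOF)
            return -1;
        count++;
    }
    return count;
}
```

Calling copy_upper(stdin, stdout) would turn the sketch into a filter from standard input to standard output. The program itself never handles individual bytes of a multi-byte character.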

Notes on wide characters

A wide character is defined as the code value of an object of type wchar_t (binary encoded integer value) that corresponds to an element of the extended character set.
The null character has the code value zero.

The end-of-file criterion in wide character files is WEOF.

Wide character constants are written in the form L'x', and wide character string literals in the form L"wide character string".
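
As a brief illustration of these notes, the sketch below declares a single wide character with the L'x' notation and a wide character string with the L"..." notation, and then finds the end of the string by looking for the null character. The names letter, greeting and wide_length are hypothetical and chosen for this example only. The standard function wcslen performs the same task as wide_length.

```c
#include <stddef.h>
#include <wchar.h>

const wchar_t letter = L'A';            /* a single wide character */
const wchar_t greeting[] = L"hello";    /* a wide character string */

/* Hypothetical helper: count the wide characters that precede the
 * terminating null character, whose code value is zero. */
size_t wide_length(const wchar_t *s)
{
    size_t n = 0;

    while (s[n] != L'\0')
        n++;
    return n;
}
```

Note that sizeof greeting yields the size in bytes, which is six wchar_t elements including the terminating null character, whereas wide_length(greeting) yields the number of characters.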

Notes on this implementation

Only 1-byte characters are supported as wide characters in this version of the C runtime library. They are of type wchar_t, which is mapped to the long type internally. Accordingly, multi-byte characters are always 1 byte long.
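
A program that should also run on other implementations can query these properties instead of hard-coding them. The following sketch assumes only the standard macro MB_CUR_MAX from <stdlib.h> and the sizeof operator. The three function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Largest number of bytes in one multi-byte character
 * for the current locale. */
size_t max_mb_len(void)
{
    return MB_CUR_MAX;
}

/* Storage size of one wide character in bytes. */
size_t wide_char_size(void)
{
    return sizeof(wchar_t);
}

/* Nonzero if every multi-byte character is exactly 1 byte long, which
 * is the situation described for this version of the library. */
int mb_is_single_byte(void)
{
    return MB_CUR_MAX == 1;
}
```

Under the implementation described here, mb_is_single_byte() would be expected to return a nonzero value, and wide_char_size() would be expected to equal sizeof(long). Other implementations can report different values.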