Multibyte and wide characters

Wide characters and multibyte characters were defined to extend the original “character” concept of computer languages, which assigned each character one byte of memory. This proved insufficient for languages such as Japanese, whose characters require more than one byte of storage. For this reason, the character concept has been expanded to include multibyte characters and wide characters.

Multibyte characters represent characters of the extended character set in one, two, three or more bytes.
Multibyte strings may include “shift sequences”, which change the meaning of the multibyte codes that follow them. Shift sequences are typically used to switch between different interpretation modes. For example, the one-byte shift sequence 0200 could specify that the following byte pairs are to be interpreted as Japanese characters, whereas the shift sequence 0201 could specify that the following byte pairs are to be interpreted as characters of the ISO Latin 1 character set.
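
The following C sketch shows how a decoder has to carry the current shift state while scanning a multibyte string. It is purely illustrative and rests on assumptions that are not part of this runtime: 0200 and 0201 are read as C octal constants, the initial state is Latin 1 mode, and, for simplicity, a Latin 1 character occupies one byte while a Japanese character occupies one byte pair. The function name count_shifted_chars is hypothetical.

```c
#include <stddef.h>

/* Hypothetical shift codes from the example above, read as C octal constants. */
enum { SHIFT_JAPANESE = 0200, SHIFT_LATIN1 = 0201 };

/* Current interpretation mode, i.e. the "shift state". */
typedef enum { MODE_LATIN1, MODE_JAPANESE } shift_mode;

/*
 * Counts the characters in s[0..n-1] under the illustrative encoding.
 * Assumptions: initial state is Latin 1 mode; a Latin 1 character is
 * one byte; a Japanese character is one byte pair. Shift sequences
 * change the state but are not counted as characters.
 * Returns (size_t)-1 if the input ends in the middle of a byte pair.
 */
size_t count_shifted_chars(const unsigned char *s, size_t n)
{
    shift_mode mode = MODE_LATIN1;
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        if (s[i] == SHIFT_JAPANESE) {
            mode = MODE_JAPANESE;        /* following bytes are pairs */
            i += 1;
        } else if (s[i] == SHIFT_LATIN1) {
            mode = MODE_LATIN1;          /* following bytes are single */
            i += 1;
        } else if (mode == MODE_JAPANESE) {
            if (i + 1 >= n)
                return (size_t)-1;       /* incomplete byte pair */
            count += 1;
            i += 2;
        } else {
            count += 1;
            i += 1;
        }
    }
    return count;
}
```

For the bytes 'H', 'i', 0200, 'a', 'b', 'c', 'd', 0201, 'x' the sketch counts five characters: two in Latin 1 mode, two byte pairs in Japanese mode and one more after switching back. The same byte value can therefore mean different things depending on the shift sequences that precede it.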

Programming model

Thanks to the functions added by Amendment 1 to the ISO C standard, programs that work with multibyte characters can now be written just as easily as programs that use the traditional character concept.

When multibyte characters or strings are read from an external file, they are converted internally to a wchar_t object or an array of type wchar_t: during the read operation, each multibyte character is converted to the corresponding wide character code.
These wchar_t objects can then be processed with the iswxxx functions (iswalpha, iswspace, etc.), wcstod, wmemcmp and similar functions, and the resulting wchar_t objects can subsequently be output with wide character output functions such as putwchar and fputws.
During the write operation, the wide character codes are converted back to the corresponding multibyte characters.
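
The read, process and write cycle described above can be sketched in C as follows. The function name upcase_stream is hypothetical; fgetwc, iswalpha, towupper and fputwc are standard Amendment 1 functions. The conversions follow the LC_CTYPE category of the current locale, so a program would normally call setlocale(LC_CTYPE, "") before using such a function.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Copies the wide-oriented stream `in` to `out`, converting every
 * alphabetic wide character to upper case, and returns the number of
 * alphabetic characters found. (Illustrative function, not a runtime API.)
 * fgetwc converts the multibyte characters read from the file into wide
 * character codes; fputwc converts the codes back to multibyte characters.
 */
size_t upcase_stream(FILE *in, FILE *out)
{
    size_t alpha = 0;
    wint_t wc;   /* wint_t, not wchar_t, so that WEOF can be represented */

    while ((wc = fgetwc(in)) != WEOF) {
        if (iswalpha(wc)) {
            alpha += 1;
            wc = towupper(wc);
        }
        if (fputwc((wchar_t)wc, out) == WEOF)
            break;   /* write error: stop copying */
    }
    return alpha;
}
```

All processing happens on wint_t/wchar_t values; the program never needs to know how many bytes a character occupied in the file.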

Notes on wide characters

A wide character is defined as a code value (a binary encoded integer) of an object of type wchar_t that corresponds to a member of the extended character set.
A null wide character is a wide character with code value zero.
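
Wide strings are terminated by the null wide character, just as byte strings are terminated by '\0'. The sketch below is a hand-written equivalent of the standard function wcslen, shown only to make the terminator explicit; the name wide_length is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Returns the number of wide characters before the terminating null
 * wide character L'\0' (code value zero). Equivalent to wcslen;
 * illustrative only.
 */
size_t wide_length(const wchar_t *ws)
{
    size_t n = 0;
    while (ws[n] != L'\0')
        n += 1;
    return n;
}
```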

WEOF is the end-of-file indicator for wide character files: wide character input functions such as fgetwc return WEOF when the end of the file is reached.
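
A typical input loop compares the result of fgetwc with WEOF. Two details matter: the result must be stored in a wint_t, which can hold WEOF in addition to every wide character value, and WEOF is also returned on a read or encoding error, so ferror distinguishes the two cases. The function name count_wide_lines is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/*
 * Counts the L'\n' characters in a wide-oriented stream.
 * Returns -1 if reading stopped because of an error rather than
 * end of file. (Illustrative function, not a runtime API.)
 */
long count_wide_lines(FILE *f)
{
    long lines = 0;
    wint_t wc;   /* wint_t can hold WEOF as well as any wide character */

    while ((wc = fgetwc(f)) != WEOF) {
        if (wc == L'\n')
            lines += 1;
    }
    /* WEOF means end of file OR error; the error indicator tells them apart. */
    return ferror(f) ? -1 : lines;
}
```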

Wide character constants are written in the form L'x', and wide string literals in the form L"widecharstring".
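
The difference between a wide character constant and a wide string literal can be seen in a small C sketch: L'a' is a single value of type wchar_t, whereas L"banana" is an array of wchar_t ending in L'\0'. The function name count_wide_char is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Counts how often the wide character c occurs in the wide string ws.
 * Illustrative only; not part of the runtime library.
 */
size_t count_wide_char(const wchar_t *ws, wchar_t c)
{
    size_t n = 0;
    for (; *ws != L'\0'; ++ws) {
        if (*ws == c)
            n += 1;
    }
    return n;
}
```

For example, count_wide_char(L"banana", L'a') counts three occurrences, and sizeof L"abc" equals 4 * sizeof(wchar_t) because the terminating null wide character is part of the array.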

Notes on this implementation

This version of the C runtime system supports only 1-byte characters as wide character codes. These characters are of type wchar_t, which is internally mapped to the type long.

Consequently, multibyte characters always have a length of 1 byte in this implementation.
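
A portable program does not need to rely on this property; it can count multibyte characters with the standard function mbsrtowcs, which with a null destination pointer only counts and stores nothing. Under the rule stated above, the count equals the byte length for every valid string. The helper names multibyte_char_count and all_chars_single_byte are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Returns the number of multibyte characters in the null-terminated
 * string mb, or (size_t)-1 if it contains an invalid sequence.
 */
size_t multibyte_char_count(const char *mb)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);         /* initial conversion state */
    return mbsrtowcs(NULL, &mb, 0, &state);  /* NULL dst: count only */
}

/* Returns 1 if every multibyte character in mb occupies exactly one byte. */
int all_chars_single_byte(const char *mb)
{
    size_t n = multibyte_char_count(mb);
    return n != (size_t)-1 && n == strlen(mb);
}
```

In an implementation with multibyte characters of 1 byte, all_chars_single_byte returns 1 for every valid string; in a locale with a wider encoding it returns 0 as soon as the string contains a character longer than one byte.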