Wide characters and multi-byte characters extend the traditional "character" concept of programming languages, in which each character is allocated one byte of storage. One byte is insufficient for languages such as Japanese, whose characters require more than one byte of storage each. For this reason multi-byte characters and wide characters were added to the character concept. Multi-byte characters represent the characters in the extended character set using two, three or more bytes.
Multi-byte strings can contain "shift sequences" that change the meaning of the following multi-byte codes. Shift sequences can switch between different interpretation modes. For example, the one-byte shift sequence 0200 can specify that the following two bytes are to be interpreted as Japanese characters, and the shift sequence 0201 can specify that the following two bytes are to be interpreted as characters in the ISO-Latin-1 character set.
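The effect of a shift state can be sketched with a small decoder. This is only an illustration under stated assumptions: the encoding is made up, the function name decode_shifted is hypothetical, 0200 is assumed to switch to a mode in which every character occupies two bytes, and 0201 is assumed to switch back to one byte per character. Real encodings define their own shift sequences, and the C library normally tracks the shift state for the program.

```c
#include <stddef.h>

/* Hypothetical shift bytes, taken from the example in the text (octal). */
#define SHIFT_TWO_BYTE 0200   /* assumed: following characters use two bytes */
#define SHIFT_ONE_BYTE 0201   /* assumed: following characters use one byte  */

/* Decode a byte sequence in the made-up stateful encoding into code values.
   Returns the number of characters stored in out, or (size_t)-1 if a
   two-byte character is cut off or out is too small. */
size_t decode_shifted(const unsigned char *in, size_t n,
                      unsigned long *out, size_t cap)
{
    int two_byte = 0;          /* initial shift state: one byte per character */
    size_t i = 0, count = 0;

    while (i < n) {
        /* A shift byte only changes the state; it produces no character. */
        if (in[i] == SHIFT_TWO_BYTE) { two_byte = 1; i++; continue; }
        if (in[i] == SHIFT_ONE_BYTE) { two_byte = 0; i++; continue; }

        if (count == cap)
            return (size_t)-1;                 /* output buffer is full */

        if (two_byte) {
            if (i + 1 >= n)
                return (size_t)-1;             /* truncated character */
            out[count++] = ((unsigned long)in[i] << 8) | in[i + 1];
            i += 2;
        } else {
            out[count++] = in[i];
            i += 1;
        }
    }
    return count;
}
```

The same byte value yields different characters depending on the shift sequence that precedes it, which is why a multi-byte string with shift sequences has to be decoded from a known starting state.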
Programming model
With the help of the Amendment 1 functions, programs that work with multi-byte characters can be implemented just as easily as programs that use the traditional character concept.
When these functions are used, the multi-byte characters or strings that are read in from an external file are stored internally in a wchar_t object or an array of type wchar_t. The multi-byte characters are converted to the corresponding wide characters during the read operation.
The wchar_t objects can then be processed using the iswxxx functions or wcstod, wmemcmp, etc. The resulting wchar_t objects are then output using output functions such as putwchar, fputws, etc.
The wide characters are converted to the corresponding multi-byte characters when output.
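The read, process and output steps of this model might look as follows. This is a minimal sketch, not code from the library: the function name upcase_line and the 256-element buffer are assumptions made for the illustration, and only standard functions from <wchar.h> and <wctype.h> are called.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Read one line from in as wide characters, convert lowercase letters to
   uppercase, and write the result to out. Returns the number of characters
   changed, or -1 if nothing could be read or the write failed. */
long upcase_line(FILE *in, FILE *out)
{
    wchar_t buf[256];          /* assumed maximum line length */
    wchar_t *p;
    long changed = 0;

    /* fgetws converts the multi-byte input to wide characters while reading. */
    if (fgetws(buf, (int)(sizeof buf / sizeof buf[0]), in) == NULL)
        return -1;

    /* Process the wchar_t array with the isw/tow classification functions. */
    for (p = buf; *p != L'\0'; p++) {
        if (iswlower((wint_t)*p)) {
            *p = (wchar_t)towupper((wint_t)*p);
            changed++;
        }
    }

    /* fputws converts the wide characters back to multi-byte characters. */
    if (fputws(buf, out) < 0)
        return -1;
    return changed;
}
```

A call such as upcase_line(stdin, stdout) would process one line of standard input. The program itself never handles the multi-byte representation; both conversions happen inside the input and output functions.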
Notes on wide characters
A wide character is defined as the code value of an object of type wchar_t (binary encoded integer value) that corresponds to an element of the extended character set.
The null character has the code value 0.
The end-of-file criterion in wide character files is WEOF.
Wide character constants are written in the form L'x', and wide character string literals in the form L"wide character string".
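These notes can be illustrated with a short sketch. The helper names wide_length and is_wide_eof are hypothetical and exist only for this example; wchar_t, wint_t and WEOF come from the standard header <wchar.h>.

```c
#include <stddef.h>
#include <wchar.h>

/* Count the wide characters that precede the terminating null character. */
size_t wide_length(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != L'\0')      /* the null wide character has code value 0 */
        n++;
    return n;
}

/* Return 1 if c is the end-of-file value rather than a wide character. */
int is_wide_eof(wint_t c)
{
    return c == WEOF;
}
```

For example, wide_length(L"wide") counts the four wchar_t elements of the literal, and a read loop would typically compare the result of fgetwc with WEOF in the way is_wide_eof does.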
Notes on this implementation
Only 1-byte characters are supported as wide characters in this version of the C runtime library. They are of type wchar_t, which is mapped to the long type internally. Correspondingly, multi-byte characters are always 1 byte long.
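A program can query these properties at run time instead of relying on them. The following sketch uses only the standard mbtowc function and the MB_CUR_MAX macro; the function names first_wide_char and locale_is_single_byte are hypothetical. The size of wchar_t is deliberately not checked, because its mapping to long is specific to this implementation and differs on other platforms.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert the first multi-byte character of s to a wide character.
   Returns the number of bytes consumed, 0 for the null character,
   or -1 if the bytes do not form a valid character. */
int first_wide_char(const char *s, wchar_t *out)
{
    mbtowc(NULL, NULL, 0);               /* reset any internal shift state */
    return mbtowc(out, s, MB_CUR_MAX);
}

/* Return 1 if every multi-byte character in the current locale
   occupies exactly one byte, as described for this implementation. */
int locale_is_single_byte(void)
{
    return MB_CUR_MAX == 1;
}
```

On an implementation that behaves as described here, locale_is_single_byte() would be expected to return 1 and first_wide_char would never consume more than one byte.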