Identifiers
In principle, both internal and external names can have any length. In the case of internal names, all characters are significant. For external names, the compiler evaluates a maximum of 32 characters by default (see the rules below).
In accordance with ANSI/ISO C, the following characters are allowed when constructing names: the digits 0 to 9, the uppercase letters A to Z, the lowercase letters a to z, and the underscore _. Furthermore, as an extension to ANSI/ISO C, the “dollar” character $ and the “at” sign @ are also allowed in names by default, but this can be turned off with the appropriate options (-K no_dollar, -K no_at and DOLLAR-ALLOWED=*NO, AT-ALLOWED=*NO).
Multibyte characters are not supported in identifiers. In C11 mode, however, universal character names can be used. These have the form \u0123 or \U01234567. Not all sequences of digits are allowed.
The following applies to external names:
- By default, i.e. when the options -K c_names_std and C-NAMES=*STD are set, external names can have a maximum length of 32 characters. Longer names are truncated by the compiler to 32 characters. Up to 30 characters may be used when generating shared code (with the -K share and SHAREABLE-CODE=*YES options).
- If the options -K c_names_unlimited and C-NAMES=*UNLIMITED are set, no name truncation occurs. The compiler generates entry names in the EEN format. EEN names can have a maximum length of 32000 characters.
- By default, lowercase letters are converted to uppercase, and underscores (_) are converted to the dollar character ($). These conversions can be suppressed by specifying the appropriate options (-K llm_case_lower, -K llm_keep or LOWER-CASE-NAMES=*YES, SPECIAL-CHARACTERS=*KEEP) so that lowercase letters and underscores are retained in external names.
- External names must not begin with “I”.
- If the identifier contains universal character names, these are mapped as a character sequence in the external name. The backslash is replaced by a minus sign. The letter u or U and the digits are retained.
The above rules also apply to external names that are declared as extern "C" in C++ and also for static functions.
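The default mapping described above (uppercase conversion, underscore to dollar, truncation to 32 characters) can be pictured with a small model. This is only a sketch of the documented rules, not the compiler's actual code; the function name map_external_name and the constant EXTERNAL_NAME_MAX are hypothetical, and universal character names and the 30-character limit for shared code are not modeled.

```c
#include <ctype.h>
#include <stddef.h>

#define EXTERNAL_NAME_MAX 32   /* default limit stated above (assumed name) */

/* Hypothetical helper: models the default external-name mapping.
   out must have room for EXTERNAL_NAME_MAX + 1 bytes. */
void map_external_name(const char *name, char *out)
{
    size_t i = 0;
    for (; name[i] != '\0' && i < EXTERNAL_NAME_MAX; i++) {
        unsigned char ch = (unsigned char)name[i];
        if (ch == '_')
            out[i] = '$';                 /* underscore becomes dollar */
        else
            out[i] = (char)toupper(ch);   /* lowercase becomes uppercase */
    }
    out[i] = '\0';                        /* characters beyond 32 are dropped */
}
```

Under this model, the name read_config would appear as READ$CONFIG, and a 36-character name would be cut to its first 32 characters.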
main function
The compiler allows the return types int and void for the main function. The return type void always leads to a compiler message (usually a warning).
Two formal parameters are provided for the main function to allow arguments to be passed to a program in a call:
int main(int argc, char *argv[])
The first parameter argc shows the number of passed arguments. Since the first argument argv[0] is conventionally the program name, the number of arguments is at least 1.
The second parameter argv is a pointer to an array of strings. It holds the program name (in argv[0]) and all arguments entered in the program call in the form of strings terminated with the null byte (\0).
As an extension to ANSI/ISO C, it is also possible to declare a third parameter char *envp[] for the main function (see "Extensions to ANSI/ISO C").
More details on passing parameters to main functions can be found in section “Input of parameters for the main function”.
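As an illustration of how argc and argv relate, the following sketch walks the argument vector. The helper name report_arguments is hypothetical; it is written as a separate function so that it can be called from any main with the standard two-parameter form.

```c
#include <stdio.h>

/* Hypothetical helper: prints every entry of argv and returns the number
   of arguments that follow the program name in argv[0]. */
int report_arguments(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc - 1;
}
```

A main function would typically just forward its own parameters, for example with report_arguments(argc, argv).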
Characters
By default, the data type char is treated as unsigned by the compiler (see also the -K uchar, -K schar and SIGNED-CHARACTER=*NO/*YES options).
The value of an EBCDIC character is always positive.
The value of '\377' (octal) or '\xFF' (hexadecimal) is thus 255.
If a character constant contains a numeric value that is not included in the EBCDIC character set, behavior is undefined.
The value of a character constant that contains more than one character (e.g. 'ab') is computed from the EBCDIC values of the characters as a number to the base 256. The rightmost character is multiplied by 1, the next character to its left by 256, the third character by 256 * 256, and the fourth character by 256 * 256 * 256.
For example, 'abcd' produces the value 'a' * 256^3 + 'b' * 256^2 + 'c' * 256 + 'd' (= 2172814212).
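The arithmetic in the example above can be reproduced on any platform by supplying the EBCDIC codes explicitly. The sketch below assumes the codes 0x81 to 0x84 for the lowercase letters a to d; the names multichar_value and ebcdic_abcd are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combines up to four character codes, given left to
   right as they appear in the constant, into one base-256 number. */
uint32_t multichar_value(const unsigned char *codes, size_t count)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count && i < 4; i++)
        value = value * 256u + codes[i];   /* move up one base-256 digit */
    return value;
}

/* EBCDIC codes assumed for 'a', 'b', 'c', 'd'. */
static const unsigned char ebcdic_abcd[4] = { 0x81, 0x82, 0x83, 0x84 };
```

With these codes, multichar_value(ebcdic_abcd, 4) yields 2172814212, the value quoted for 'abcd'.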
The value of a multibyte character constant in the form L'ab' is identical to the value of a character constant in the form 'ab' in this implementation.
If a character constant contains five or more characters, an error occurs, and no code is generated.
The assignment of int to char occurs modulo 256.
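A minimal sketch of the modulo-256 assignment follows. It uses unsigned char rather than plain char so that it behaves the same on platforms where plain char is signed; the function name int_to_char is hypothetical.

```c
/* With char unsigned (the default described above), assigning an int
   keeps only the low-order 8 bits of the value. */
unsigned char int_to_char(int value)
{
    return (unsigned char)value;   /* conversion reduces modulo 256 */
}
```

For instance, 300 becomes 44 (300 - 256), and -1 becomes 255.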
Multibyte characters
In this implementation, multibyte characters always have a length of 1 byte, and wchar_t values are always 32-bit integer values.
Pointers
A pointer is represented in 4 bytes and is aligned on a word boundary. The difference between two pointers is of type int (ptrdiff_t).
Arrays
In C, arrays usually have fixed limits; the size of an array in such cases is already known at compile time. Variable length arrays (VLAs) from C11 are an exception. These can only occur at block scope.
In most expressions, an array name in C is treated as a pointer that points to the first element of the array (exceptions include the operand of sizeof and of the unary & operator).
The elements are sequentially stored in memory; the first element has the index zero. In the case of multi-dimensional arrays, the elements are stored in memory in such a way that the last index is the first to vary. Like the array itself, each element is aligned in accordance with the element type.
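The storage order for multi-dimensional arrays (last index varies first) means that element m[i][j] of an array with COLS columns lies i * COLS + j elements after the start. The following sketch measures this directly; the names ROWS, COLS and element_index are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Distance of m[i][j] from the start of the array, counted in elements.
   Byte addresses are used so that the subtraction spans the whole array. */
size_t element_index(int (*m)[COLS], size_t i, size_t j)
{
    const char *base = (const char *)m;
    const char *elem = (const char *)&m[i][j];
    return (size_t)(elem - base) / sizeof(int);
}
```

For an array declared as int m[ROWS][COLS], element_index(m, 1, 0) is COLS, since the whole first row is stored before the second row begins.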
Structures
In structures, components occupy space in the order of their declaration. Each component is aligned in accordance with its type. The structure itself is aligned on the maximum alignment size required for a component. The size of the structure is a multiple of this alignment so that arrays can be constructed from these structures. See also “Internal representation of data types (alignment and representation in registers)” and the preprocessor directive #pragma aligned (see "aligned pragma").
Example
                    Size:     Alignment:            Offset:
struct {
    char  a;        1 byte    byte boundary         0 (word boundary)
    short b;        2 bytes   half-word boundary    2
    char  c;        1 byte    byte boundary         4
    long  d;        4 bytes   word boundary         8
    char  e;        1 byte    byte boundary         12
};                                                  16 (structure end)
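The offsets in this example can be checked with offsetof. The sketch below is an assumption-laden stand-in: it uses int32_t in place of this implementation's 4-byte long so that the same layout can be observed on platforms where long has a different size, and it presumes a 2-byte short aligned on 2 bytes and a 4-byte int32_t aligned on 4 bytes. The names struct example and example_offsets are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* int32_t stands in for the 4-byte long of the example above. */
struct example {
    char    a;
    short   b;
    char    c;
    int32_t d;
    char    e;
};

/* Offsets of the components, in declaration order. */
static const size_t example_offsets[5] = {
    offsetof(struct example, a),
    offsetof(struct example, b),
    offsetof(struct example, c),
    offsetof(struct example, d),
    offsetof(struct example, e)
};
```

Under the stated assumptions the offsets are 0, 2, 4, 8 and 12, and sizeof(struct example) is 16, so that arrays of the structure keep every component aligned.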
Bitfields
Bitfields are stored from left to right in a maximum of 64 bits (double-word).
Bitfields can be defined with the following types:
int, unsigned int, signed int
long, unsigned long, signed long
long long, unsigned long long, signed long long
short, unsigned short, signed short
char, unsigned char, signed char
Bitfields without the unsigned or signed keyword are represented in accordance with the base type, i.e. char as unsigned and int, long, long long and short as signed. If signed or unsigned is specified explicitly, the bitfields are represented accordingly. This default behavior can be modified by means of the following options: -K schar, -K signed_fields_unsigned and -K plain_fields_unsigned or SIGNED-FIELDS=*UNSIGNED, PLAIN-FIELDS=*UNSIGNED, SIGNED-CHARACTER=*YES.
If the bitfield fits in the current byte, half-word, word or double-word, the specified number of bits are placed in it without being aligned; otherwise, the bitfield is aligned on a byte, half-word, word or double-word boundary in accordance with its base type (see example below).
Example
struct {
    unsigned short a : 7;
    unsigned short b : 5;
    unsigned short c : 5;
    unsigned short d : 8;
} x;
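One reading of the placement rule can be modeled in a few lines: a field is placed at the next free bit if it still fits in the current unit of its base type, and is otherwise moved to the next unit boundary. This is an interpretation of the rule as stated, not the compiler's own algorithm; the function place_bitfield and its parameters are hypothetical.

```c
/* Hypothetical model of the placement rule. unit_bits is the size of the
   base type in bits (16 for unsigned short). Returns the bit offset at
   which a field of the given width starts and advances *next_bit. */
unsigned place_bitfield(unsigned *next_bit, unsigned width, unsigned unit_bits)
{
    unsigned used_in_unit = *next_bit % unit_bits;
    unsigned start;

    if (used_in_unit + width > unit_bits)        /* field does not fit */
        *next_bit += unit_bits - used_in_unit;   /* align to next unit */
    start = *next_bit;
    *next_bit += width;
    return start;
}
```

Applied to the example with a 16-bit base type, this model places a at bit 0 and b at bit 7; c no longer fits in the first half-word and starts at bit 16, followed by d at bit 21.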
Enumerations (enum)
Without an explicit value assignment, the numbers 0, 1, etc. are sequentially assigned to the constants when an enumeration type is defined. If a value is explicitly assigned to a constant, the following constants automatically receive a correspondingly higher value.
By default, an enumeration type is represented as char, short or long, depending on the threshold limits (highest and lowest values). Regardless of the actual storage space requirements, enum data can always be represented as long by using the -K enum_long or ENUM-TYPE=*LONG compiler options.
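The numbering rule can be seen in a small, hypothetical enumeration (the type and constant names are illustrative only):

```c
/* RED and GREEN receive 0 and 1. BLUE is assigned explicitly, and the
   constants after it continue counting from that value. */
enum color { RED, GREEN, BLUE = 10, WHITE, BLACK };
```

Here WHITE is 11 and BLACK is 12.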
Type qualifier volatile
volatile prevents optimization on accessing a variable. This means that instead of using the old contents, new values are always read from storage. For all assignments, including redundant ones, the appropriate value is directly written to storage. In contrast to non-volatile objects, which are subject to extensive optimization and are typically held in registers, the implementation guarantees that all references to volatile objects will always point to values in storage.
In K&R mode, volatile is accepted only syntactically.
size_t
In this implementation, size_t corresponds to unsigned int.
ptrdiff_t
In this implementation, ptrdiff_t corresponds to int.
Conversion of data types
integer --> integer
When an unsigned integer value is converted to a signed integer type of the same size, the bit pattern is retained. If the value cannot be accommodated, the result is the original value minus 2^N, where N is the number of bits in the type (i.e. minus the largest unsigned value + 1).
If a conversion of an integer value to a smaller integer type is involved, and the value cannot be accommodated, the bit pattern is retained and the higher-valued bits are truncated.
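Both integer conversion rules can be sketched with fixed-width types. The unsigned-to-signed case is computed arithmetically here so that the sketch does not depend on the conversion behavior of whichever compiler builds it; the function names to_signed32 and truncate_to_8_bits are hypothetical.

```c
#include <stdint.h>

/* Unsigned to signed, same size: values that do not fit come out as the
   original value minus 2^32 (same bit pattern, read as signed). */
int32_t to_signed32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)((int64_t)u - 4294967296LL);
}

/* Conversion to a smaller type: the high-order bits are cut off and only
   the low-order bits survive. */
uint8_t truncate_to_8_bits(uint32_t u)
{
    return (uint8_t)(u & 0xFFu);
}
```

For example, the unsigned value 4294967295 (all bits set) becomes -1, and 0x1234 truncated to 8 bits becomes 0x34.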
floating-point number --> integer
When a floating-point number is converted to an integer, the number is truncated toward zero.
Example
(int)(-1.5) is -1
(int)( 1.5) is 1
The result is undefined if the floating-point number to be converted is too large to be represented as an integer value.
integer --> floating-point number
The conversion of an integer to a floating-point type that cannot represent the value exactly is accomplished by rounding.
floating-point number --> floating-point number
The conversion of a floating-point number to a smaller floating-point number (e.g. double to float) is accomplished by rounding.
integer <--> pointer
When an integer is converted to a pointer, and vice versa, the bit pattern is not changed (simple reinterpretation).
Sign of division remainder
The remainder of an integral division always has the same sign as the dividend.
Example
(-5) / 2 is -2, (-5) % 2 is -1
5 / (-2) is -2, 5 % (-2) is 1
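The two example lines follow from division truncating toward zero: quotient and remainder always satisfy (a / b) * b + a % b == a. A small check of this relation, with the hypothetical function name division_identity_holds, might look as follows (b must be nonzero and the division must not overflow):

```c
/* Returns 1 if quotient and remainder recombine to the dividend. With a
   quotient truncated toward zero, the remainder therefore carries the
   sign of the dividend a. */
int division_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```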
Logical and arithmetic right shift
If the left operand is unsigned, the right shift is logical (vacated bits are filled with 0 bits); otherwise, it is arithmetic (vacated bits are filled with copies of the sign bit).
Example
(-8) >> 1 is -4
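The difference between the two kinds of shift is easiest to see on the same bit pattern. In the sketch below, the function names are hypothetical; the signed variant relies on the compiler shifting arithmetically, as this section states, because ISO C leaves the result of >> on negative values implementation-defined. Shift counts are assumed to be smaller than 32.

```c
#include <stdint.h>

/* Logical shift: vacated high-order bits are filled with zeros. */
uint32_t shift_right_logical(uint32_t value, unsigned count)
{
    return value >> count;
}

/* Arithmetic shift: the sign bit is replicated into the vacated bits
   (assumes the behavior described in this section). */
int32_t shift_right_arithmetic(int32_t value, unsigned count)
{
    return value >> count;
}
```

The bit pattern 0xFFFFFFF8 shifted right by one gives 0x7FFFFFFC when treated as unsigned, but -4 when treated as the signed value -8.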
Bitwise operations on signed integer values
Bitwise operations (operators ~, <<, &, ^, and |) are executed on the operands interpreted as unsigned integers; however, the result is signed.
Declarators
Any number of declarators may be used to declare a type.
switch statement
Any number of case branches may be used per switch statement.
Preprocessor directives
#include
#include directives cannot be specified with a sequence of <name> or "name" headers. Only the first name is accepted.
The compiler accepts #include directives in which the names of headers contain slashes (/) for directories even in the case of PLAM library elements. Every slash in the names of user-defined and standard headers is internally converted to a period for the search in PLAM libraries.
Consequently, in source programs that are ported from POSIX or UNIX systems, for example, the slashes need not be converted to periods.
Example
#include <sys/types.h>
The compiler looks for the standard header SYS.TYPES.H in the CRTE library $.SYSLIB.CRTE.
There are no restrictions with respect to the nesting of header files.
#pragma
See section “Pragmas”.
__DATE__, __TIME__
If the date and time of compilation are not available, these macros are defined as follows:
Macro | Value |
__DATE__ | "Jan 1 1970" |
__TIME__ | "01:00:00" |
Size and value ranges for elementary data types
Type | Bits | Value ranges |
char | 8 | 0 .. 255 |
signed char | 8 | -128 .. 127 |
short | 16 | -32768 .. 32767 |
unsigned short | 16 | 0 .. 65535 |
int | 32 | -2147483648 .. 2147483647 (-2^31 .. 2^31-1) |
unsigned int | 32 | 0 .. 4294967295 (0 .. 2^32-1) |
long | 32 | same as int |
unsigned long | 32 | same as unsigned int |
long long | 64 | -9223372036854775808 .. 9223372036854775807 (-2^63 .. 2^63-1) |
unsigned long long | 64 | 0 .. 18446744073709551615 (0 .. 2^64-1) |
float | 32 | 10^-75 .. 0.79*10^76 |
double | 64 | same as float |
long double | 128 | same as float |
Internal representation of data types (alignment and representation in registers)
This section illustrates how individual C data types are internally represented in memory.
For scalar types, additional details are provided on their representation in registers. On the one hand, this defines how the variables are represented with the register storage class; on the other, it illustrates how the value of such a variable is interpreted in expressions.
Data type | Size | Alignment | Representation in registers |
char, unsigned char, | 1 byte | byte boundary | right aligned |
short, unsigned short | 2 bytes | half-word boundary | right aligned |
int, unsigned int | 4 bytes | word boundary | as in memory |
long, unsigned long | 4 bytes | word boundary | as in memory |
long long, | 8 bytes | double-word boundary | no representation in registers |
pointer | 4 bytes | word boundary | as in memory |
float | 4 bytes | word boundary | left aligned |
double | 8 bytes | double-word boundary | no conversion is required for |
long double | 16 bytes | double-word boundary | represented by a pair of floating point |
Data type | Size and alignment |
Enumerations | Represented as char, short or long with corresponding alignment, depending on limits. |
Arrays | Size and alignment correspond to element type. |
Structures | Size and alignment for individual components based on above rules; overall alignment based on maximum alignment for components. |
Bitfields | If the alignment boundary for the base type is not exceeded, the specified number of bits is created without alignment; otherwise, the bits are aligned in accordance with the base type. |
Implementation-defined limits
Most limits depend on the available system resources (e.g. on virtual memory). Only the following limits are implementation-defined:
Characteristic | Maximum value |
Number of parameters in a macro definition | 2^24-1 |
Number of arguments in a macro call | 2^24-1 |
sizeof limit | 2^31 |
Storage classes
This section summarizes how storage space is assigned to variables, depending on their storage class.
Storage class register
Variables can be declared as register variables with register. This is a hint to the compiler that the variables are used relatively often and should therefore be held in registers. This saves the high overhead of accessing storage when reading and writing such variables. Note, however, that the optimization mechanism of the compiler may ignore such hints and implement variables as register variables in accordance with its own algorithm.
Storage class auto (default)
Storage space is reserved in an Automatic Data Area for local variables with the (predefined) storage class auto.
Parameters in the parameter list
Function parameters are passed in the order of their appearance in a parameter list.
All unsigned ... parameters are represented as unsigned; all other integer parameters (char, short) as int: right-justified in one word each, aligned on a word boundary, and padded on the left with sign bits (int) or zeros (unsigned ...) where necessary. Pointers occupy one word.
Depending on the language mode, floating-point numbers are passed differently.
In K&R mode, floating-point numbers (float, double) are always passed in double precision, i.e. as a double-word aligned on a double-word boundary.
In C89 or C11 mode, float values are passed in double precision only if no prototype declaration is present. Otherwise, float values are passed in single precision, i.e. as a word aligned on a word boundary.
In C++ mode, float values are always passed in single precision, since prototype declarations must be present.
long double is passed in two double-words, aligned on a double-word boundary.
Structure parameters are aligned on a word or double-word boundary as required. The size of a structure is padded in accordance with the maximum alignment requirement for a component. For example, if a structure contains only short and char components, the size will be a multiple of 2 bytes.
Arrays cannot be passed as values. A pointer to the first array element is passed.
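Because only a pointer to the first element is passed, a function that receives an array works on the caller's storage. A minimal sketch, with the hypothetical function name fill:

```c
#include <stddef.h>

/* The parameter is written as an array but is really a pointer to the
   first element, so the assignments change the caller's array. */
void fill(int values[], size_t count, int fill_value)
{
    for (size_t i = 0; i < count; i++)
        values[i] = fill_value;
}
```

After a call such as fill(a, 3, 7) on an array a of three elements, every element of a holds 7. The element count has to be passed separately, since the size of the array is not part of the pointer.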
Static variables
The compiler reserves storage space for the following types of static variables at compile time:
- local static variables
- global static variables
- global external variables
The difference between these storage classes lies in their scope:
- Local static variables are variables that are defined with the static storage class specifier within a function. They are only recognized in the function in which they are defined.
- Global static variables are variables that are defined with the static storage class specifier outside a function. They are only recognized within a compilation unit.
- Global external variables are variables that are defined outside a function without the static storage class specifier. These variables can also be accessed in other compilation units, provided they are declared there with the extern storage class specifier.
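The three kinds of static variables can be shown side by side in one compilation unit. All names in this sketch (call_total, unit_calls, next_id, calls_in_this_unit) are hypothetical.

```c
/* Global external variable: another compilation unit can access it by
   declaring "extern int call_total;". */
int call_total = 0;

/* Global static variable: recognized only within this compilation unit. */
static int unit_calls = 0;

int next_id(void)
{
    /* Local static variable: recognized only in this function, but its
       value is kept from one call to the next. */
    static int last_id = 0;

    unit_calls++;
    call_total++;
    return ++last_id;
}

int calls_in_this_unit(void)
{
    return unit_calls;
}
```

Successive calls of next_id return 1, 2, 3 and so on, because last_id is initialized only once rather than on every call.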
Functions without a prototype
If a function without a prototype is called and there is parameter information present, then an error may be output in some cases. An error is output when an “old style” definition or a prototype in the K&R mode is found.
If the argument and the parameter are of different types (after the default argument promotions) and one of the following arises, then an error is output:
- Parameter and argument are of different sizes
- Parameter and argument have different alignments
- The parameter is of type float, double or long double
- The argument is of type float, double or long double
The error can be downgraded to a warning. If this is done, the call made at runtime will generally fail.