lex - generate programs for lexical tasks

lex generates a C program from a file containing the lex source text that you have written for the problem in hand. A lex source text consists of at most three sections: definitions, rules and user functions. The rules specify the patterns to be searched for in an input text and the action to be taken when a pattern is found. The definitions and user functions are optional.

lex generates a file with the name lex.yy.c. When lex.yy.c is compiled and linked with the lex library, the resulting program copies its input to its output unless a pattern specified in the file is found, in which case the corresponding program text is executed. The text that matched the pattern is stored in yytext[], an external character array. The input is checked against the search patterns in sequence.
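
The copy-unless-matched behavior can be pictured with a small hand-written C function. This is only a sketch of the documented behavior, not code that lex generates: it assumes a scanner with the single rule [0-9]+ whose action emits the text <NUM>, and the name scan_digits() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hand-written stand-in for a scanner with the single rule
 *     [0-9]+    action: emit "<NUM>"
 * Hypothetical helper; it only mimics the documented behavior.
 * Writes a NUL-terminated result to out and returns its length. */
size_t scan_digits(const char *in, char *out, size_t outsize)
{
    static const char repl[] = "<NUM>";
    const size_t repl_len = sizeof repl - 1;
    size_t o = 0;

    if (outsize == 0)
        return 0;

    while (*in != '\0') {
        if (isdigit((unsigned char)*in)) {
            /* Pattern found: consume the whole run of digits
             * and perform the action instead of copying. */
            while (isdigit((unsigned char)*in))
                in++;
            if (o + repl_len >= outsize)
                break;              /* output full: stop early */
            memcpy(out + o, repl, repl_len);
            o += repl_len;
        } else {
            /* No pattern found: copy the character unchanged. */
            if (o + 1 >= outsize)
                break;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return o;
}
```

Given the input "a12b 7", this sketch produces "a<NUM>b <NUM>": the letters and the space pass through unchanged, and each run of digits triggers the action.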


Syntax


lex[ -ctvnV][ -Q[y|n]][ file ...]

-c

specifies that the actions are written in C; this is the default

-t

the program is written to the standard output, not to the file lex.yy.c

-v

provides a two-line statistical summary

-n

prevents the printout of the summary generated by -v

-V

writes version information to the standard error output

-Q[y|n]

determines whether or not version information is to be output to the output file lex.yy.c. y|n stands for a yes/no argument in whatever language environment is set. In an English-language environment you enter -Qy to have version information written to lex.yy.c and -Qn to suppress the version information. In a German-language environment, for example, you would use -Qj or -Qn (for ja or nein). By default, no version information is output.

file

Input file. Multiple files are treated as a single file.

file not specified
If no file is specified the standard input is used.


Some default table sizes are too small for some users. The table sizes for the automata which are finally generated can be set in the definitions section:

%p n    Number of positions is n (default 2500)
%n n    Number of states is n (default 500)
%e n    Number of nodes in the syntax tree is n (default 1000)
%a n    Number of transitions is n (default 2000)
%k n    Number of packed character classes is n (default 2500)
%o n    Size of the output array is n (default 3000)

Specifying one or more of these sizes automatically implies the option -v unless the option -n is used.


The rules section of file starts with the delimiter %%. In the rules section you can define local variables for yylex(): all lines which start with a space or a tab and precede the first rule are copied to the start of the function yylex(), directly after its opening brace.

Each rule consists of a regular expression which describes a pattern to be located and the actions to be performed if the pattern is found. Input text which corresponds to no search pattern is passed on unchanged to the output. A regular expression consists of text characters with or without additional operators.


The following operators can be used with lex:

\x        x, even if x is a lex operator
"xy"      xy, even if x and/or y are lex operators (except \)
[xy]      x or y
[x-z]     x, y or z
[^x]      any character except x
.         any character except the newline character
^x        x at the start of a line
<y>x      x if lex is in start condition y
x$        x at the end of a line
x?        x once or not at all
x*        zero or more occurrences of x
x+        one or more occurrences of x
x{m,n}    m to n occurrences of x
xx|yy     xx or yy
x |       the action for x is the action of the next rule
(x)       x
x/y       x, but only if y follows
{xx}      substitution for xx from the definitions section
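
Several of these operators ([xy], [x-z], [^x], x?, x*, x+, x{m,n}, xx|yy and (x)) have the same meaning in POSIX extended regular expressions, so they can be tried out with the regcomp() and regexec() functions from <regex.h>. The following is a rough sketch for experimenting, not part of lex: matches_whole() is a hypothetical helper, and the lex-specific operators ("xy" quoting, <y>x, x/y, {xx} and a trailing |) are not covered. Other details may also differ from lex. For example, without REG_NEWLINE the POSIX "." also matches a newline character.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole of text matches the
 * extended regular expression pattern, 0 if it does not, and -1 if
 * the pattern is too long or cannot be compiled. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Anchor the pattern so that it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, matches_whole("ab?c", "ac") returns 1 because b is optional, whereas matches_whole("a{2,3}", "aaaa") returns 0 because four occurrences exceed the upper bound.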


Special tasks can be performed in the action section of a rule. To this end, lex provides the following macros:

input()

reads the next character from the input stream

unput(c)

pushes the character c back onto the input stream so that it is read again later

output(c)

writes the character c to the output stream


You can redefine these macros if you want to control input and output yourself. In this case, make sure that the redefined macros remain consistent with one another.
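
The relationship between reading a character and pushing one back can be sketched in plain C. The code below is an illustration only and not the lex implementation: struct char_source, src_input() and src_unput() are hypothetical names, the input is a string rather than a file, and returning 0 at the end of the input is a choice made for this sketch.

```c
#include <stddef.h>

#define PUSHBACK_MAX 16

/* Hypothetical character source with a small pushback stack. */
struct char_source {
    const char *text;               /* input not yet read */
    char pushed[PUSHBACK_MAX];      /* characters pushed back */
    size_t npushed;                 /* number of pushed-back characters */
};

/* Plays the role of input(): returns the next character,
 * or 0 when the input is exhausted (a choice of this sketch). */
int src_input(struct char_source *s)
{
    if (s->npushed > 0)
        return (unsigned char)s->pushed[--s->npushed];
    if (*s->text == '\0')
        return 0;
    return (unsigned char)*s->text++;
}

/* Plays the role of unput(c): the character c is delivered by the
 * next call to src_input(). Returns 0 on success, -1 if the
 * pushback stack is full. */
int src_unput(struct char_source *s, int c)
{
    if (s->npushed == PUSHBACK_MAX)
        return -1;
    s->pushed[s->npushed++] = (char)c;
    return 0;
}
```

Because both functions work on the same structure, a pushed-back character is always seen before any unread input. This is the kind of consistency that has to be preserved when the macros are redefined.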

In addition to storing matched text in yytext[], lex provides the following functions for processing matched text:

yymore()

The next string to be matched is appended to the characters which are already present in yytext[] (yytext[] is normally overwritten by the next matched string).

yyless( n )

Only the first n characters in yytext[] are retained; the remaining characters are returned to the input.

REJECT

Makes it possible to process strings which overlap or which are partially contained in other strings. REJECT jumps directly to the next rule without modifying the contents of yytext[].
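
The effect of yymore() and yyless( n ) on the matched text can be modeled with a small buffer in C. This is a simplified sketch, not the code lex generates: struct token_buf and the tok_* functions are hypothetical stand-ins for yytext[] and the lex functions, and text that does not fit is silently truncated.

```c
#include <stddef.h>
#include <string.h>

#define TOKEN_MAX 64

/* Hypothetical stand-in for yytext[] and its length. */
struct token_buf {
    char text[TOKEN_MAX];
    size_t len;
};

/* Effect of yymore(): the new match is appended to the text
 * which is already present. */
void tok_append(struct token_buf *t, const char *match)
{
    size_t room = TOKEN_MAX - 1 - t->len;
    size_t n = strlen(match);

    if (n > room)
        n = room;                   /* truncate in this sketch */
    memcpy(t->text + t->len, match, n);
    t->len += n;
    t->text[t->len] = '\0';
}

/* Normal behavior: the new match overwrites the previous one. */
void tok_set(struct token_buf *t, const char *match)
{
    t->len = 0;
    t->text[0] = '\0';
    tok_append(t, match);
}

/* Effect of yyless(n): only the first n characters are kept.
 * Returns the number of characters that would go back to the input. */
size_t tok_less(struct token_buf *t, size_t n)
{
    size_t returned;

    if (n >= t->len)
        return 0;                   /* nothing to give back */
    returned = t->len - n;
    t->len = n;
    t->text[n] = '\0';
    return returned;
}
```

For example, setting the buffer to "foo", appending "bar" and then calling tok_less() with n = 4 leaves "foob" in the buffer and reports that 2 characters are returned for rescanning.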

Hint

If a lex program is linked with c89 [5], then -ll must be specified as the archive parameter.

Locale

The following environment variables affect the execution of lex:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_COLLATE

Determine the locale for the behavior of ranges, equivalence classes and multicharacter collating elements within regular expressions. If this variable is not set to the POSIX locale, the results are unspecified.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files), the classification of characters as upper- or lower-case, and the mapping of characters from one case to the other.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.

See also

yacc