re_comp, re_exec - compile and execute regular expressions

Syntax

#include <re_comp.h>

char *re_comp(const char *string);
int re_exec(const char *string);

Description

re_comp() compiles a string into an internal format that is suitable for pattern matching.

re_exec() compares the string pointed to by string with the last regular expression that was passed to re_comp().

If re_comp() is called with the value 0 or a null pointer, the current regular expression remains unchanged.

The strings that are passed to re_comp() and re_exec() must be null-terminated. They can contain terminating or embedded newline characters.

re_comp() and re_exec() support simple regular expressions. The rules which apply for the pattern matching are described below.

1

Regular one-character expressions match a character according to the following rules:

1.1

An ordinary character (none of the special characters listed under 1.2) is a regular expression which matches itself.

1.2

A backslash ( \ ) followed by a special character is a regular one-character expression that matches this special character. The following special characters are defined:

  • Period (.), asterisk (*), opening square bracket ([) and backslash (\). These characters are special characters unless they occur in square brackets [ ] (see 1.4).

  • Circumflex (^) is a special character if it occurs at the beginning of a regular expression or if it occurs in square brackets and immediately follows the opening bracket ( [^ ] ) (see 1.4).

  • Dollar ($) is a special character if it occurs at the end of a regular expression (see 3.2).

  • The character used to delimit a regular expression is a special character for this regular expression.

1.3

A period (.) is a regular one-character expression which matches all characters except the newline character.
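Rules 1.1 to 1.3 can be tried out with a short sketch. Since re_comp() is not available on every platform, the sketch uses the POSIX function regcomp() in its basic regular expression mode, on the assumption that it follows the same rules. The helper name bre_match is hypothetical and not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): returns 1 if text contains
   a match for the basic regular expression, 0 if it does not, and -1 if
   the pattern cannot be compiled. */
int bre_match(const char *pattern, const char *text)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    /* Assumption: REG_NEWLINE is used so that '.' does not match a
       newline character, as rule 1.3 requires. */
    if (regcomp(&re, pattern, REG_NOSUB | REG_NEWLINE) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, bre_match("a.c", "abc") and bre_match("a\\.c", "a.c") both return 1, whereas bre_match("a\\.c", "abc") and bre_match("a.c", "a\nc") return 0. Note that each backslash of the regular expression has to be doubled inside a C string literal.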

1.4

A non-empty string enclosed in square brackets is a regular one-character expression which matches every individual character in this string. If, however, the first character in the string is a circumflex (^), the regular expression matches all characters except for the remaining characters in the string and the newline character. But the ^ character only has this “power of exclusion” if it is the first character after the opening square bracket.
The minus sign (-) can be used to denote a range of consecutive ASCII characters, e.g. [0-9] and [0123456789] mean the same. The minus sign is not a special character if it is the first (possibly after a ^) or last character in the string.
The closing square bracket does not end such a string if it is the first character (possibly after a ^) in the string. For example, []a-f matches a closing square bracket ] or one of the characters a, b, c, d, e or f.
The four characters period (.), asterisk (*), opening square bracket ([) and backslash (\) stand for themselves within such a string.
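The bracket rules of 1.4 can be checked with the following sketch. It again relies on POSIX regcomp() in basic regular expression mode as a stand-in for re_comp(), which is an assumption, and bre_match is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): returns 1 if text contains
   a match for the basic regular expression, 0 if it does not, and -1 if
   the pattern cannot be compiled. */
int bre_match(const char *pattern, const char *text)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    /* Assumption: REG_NEWLINE is used so that a bracket string starting
       with ^ does not match a newline character, as rule 1.4 requires. */
    if (regcomp(&re, pattern, REG_NOSUB | REG_NEWLINE) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, "[0-9]" and "[0123456789]" both match "7" and both reject "x". The pattern "[]a-f]" matches "]" and "c" but not "g", "[^abc]" matches "d" but neither "a" nor a single newline, and "[.*]" matches "*" and "." but not "x".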

2

With the help of the following rules, regular expressions can be constructed from regular one-character expressions:

2.1

A regular one-character expression is a regular expression that matches everything that matches the regular one-character expression.

2.2

A regular one-character expression followed by an asterisk (*) is a regular expression which matches zero or more occurrences of the one-character expression.
If there is more than one possibility, the longest left-most substring that matches is selected.

2.3

A regular one-character expression followed by \{m\}, \{m,\} or \{m,n\} is a regular expression that matches a multiple occurrence of the one-character expression. m and n must be non-negative integers less than 256.
\{m\} matches exactly m occurrences, \{m,\} matches at least m occurrences and \{m,n\} matches between m and n occurrences (inclusive).
If there is more than one possibility, the highest number of occurrences that matches is selected.
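The repetition rules 2.2 and 2.3, including the preference for the longest match, can be observed by looking at the length of the matched substring. The sketch below uses POSIX regcomp() and regexec() in basic regular expression mode as a stand-in for re_comp() and re_exec(), which is an assumption. match_length is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): returns the length of the
   left-most longest substring of text that matches the basic regular
   expression, or -1 if nothing matches or the pattern does not compile. */
int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;   /* start and end offsets of the overall match */

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, 0) != 0)   /* 0 selects basic syntax */
        return -1;
    int rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int)(m.rm_eo - m.rm_so) : -1;
}
```

For example, match_length("ab*", "xabbbc") returns 4 because b* takes all three b characters, and match_length("a\\{1,3\\}", "aaaa") returns 3 because the highest permitted number of occurrences is selected. match_length("a\\{2\\}", "a") returns -1, since a single a is not enough.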

2.4

The concatenation of regular expressions is a regular expression that matches a string which is produced from concatenation of the strings which match the corresponding components of the regular expression.

2.5

A regular expression which occurs between the strings \( and \) matches everything that matches the regular expression between these two strings.

2.6

The expression \n matches the same sequence of characters that was matched earlier in the same regular expression by an expression enclosed in \( and \). n is a digit; the partial expression concerned begins with the nth occurrence of \(, counting from the left. For example, ^\(.*\)\1$ matches a line that consists of a string followed by its repetition.
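The effect of \( \) and \n can be illustrated with a small sketch. It uses POSIX regcomp() in basic regular expression mode as a stand-in for re_comp(), which is an assumption, and bre_match is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): returns 1 if text contains
   a match for the basic regular expression, 0 if it does not, and -1 if
   the pattern cannot be compiled. */
int bre_match(const char *pattern, const char *text)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, 0) != 0)   /* 0 selects basic syntax */
        return -1;
    /* Only the yes/no answer is needed, so no match offsets are requested. */
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, "^\\(.*\\)\\1$" matches "abcabc" but not "abcabd". The shorter pattern "^\\(.\\)\\1$" only matches a line of exactly two identical characters such as "aa", so it rejects "abab". Several groups are numbered from the left: "\\(a\\)\\(b\\)\\2\\1" matches "abba".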

3

In addition, a regular expression can be restricted such that it matches only at the beginning of a line, at the end of a line, or both:

3.1

A circumflex (^) at the beginning of a complete regular expression means that this expression only matches a string at the beginning of the line.

3.2

A dollar sign ($) at the end of a complete regular expression means that this expression only matches a string at the end of the line.
For example, ^completeexpression$ means that the complete regular expression must match the entire line. The empty regular expression, i.e. //, is equivalent to the last regular expression that occurred.
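The anchoring rules 3.1 and 3.2 can be sketched as follows. The sketch uses POSIX regcomp() in basic regular expression mode as a stand-in for re_comp(), which is an assumption, and bre_match is a hypothetical helper name. Each test string is treated as a single line.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): returns 1 if text contains
   a match for the basic regular expression, 0 if it does not, and -1 if
   the pattern cannot be compiled. */
int bre_match(const char *pattern, const char *text)
{
    regex_t re;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_NOSUB) != 0)   /* basic syntax */
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here "^abc" matches "abcdef" but not "xabc", "abc$" matches "xabc" but not "abcx", and "^abc$" matches only "abc". A circumflex or dollar sign in the middle of an expression is an ordinary character, so "a^b" matches the text "a^b" and "a$b" matches the text "a$b".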

Return value

for re_comp():

Null pointer               if re_comp() has compiled the passed string successfully.
String with error message  otherwise.

for re_exec():

1    if string matches the last compiled expression.
0    if string does not match the last compiled expression.
-1   if the compiled expression is invalid (i.e. an internal error has occurred).

Errors

In the event of an error, re_comp() returns one of the following strings:

No previous regular expression
Regular expression too long
unmatched \(
missing ]
too many \(\)

Notes

A range contains all characters whose internal representation lies between the internal representations of the two range limits. The set of characters in a range can therefore differ between an EBCDIC and an ASCII environment.

For reasons of portability to implementations that comply with earlier versions of the X/Open standard, the regcomp() and regexec() functions are recommended instead of the ones described here.
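As an illustration of this recommendation, the following sketch layers a single-pattern interface with the same return conventions on top of regcomp() and regexec(). The names sketch_re_comp and sketch_re_exec are hypothetical. The handling of a null pointer before any successful compilation and of a failed compilation are assumptions made for this sketch, and the error texts come from regerror() rather than from the list above.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

static regex_t compiled;        /* the single current expression */
static int have_compiled = 0;   /* nonzero while compiled is valid */
static char errbuf[128];        /* text of the most recent error */

/* Hypothetical stand-in for re_comp(): returns a null pointer on success,
   otherwise a pointer to an error message. */
const char *sketch_re_comp(const char *pattern)
{
    if (pattern == NULL) {
        /* A null pointer leaves the current expression unchanged.
           Assumption: this is an error if nothing was compiled before. */
        return have_compiled ? NULL : "No previous regular expression";
    }
    if (have_compiled) {
        regfree(&compiled);
        have_compiled = 0;
    }
    int rc = regcomp(&compiled, pattern, REG_NOSUB);   /* basic syntax */
    if (rc != 0) {
        /* Simplification: after a failed compilation no expression
           is current any more. */
        regerror(rc, &compiled, errbuf, sizeof errbuf);
        return errbuf;
    }
    have_compiled = 1;
    return NULL;
}

/* Hypothetical stand-in for re_exec(): 1 on a match, 0 on no match,
   -1 if there is no usable compiled expression or regexec() fails. */
int sketch_re_exec(const char *text)
{
    if (!have_compiled || text == NULL)
        return -1;
    int rc = regexec(&compiled, text, 0, NULL, 0);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

A caller would first check that sketch_re_comp("^ab*c$") returns a null pointer and then call sketch_re_exec() once per input line. New code would normally keep its own regex_t object instead of a hidden static one, which also allows several expressions to be in use at the same time.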

See also

regcmp(), regexec(), re_comp.h.