Description | These functions interpret basic and extended regular expressions as described in XBD Regular Expressions. The structure type regex_t contains at least the following member: size_t re_nsub
| Number of parenthesised subexpressions. |
The structure type regmatch_t contains at least the following members: regoff_t rm_so
| Byte offset from start of string to start of substring. | regoff_t rm_eo
| Byte offset from start of string to the first character after the end of substring. | The regcomp() function compiles the regular expression contained in the string pointed to by the pattern argument and places the results in the structure pointed to by preg. The cflags argument is the bitwise inclusive OR of zero or more of the following flags, which are defined in the header <regex.h>: REG_EXTENDED
| Use Extended Regular Expressions. | REG_ICASE
| Ignore case in match. | REG_NOSUB
| Report only success/fail in regexec() . | REG_NEWLINE
| Change the handling of newline characters, as described in the text. |
The default regular expression type for pattern is a Basic Regular Expression. The application can specify Extended Regular Expressions using the REG_EXTENDED flag in the cflags argument. On successful completion, regcomp() returns 0; otherwise it returns non-zero, and the content of preg is undefined. If the REG_NOSUB flag was not set in cflags, then regcomp() will set re_nsub to the number of parenthesised subexpressions (delimited by \( \) in basic regular expressions or ( ) in extended regular expressions) found in pattern.
The regexec() function compares the null-terminated string specified by string with the compiled regular expression preg initialised by a previous call to regcomp(). If it finds a match, regexec() returns 0; otherwise it returns non-zero, indicating either no match or an error. The eflags argument is the bitwise inclusive OR of zero or more of the following flags, which are defined in the header <regex.h>: REG_NOTBOL
| The first character of the string pointed to by string is not the beginning of the line. Therefore, the circumflex character (^), when taken as a special character, will not match the beginning of string. | REG_NOTEOL
| The last character of the string pointed to by string is not the end of the line. Therefore, the dollar sign ($), when taken as a special character, will not match the end of string. |
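The following is a minimal sketch of how the cflags and eflags arguments described above might be combined in practice. The helper names matches() and count_subexpressions() are hypothetical and exist only for this illustration; they are not part of <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches pattern, 0 if it does
 * not, and -1 if regcomp() or regexec() reports an error. */
int matches(const char *pattern, const char *text, int cflags, int eflags)
{
    regex_t re;
    int rc;

    /* On failure the content of re is undefined, so regfree() is skipped. */
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;

    /* nmatch is 0, so regexec() ignores the pmatch argument. */
    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}

/* Hypothetical helper: returns re_nsub for pattern, or -1 if the pattern
 * does not compile. REG_NOSUB must not be set in cflags, because re_nsub
 * is only guaranteed to be set when that flag is absent. */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    n = (long)re.re_nsub;
    regfree(&re);
    return n;
}
```

For example, matches("^abc", "abcdef", 0, 0) reports a match, while the same call with REG_NOTBOL in eflags does not, because the circumflex is no longer allowed to match at the start of string.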
If nmatch is 0 or REG_NOSUB was set in the cflags argument to regcomp(), then regexec() will ignore the pmatch argument. Otherwise, the pmatch argument must point to an array with at least nmatch elements, and regexec() will fill in the elements of that array with offsets of the substrings of string that correspond to the parenthesised subexpressions of pattern: pmatch[i].rm_so will be the byte offset of the beginning and pmatch[i].rm_eo will be one greater than the byte offset of the end of substring i. (Subexpression i begins at the ith matched open parenthesis, counting from 1.) Offsets in pmatch[0] identify the substring that corresponds to the entire regular expression. Unused elements of pmatch up to pmatch[nmatch-1] will be filled with -1. If there are more than nmatch subexpressions in pattern (pattern itself counts as a subexpression), then regexec() will still do the match, but will record only the first nmatch substrings.
When matching a basic or extended regular expression, any given parenthesised subexpression of pattern might participate in the match of several different substrings of string, or it might not match any substring even though the pattern as a whole did match. The following rules are used to determine which substrings to report in pmatch when matching regular expressions:
1. If subexpression i in a regular expression is not contained within another subexpression, and it participated in the match several times, then the byte offsets in pmatch[i] will delimit the last such match.
2. If subexpression i is not contained within another subexpression, and it did not participate in an otherwise successful match, the byte offsets in pmatch[i] will be -1. A subexpression does not participate in the match when:
   a. * or \{ \} appears immediately after the subexpression in a basic regular expression, or *, ?, or { } appears immediately after the subexpression in an extended regular expression, and the subexpression did not match (matched 0 times); or
   b. a vertical line (|) is used in an extended regular expression to select this subexpression or another, and the other subexpression matched.
3. If subexpression i is contained within another subexpression j, and i is not contained within any other subexpression that is contained within j, and a match of subexpression j is reported in pmatch[j], then the match or non-match of subexpression i in pmatch[i] will be reported as described in 1. and 2. above, but within the substring reported in pmatch[j] rather than the whole string.
4. If subexpression i is contained in subexpression j, and the byte offsets in pmatch[j] are -1, then the byte offsets in pmatch[i] also will be -1.
5. If subexpression i matched a zero-length string, then both byte offsets in pmatch[i] will be the byte offset of the character or null terminator immediately following the zero-length string.
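As an illustration of how the pmatch offsets might be consumed, here is a short sketch. The helpers match_groups() and print_groups() are hypothetical names introduced for this example, and the sketch assumes the pattern is an extended regular expression.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compiles pattern as an ERE, matches it against text,
 * and fills pmatch[0..nmatch-1]. Returns 1 on a match, 0 on no match, and
 * -1 on any compile or execution error. */
int match_groups(const char *pattern, const char *text,
                 regmatch_t *pmatch, size_t nmatch)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, nmatch, pmatch, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}

/* Hypothetical helper: prints each reported substring. rm_so is the offset
 * of the first byte, rm_eo the offset one past the last byte, so the
 * length is rm_eo - rm_so. An offset of -1 marks an unused element or a
 * subexpression that did not participate in the match. */
void print_groups(const char *text, const regmatch_t *pmatch, size_t nmatch)
{
    size_t i;

    for (i = 0; i < nmatch; i++) {
        if (pmatch[i].rm_so == -1) {
            printf("pmatch[%zu]: no substring\n", i);
        } else {
            int len = (int)(pmatch[i].rm_eo - pmatch[i].rm_so);
            printf("pmatch[%zu]: [%ld, %ld) \"%.*s\"\n", i,
                   (long)pmatch[i].rm_so, (long)pmatch[i].rm_eo,
                   len, text + pmatch[i].rm_so);
        }
    }
}
```

With the pattern (a)|(b) and the string "b", the first subexpression is on the side of the alternation that was not selected, so its offsets are expected to be -1 while pmatch[2] delimits the matched byte.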
If, when regexec() is called, the locale is different from when the regular expression was compiled, the result is undefined.
If REG_NEWLINE is not set in cflags, then a newline character in pattern or string will be treated as an ordinary character. If REG_NEWLINE is set, then newline will be treated as an ordinary character except as follows:
1. A newline character in string will not be matched by a period outside a bracket expression or by any form of a non-matching list (see the XBD specification, Chapter 7, Regular Expressions).
2. A circumflex (^) in pattern, when used to specify expression anchoring, will match the zero-length string immediately after a newline in string, regardless of the setting of REG_NOTBOL.
3. A dollar sign ($) in pattern, when used to specify expression anchoring, will match the zero-length string immediately before a newline in string, regardless of the setting of REG_NOTEOL.
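The effect of REG_NEWLINE can be sketched with a small helper. The name line_match() is hypothetical and used only here; the patterns are basic regular expressions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the BRE pattern compiled
 * with cflags, 0 if it does not, and -1 on error. */
int line_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Without REG_NEWLINE, the pattern a.b matches the string "a\nb" because the newline is an ordinary character, and ^second does not match "first\nsecond" because the circumflex anchors only at the start of string. With REG_NEWLINE set, both results are reversed.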
The regfree() function frees any memory allocated by regcomp() associated with preg. The following constants are defined as error return values: REG_NOMATCH
| regexec() failed to match.
| REG_BADPAT
| Invalid regular expression. | REG_ECOLLATE
| Invalid collating element referenced. | REG_ECTYPE
| Invalid character class type referenced. | REG_EESCAPE
| Trailing \ in pattern. | REG_ESUBREG
| Number in \digit invalid or in error. | REG_EBRACK
| [ ] imbalance. | REG_EPAREN
| \( \) or ( ) imbalance. | REG_EBRACE
| \{ \} imbalance. | REG_BADBR
| Content of \{ \} invalid: not a number, number too large, more than two numbers, first larger than second. | REG_ERANGE
| Invalid endpoint in range expression. | REG_ESPACE
| Out of memory. | REG_BADRPT
| ?, * or + not preceded by valid regular expression. | Extension | REG_ENOSYS
| The function is not supported. | REG_INVARG
| Invalid argument was passed. | REG_EPATTERN
| Empty/null pattern was passed (End) |
The regerror() function provides a mapping from error codes returned by regcomp() and regexec() to unspecified printable strings. The generated string corresponds to the value of the errcode argument, which must be the last non-zero value returned by regcomp() or regexec() with the given value of preg. If errcode is not such a value, the content of the generated string is unspecified. If preg is a null pointer, but errcode is a value returned by a previous call to regexec() or regcomp(), regerror() still generates an error string corresponding to the value of errcode, but it might not be as detailed. If the errbuf_size argument is not 0, regerror() will place the generated string into the buffer of errbuf_size bytes pointed to by errbuf. If the string, including the terminating null, cannot fit in the buffer, regerror() will truncate the string and null-terminate the result. If errbuf_size is 0, regerror() ignores the errbuf argument, and returns the size of the buffer needed to hold the generated string. If the preg argument to regexec() or regfree() is not a compiled regular expression returned by regcomp(), the result is undefined. A preg is no longer treated as a compiled regular expression after it is given to regfree(). |
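The two-step use of regerror() described above (query the size with errbuf_size set to 0, then fetch the string) might look like the following sketch. The helper name compile_error_message() is hypothetical, and the wording of the returned message is unspecified, so the caller should not depend on it.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: tries to compile pattern and, if regcomp() fails,
 * returns a malloc()ed copy of the regerror() message, which the caller
 * must free(). Returns NULL if the pattern compiles or if allocation
 * fails. */
char *compile_error_message(const char *pattern, int cflags)
{
    regex_t re;
    size_t needed;
    char *buf;
    int rc;

    rc = regcomp(&re, pattern, cflags);
    if (rc == 0) {
        regfree(&re);           /* compiled successfully, nothing to report */
        return NULL;
    }

    /* errbuf_size of 0: errbuf is ignored and the required size,
     * including the terminating null, is returned. */
    needed = regerror(rc, &re, NULL, 0);

    buf = malloc(needed);
    if (buf == NULL)
        return NULL;

    /* Second call writes the complete, null-terminated message. */
    regerror(rc, &re, buf, needed);
    return buf;
}
```

Sizing the buffer from the first call avoids truncation. A fixed-size buffer also works, since regerror() truncates and null-terminates the result when the message does not fit.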