1 | Regular one-character expressions match a single character according to the following rules: |
1.1 | An ordinary character (one that is not among the special characters listed under 1.2) is a regular expression which matches itself. |
1.2 | A backslash (\) followed by a special character is a regular one-character expression that matches that special character. The following special characters are defined: period (.), asterisk (*), opening square bracket ([) and backslash (\). These characters are special unless they occur within square brackets [ ] (see 1.4). The circumflex (^) is a special character if it occurs at the beginning of a regular expression (see 3.1) or if it occurs in square brackets and immediately follows the opening bracket ([^ ]) (see 1.4). The dollar sign ($) is a special character if it occurs at the end of a regular expression (see 3.2). The character used to delimit a regular expression is also a special character for that regular expression. |
1.3 | A period (.) is a regular one-character expression which matches any single character except the newline character. |
1.4 | A non-empty string enclosed in square brackets is a regular one-character expression which matches any single character contained in this string. If, however, the first character in the string is a circumflex (^), the regular expression matches any single character except the newline character and the remaining characters in the string. The ^ character only has this "power of exclusion" if it is the first character after the opening square bracket. The minus sign (-) can be used to denote a range of consecutive ASCII characters, e.g. [0-9] and [0123456789] mean the same. The minus sign is not a special character if it is the first (possibly after a ^) or last character in the string. The closing square bracket does not end such a string if it is the first character (possibly after a ^) in the string. For example, []a-f] matches a closing square bracket ] or one of the characters a, b, c, d, e or f. The four characters period (.), asterisk (*), opening square bracket ([) and backslash (\) stand for themselves within such a string. |
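The one-character rules in 1.1 to 1.4 can be tried out with a short script. This is only a sketch using Python's `re` module, which is not the implementation described here: its syntax agrees with these rules for the cases shown, but differs elsewhere (inside brackets a backslash is still an escape character in Python, and a negated bracket string can match a newline). The helper `matches_one` and the table `EXAMPLES` are hypothetical names introduced for illustration.

```python
import re

def matches_one(pattern: str, ch: str) -> bool:
    """Return True if `pattern` matches exactly the one character `ch`."""
    return re.fullmatch(pattern, ch) is not None

# (pattern, character, expected result), grouped by rule number.
EXAMPLES = [
    (r"a", "a", True),              # 1.1 ordinary character matches itself
    (r"a", "b", False),
    (r"\.", ".", True),             # 1.2 backslash removes the special meaning
    (r"\.", "x", False),
    (r"\*", "*", True),
    (r".", "x", True),              # 1.3 period matches any character ...
    (r".", "\n", False),            #     ... except newline
    (r"[0-9]", "7", True),          # 1.4 range
    (r"[0123456789]", "7", True),   #     same set written out in full
    (r"[^0-9]", "7", False),        #     leading ^ excludes the listed characters
    (r"[^0-9]", "x", True),
    (r"[a^]", "^", True),           #     ^ is ordinary when it is not first
    (r"[-a]", "-", True),           #     minus first: ordinary character
    (r"[a-]", "-", True),           #     minus last: ordinary character
    (r"[]a-f]", "]", True),         #     ] first does not close the string
    (r"[]a-f]", "c", True),
    (r"[]a-f]", "g", False),
    (r"[.*]", "*", True),           #     . and * stand for themselves
    (r"[.*]", "x", False),
]

for pattern, ch, expected in EXAMPLES:
    result = matches_one(pattern, ch)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{pattern!r:18} {ch!r:6} {result!s:6} {status}")
```

Using `re.fullmatch` keeps each check to a single character, so a pattern that could match a longer string is not mistaken for a one-character expression.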
2 | With the help of the following rules, regular expressions can be constructed from regular one-character expressions: |
2.1 | A regular one-character expression is a regular expression that matches everything that matches the regular one-character expression. |
2.2 | A regular one-character expression followed by an asterisk (*) is a regular expression which matches zero or more occurrences of the one-character expression. If there is more than one possibility, the longest left-most substring that matches is selected. |
2.3 | A regular one-character expression followed by \{m\}, \{m,\} or \{m,n\} is a regular expression that matches a repeated occurrence of the one-character expression. m and n must be non-negative integers less than 256. \{m\} matches exactly m occurrences, \{m,\} matches at least m occurrences and \{m,n\} matches between m and n occurrences (inclusive). If there is more than one possibility, the highest number of occurrences that matches is selected. |
2.4 | The concatenation of regular expressions is a regular expression that matches the concatenation of the strings matched by the individual components of the regular expression. |
2.5 | A regular expression which occurs between the strings \( and \) matches everything that matches the regular expression between these two strings. |
2.6 | The expression \n matches the same sequence of characters that was matched earlier in the same regular expression by an expression enclosed in \( and \). n is a digit; the partial expression concerned begins with the nth occurrence of \(, counting from the left. For example, ^\(.*\)\1$ matches a line that consists of a string and its repetition. |
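The construction rules in 2.2 to 2.6 can be illustrated in the same way. The sketch below again uses Python's `re` module as a stand-in, so the notation differs from the one in the table: Python writes `{m,n}` and `( )` where the rules above write `\{m,n\}` and `\( \)`. For a repeated one-character expression, Python's greedy matching gives the same left-most, longest result that the rules describe. `first_match` and `is_doubled` are hypothetical helpers.

```python
import re

def first_match(pattern: str, text: str):
    """Return (start, matched_text) for the left-most match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start(), m.group())

def is_doubled(line: str) -> bool:
    """True if the whole line is some string followed by the same string.

    Python form of the basic expression ^\\(.*\\)\\1$ (rules 2.5 and 2.6).
    """
    return re.fullmatch(r"(.*)\1", line) is not None

# Rule 2.2: zero or more occurrences, left-most match first, then longest.
print(first_match(r"ab*", "xabbbc"))    # starts at the first 'a', takes every 'b'
print(first_match(r"b*", "abbb"))       # zero occurrences already match at offset 0

# Rule 2.3: intervals, written \{3\}, \{2,\} and \{2,4\} in the notation above.
print(first_match(r"a{3}", "aaaaa"))
print(first_match(r"a{2,}", "aaaaa"))
print(first_match(r"a{2,4}", "aaaaa"))  # highest permitted count is taken
print(first_match(r"a{2,4}", "a"))      # too few occurrences: no match

# Rules 2.5 and 2.6: a group and a back-reference to it.
print(is_doubled("abcabc"), is_doubled("abcabd"))
```

Note that `(.)\1` with a single period would only accept a doubled character such as `aa`; the asterisk inside the group is what lets the group cover a whole string.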
3 | In addition, a regular expression can be restricted so that it matches only at the beginning of a line, at the end of a line, or both: |
3.1 | A circumflex (^) at the beginning of a complete regular expression means that this expression only matches a string at the beginning of the line. |
3.2 | A dollar sign ($) at the end of a complete regular expression means that this expression only matches a string at the end of the line. For example, ^completeexpression$ means that the complete regular expression must match the entire line. The empty regular expression, i.e. //, is equivalent to the last regular expression that occurred. |
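The anchoring rules in 3.1 and 3.2, together with the reuse of the last regular expression, can be sketched as follows. This is an illustration under stated assumptions, not the editor's actual implementation: `LineSearcher` is a hypothetical class, lines are assumed to be passed without their trailing newline (Python's `$` would otherwise also match just before it), and the empty string stands in for the empty expression `//`.

```python
import re

class LineSearcher:
    """Hypothetical helper that remembers the last pattern, as an editor would."""

    def __init__(self) -> None:
        self.last = None  # most recently used non-empty pattern

    def search(self, pattern: str, line: str) -> bool:
        """Search `line`; an empty pattern reuses the previous one."""
        if pattern == "":
            if self.last is None:
                raise ValueError("no previous regular expression")
            pattern = self.last
        self.last = pattern
        return re.search(pattern, line) is not None

searcher = LineSearcher()
print(searcher.search("^abc", "abcdef"))  # ^ : match must start the line
print(searcher.search("", "xabc"))        # empty pattern reuses ^abc
print(searcher.search("abc$", "xabc"))    # $ : match must end the line
print(searcher.search("^abc$", "abc"))    # both: the whole line must match
print(searcher.search("", "abcd"))        # reuses ^abc$
```

Keeping the remembered pattern in an object rather than a global variable makes it easy to run independent searches side by side.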