Regular expressions are a tool for searching text for strings that match a defined pattern. A regular expression stands for a set of character strings; a member of this set is said to be matched by the regular expression. A pattern is constructed from one or more regular expressions.
A regular expression consists of a string of characters, which fall into two classes:
ordinary characters, and
metacharacters.
All alphanumeric characters (all letters and digits) and most other characters are ordinary characters. Within a pattern, ordinary characters match themselves. For example, the pattern abc matches exactly those strings that contain the character sequence abc anywhere within them.
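The short sketch below illustrates this rule using Python's `re` module, chosen purely for demonstration as a stand-in for the POSIX commands. For a pattern built only from ordinary characters, it gives the same result as both simple and extended regular expressions. The sample strings are invented for the illustration.

```python
import re

samples = ["abc", "xxabcxx", "ab c", "ABC", "acb"]

# re.search succeeds if the pattern occurs anywhere in the string,
# much as grep selects a line that contains a match.
matched = [s for s in samples if re.search("abc", s)]

print(matched)  # ['abc', 'xxabcxx']
```

Matching is case-sensitive, and the three characters must appear together and in order. That is why ab c, ABC and acb are not matched.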
There is, however, a small set of characters, known as metacharacters, which have special meanings when encountered in patterns. These characters are described below.
There are two forms of regular expression:
simple regular expressions
extended regular expressions
The syntax of these forms of regular expression is described in the following sections.
The following table shows which commands support regular expressions and which form each command uses:
Command | Regular expression form |
awk | extended |
ed | simple |
egrep | extended |
ex | *) |
expr | simple |
grep | simple |
lex | extended |
nl | simple |
sed | simple |
vi | *) |
*) The ex and vi commands process regular expressions which differ in certain respects from simple regular expressions. These differences are described under ex and vi.
Simple regular expressions
Simple regular expressions are constructed as follows:
No. | Regular expression | Stands for | Example | Matching strings |
1 | c | The character c, where c is not a special character | a | a |
2 | \c | The character c, where c can be any character; the backslash removes any special meaning c has | \a | a |
3 | . | Any single character | . | a, x, *, ... |
4 | [s] | Any character from s, where s is a set of characters | [mz] | m, z |
5 | [^s] | Any character not included in set s | [^xyz] | any character except x, y and z |
6 | r* | Zero, one or more occurrences of regular expression r | a* | nothing, a, aa, aaa, ... |
7 | r\{m,n\} | At least m and at most n occurrences of regular expression r | a\{1,2\} | a or aa |
8 | rx | (Concatenation) An occurrence of regular expression r followed by an occurrence of regular expression x | [ab]. | ax, a3, a*, bz, ... |
9 | ^r | An occurrence of regular expression r at the beginning of a line | ^[aA]pple | apple or Apple at the beginning of a line |
10 | r$ | An occurrence of regular expression r at the end of a line | [bB]arge$ | barge or Barge at the end of a line |
11 | \(r\) | Occurrences of regular expression r (grouping) | \([aA]pple\) | apple, Apple |
12 | \n | The string matched by the nth \(...\) subexpression, counting opening \( from the left, where n is an integer in the range from 1 to 9 | \(a\(b\)\)\2 | abb |
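The rows of this table can be checked in Python by translating each simple regular expression into Python's `re` syntax. `bre_to_python` and `bre_match` are hypothetical helpers written for this illustration. They cover only the constructs listed above: `\(`, `\)`, `\{` and `\}` become Python operators; `+ ? | ( ) { }` remain ordinary characters; and `*`, `^` and `$` are special only in the positions described under Metacharacters. POSIX bracket classes such as `[[:alpha:]]` are not handled.

```python
import re


def bre_to_python(bre: str) -> str:
    """Translate a simple (basic) regular expression into Python re syntax.

    Hypothetical helper that covers only the constructs in the table above;
    POSIX bracket classes such as [[:alpha:]] are not handled.
    """
    out = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)             # \( \) \{ \} are operators
            elif nxt in "123456789":
                out.append("\\" + nxt)      # \n is a back-reference
            else:
                out.append(re.escape(nxt))  # \c stands for the literal c
            i += 2
        elif c == "[":
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":    # a leading ] is a set member
                j += 1
            while j < n and bre[j] != "]":
                j += 1
            # A backslash is ordinary inside [ ] in POSIX, but not in Python.
            out.append("[" + bre[i + 1:j].replace("\\", "\\\\") + "]")
            i = j + 1
        else:
            if c == "*" and (not out or out[-1] in ("^", "(")):
                out.append(r"\*")           # * at the start is ordinary
            elif c == "^" and i == 0:
                out.append("^")             # anchor only at the start
            elif c == "$" and i == n - 1:
                out.append("$")             # anchor only at the end
            elif c in ".*":
                out.append(c)
            elif c in "+?|(){}^$\\":
                out.append("\\" + c)        # special in Python, ordinary here
            else:
                out.append(c)
            i += 1
    return "".join(out)


def bre_match(bre: str, text: str) -> bool:
    """True if the whole of text is matched by the simple regular expression."""
    return re.fullmatch(bre_to_python(bre), text) is not None


print(bre_to_python(r"a\{1,2\}"))         # a{1,2}
print(bre_match(r"\(a\(b\)\)\2", "abb"))  # True
print(bre_match("a+b", "a+b"))            # True: + is an ordinary character
```

Python's engine uses leftmost-first backtracking rather than POSIX leftmost-longest matching. This difference can change which substring is reported. For the patterns above, however, it does not change whether a match exists.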
Precedence
The precedence of operators in simple regular expressions is shown in the following table.
Operator | Precedence |
[. .] [= =] [: :] | 1 (highest) |
\<char> | 2 |
[ ] | 3 |
\( \) \n | 4 |
* \{m,n\} | 5 |
Concatenation | 6 |
^ $ | 7 (lowest) |
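The effect of this ordering can be checked with Python's `re` module, used here only as a stand-in. For escaped characters, bracket expressions, `*` and concatenation, Python's relative precedence is the same. One difference is that Python writes a group as `(ab)`, whereas a simple regular expression writes it as `\(ab\)`.

```python
import re

# * binds more tightly than concatenation: ab* is a followed by b*, not (ab)*.
print(bool(re.fullmatch("ab*", "abbb")))    # True
print(bool(re.fullmatch("ab*", "abab")))    # False

# A group raises the precedence; the simple regular expression \(ab\)*
# corresponds to Python's (ab)*.
print(bool(re.fullmatch("(ab)*", "abab")))  # True

# A bracket expression and an escaped character are single units,
# so * repeats the whole of [ab] or \. respectively.
print(bool(re.fullmatch("[ab]*", "abba")))  # True
print(bool(re.fullmatch(r"\.*", "...")))    # True
```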
Metacharacters
Metacharacter | Has a special meaning if |
\ | it is not preceded by a backslash \ |
. | it is not preceded by a backslash \ and it does not appear between [ and ] |
* | it is not preceded by a backslash \, it does not appear between [ and ], it is not the first character in a pattern (or the first after an initial ^) and it does not come directly after \( |
$ | it is the last character in a pattern |
^ | it is the first character in a pattern (anchor), or the first character inside square brackets [ ... ] (negation) |
- | it is in square brackets but not placed first or last |
[ | it is not preceded by a backslash \ |
[. .] [= =] [: :] | These character pairs are special if they occur within a bracket expression [ ... ] |
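The bracket-related rules in this table can be illustrated with Python's `re`, again used only as a stand-in. Those rules are: `.` and `*` lose their meaning inside `[ ]`, `^` negates only in first position, and `-` forms a range only between two characters. Python agrees with POSIX on the cases below but differs elsewhere. In particular, a backslash is an escape inside Python brackets, and `[: :]`-style classes are not supported.

```python
import re

# Inside [ ], . and * are ordinary characters.
print(re.findall("[.*]", "a.b*c"))   # ['.', '*']

# ^ negates the set only when it comes first; elsewhere it is a member.
print(re.findall("[^ab]", "abc"))    # ['c']
print(re.findall("[a^]", "a^b"))     # ['a', '^']

# - forms a range between two characters, but is ordinary first or last.
print(re.findall("[a-c]", "abcd-"))  # ['a', 'b', 'c']
print(re.findall("[ac-]", "abcd-"))  # ['a', 'c', '-']
```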
Extended regular expressions
Extended regular expressions include the constructs of simple regular expressions, with the following exception:
The \(...\) construction used in simple regular expressions has no special meaning in extended regular expressions. For example, the extended regular expression \(ab\) matches the string (ab).
In addition, extended regular expressions provide the following syntax elements for building patterns:
No. | Regular expression | Stands for | Example | Matching strings |
7 | r{m,n} | At least m and at most n occurrences of regular expression r | a{1,2} | a or aa |
13 | r+ | One or more occurrences of regular expression r | u+ | u, uu, uuu, ... |
14 | r? | Zero or one occurrence of regular expression r | u? | nothing or u |
15 | (r) | Strings matching regular expression r (grouping) | (ok(abc)) | okabc |
16 | (r1|r2) | Strings matching regular expression r1 or regular expression r2 | (ok|ko) | ok or ko |
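The rows of this table can be tried out with Python's `re` module. For these particular patterns, its syntax is assumed to coincide with extended regular expressions. `ere_match` is a hypothetical helper that tests whether the whole string matches.

```python
import re


def ere_match(pattern: str, text: str) -> bool:
    # Hypothetical helper: True if the pattern matches the whole text.
    return re.fullmatch(pattern, text) is not None


print(ere_match("a{1,2}", "aa"))        # True
print(ere_match("u+", "uuu"))           # True
print(ere_match("u+", ""))              # False: + needs at least one u
print(ere_match("u?", ""))              # True
print(ere_match("(ok(abc))", "okabc"))  # True
print(ere_match("(ok|ko)", "ko"))       # True
# Escaped parentheses are ordinary, so \(ab\) matches the literal text (ab).
print(ere_match(r"\(ab\)", "(ab)"))     # True
```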
Precedence
The precedence of operators in extended regular expressions is as shown in the following table.
Operator | Precedence |
[. .] [= =] [: :] | 1 (highest) |
\<char> | 2 |
[ ] | 3 |
( ) | 4 |
* ? + {m,n} | 5 |
Concatenation | 6 |
^ $ | 7 |
Alternation (r1|r2) | 8 (lowest) |
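In extended regular expressions, the vertical bar binds least tightly. The sketch below uses Python's `re` as a stand-in, since it gives these operators the same precedence. It shows what this means for anchors and for concatenation.

```python
import re

# | has the lowest precedence, so ^ab|cd$ means (^ab)|(cd$), not ^(ab|cd)$.
print(bool(re.search("^ab|cd$", "abXXX")))    # True: ^ab matches
print(bool(re.search("^ab|cd$", "XXXcd")))    # True: cd$ matches
print(bool(re.search("^(ab|cd)$", "abXXX")))  # False

# Concatenation binds tighter than |: abc|xyz is (abc)|(xyz), not ab(c|x)yz.
print(bool(re.fullmatch("abc|xyz", "abcyz")))     # False
print(bool(re.fullmatch("ab(c|x)yz", "abcyz")))   # True
```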
Examples
Simple regular expressions
Pattern | Meaning | Matching strings |
ab.d | a - b - any one character - d | abcd, abXd, ab*d, ... |
ab.*d | a - b - any string (including the null string) - d | abd, abxd, abX*Yd, ... |
ab[xyz]d | a - b - either x or y or z - d | abxd, abyd, abzd |
ab[^c]d | a - b - any character other than c - d | abbd, abXd, ab*d, ... |
^abcd$ | a line containing only the string abcd | abcd |
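The patterns in this table use only `.`, `*`, `[ ]`, `^` and `$`, which Python's `re` interprets in the same way. That makes it possible to check them with a small grep-like filter. `select_lines` is a hypothetical Python helper, not the grep command itself, and the sample lines are invented for the illustration.

```python
import re
from typing import List


def select_lines(pattern: str, lines: List[str]) -> List[str]:
    """Hypothetical grep-like filter: keep the lines that contain a match."""
    return [line for line in lines if re.search(pattern, line)]


lines = ["abcd", "abyd", "abd", "abX*Yd", "ab*d", "abcd!"]

print(select_lines("ab.d", lines))      # ['abcd', 'abyd', 'ab*d', 'abcd!']
print(select_lines("ab.*d", lines))     # all six lines
print(select_lines("ab[xyz]d", lines))  # ['abyd']
print(select_lines("ab[^c]d", lines))   # ['abyd', 'ab*d']
print(select_lines("^abcd$", lines))    # ['abcd']
```

ab.d selects abcd! because, as with grep, the search looks for a match anywhere in the line. The anchors in ^abcd$ rule that line out.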
Extended regular expressions
Pattern | Meaning | Matching strings |
ab.+d | a - b - any sequence of one or more characters - d | abjd, abX*Yd, ... |
abc?d | a - b - c or nothing - d | abd, abcd |
(abc|xyz) | abc or xyz | abc, xyz |
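The sketch below contrasts `.+` with `.*` and shows the effect of `?` and `|`. It checks some invented candidate strings with Python's `re.fullmatch`, which requires the whole string to match. As before, Python's `re` serves only as a stand-in for extended regular expressions.

```python
import re

candidates = ["abd", "abjd", "abX*Yd", "abcd", "abccd", "abc", "xyz"]


def whole_matches(pattern):
    # Keep the candidates that the pattern matches from start to end.
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(whole_matches("ab.+d"))      # ['abjd', 'abX*Yd', 'abcd', 'abccd']
print(whole_matches("ab.*d"))      # ['abd', 'abjd', 'abX*Yd', 'abcd', 'abccd']
print(whole_matches("abc?d"))      # ['abd', 'abcd']
print(whole_matches("(abc|xyz)"))  # ['abc', 'xyz']
```

Only abd separates the first two patterns, because .+ needs at least one character between b and d.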