Regular POSIX shell expressions

Regular expressions are a tool for scanning a text for strings which match a defined pattern. A regular expression stands for a set of character strings. A member of this set of strings is said to be matched by the regular expression. A pattern is constructed from one or more regular expressions.

Regular expressions comprise a string of characters, which can be further classified into:

  • ordinary characters, and

  • metacharacters.

All alphanumeric characters (all letters and digits) and most other characters are ordinary characters. Within a pattern, ordinary characters match themselves, i.e. the pattern abc will match only those strings that contain the character sequence abc anywhere in them.
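As a small illustration of ordinary characters matching themselves, the following sketch filters a few made-up sample lines with grep. Only the lines that contain the character sequence abc somewhere are selected.

```shell
# Hypothetical sample lines; abc consists of ordinary characters only.
matches=$(printf '%s\n' 'abc' 'xxabcxx' 'a-b-c' 'ab' | grep 'abc')

# Prints the two lines that contain abc: "abc" and "xxabcxx".
printf '%s\n' "$matches"
```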

There is, however, a small set of characters, known as metacharacters, which have special meanings when encountered in patterns. These characters are described below.

There are two forms of regular expression:

  • simple regular expressions

  • extended regular expressions

The syntax of these forms of regular expression is described in the following sections.

The following table shows which commands support regular expressions:

Command    Regular expression form
awk        extended
ed         simple
egrep      extended
ex         *)
expr       simple
grep       simple
lex        extended
nl         simple
sed        simple
vi         *)

*)  The ex and vi commands process regular expressions which differ in certain respects from simple regular expressions. These differences are described under ex and vi.
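The difference between the two forms can be seen by running a pattern through some of the listed commands. The following is a minimal sketch with made-up input strings. It assumes sed, awk and expr behave as specified by POSIX.

```shell
# sed uses simple regular expressions: \( \) groups, \1 refers back to the group.
from_sed=$(printf 'apple pie\n' | sed 's/^\([aA]pple\) pie$/\1/')

# awk uses extended regular expressions: ( ) groups, + means one or more.
from_awk=$(printf 'auau\nxyz\n' | awk '/^(au)+$/')

# expr uses simple regular expressions, anchored at the start of the string;
# with a \( \) group it prints the text matched by the group.
from_expr=$(expr 'abcabc' : '\(abc\)')

# Prints apple, auau and abc, one per line.
printf '%s\n' "$from_sed" "$from_awk" "$from_expr"
```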

Simple regular expressions

Simple regular expressions are constructed as follows:

No. 1
  Regular expression: c
  Stands for: The character c, where c is not a special character (metacharacter).
  Example: a
  Matching strings: a

No. 2
  Regular expression: \c
  Stands for: The character c, where c can be any character other than ( ) { } 1 2 3 4 5 6 7 8 9. Regular expressions in this form are meaningful if c is a metacharacter. \c then stands for character c itself, as the backslash escapes its special meaning as a metacharacter.
  Example: \a
  Matching strings: a
  Example: \*
  Matching strings: *

No. 3
  Regular expression: .
  Stands for: Any single character
  Example: .
  Matching strings: a, x, *, ...

No. 4
  Regular expression: [s]
  Stands for: Any character from s, where s is a set of characters.
  Example: [mz]
  Matching strings: m, z
  If a right square bracket ] is to be one of the characters in the set, it has to be placed first in the set.
  Example: []a]
  Matching strings: ], a
  If a hyphen - is to be one of the characters in the set, it has to be placed first or last.
  Example: [-a]
  Matching strings: -, a
  Example: [a-]
  Matching strings: -, a
  If a caret ^ is to be one of the characters in the set, it can be placed anywhere but first.
  Example: [a^]
  Matching strings: a, ^
  Regular expression: [c1-c2]
  Stands for: Any character in the range c1 to c2, in accordance with the EBCDIC sort sequence (inclusive of limits c1 and c2). c1 must come before c2 in the EBCDIC collating sequence.
  Example: [a-m]
  Matching strings: a, m and any character in between in the EBCDIC collating sequence
  If c1 does not come before c2, c1-c2 does not denote a range but simply stands for the characters c1 and c2.
  Example: [m-a]
  Matching strings: m, a
  The two forms can be combined: [s1c1-c2s2]
  Example: [ado-qxz]
  Matching strings: a, d, o, q, x, z and any character coming between o and q in the EBCDIC collating sequence

No. 5
  Regular expression: [^s]
  Stands for: Any character not included in set s.
  Example: [^xyz]
  Matching strings: any character except x, y, z
  Regular expression: [^c1-c2]
  Stands for: Any character not in the range between c1 and c2 inclusive. Refer also to [c1-c2].
  Example: [^0-9]
  Matching strings: any character except 0, 9 and all characters coming between 0 and 9 in the EBCDIC collating sequence
  The two forms can be combined: [^s1c1-c2s2]
  Example: [^a0-9b]
  Matching strings: any character except a, b, 0, 9 and all characters coming between 0 and 9 in the EBCDIC collating sequence

No. 6
  Regular expression: r*
  Stands for: Zero, one or more occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a*
  Matching strings: nothing, a, aa, aaa, ...

No. 7
  Regular expression: r\{m,n\}
  Stands for: At least m and at most n occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a\{1,2\}
  Matching strings: a or aa
  Regular expression: r\{m\}
  Stands for: Precisely m occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a\{3\}
  Matching strings: aaa
  Regular expression: r\{m,\}
  Stands for: At least m occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a\{3,\}
  Matching strings: aaa, aaaa, aaaaa, ...

No. 8
  Regular expression: rx
  Stands for: (Concatenation) An occurrence of regular expression r followed by an occurrence of regular expression x. r and x can be any regular expressions.
  Example: [ab].
  Matching strings: ax, a3, a*, bz, ...

No. 9
  Regular expression: ^r
  Stands for: An occurrence of regular expression r appearing at the start of a line, i.e. straight after a newline character or at the start of the file. r can be a regular expression in any form other than number 9.
  Example: ^[aA]pple
  Matching strings: apple or Apple at the start of a line

No. 10
  Regular expression: r$
  Stands for: An occurrence of regular expression r at the end of a line, i.e. directly before a newline character. r can be a regular expression in any form other than number 10.
  Example: [bB]arge$
  Matching strings: barge or Barge at the end of a line

No. 11
  Regular expression: \(r\)
  Stands for: Occurrences of regular expression r. r can be any regular expression. Only useful together with number 12.
  Example: \([aA]pple\)
  Matching strings: apple, Apple

No. 12
  Regular expression: \n
  Stands for: n is an integer in the range from 1 to 9. \n appearing in a concatenated regular expression stands for regular expression x, where x is the nth regular expression enclosed in \( and \) sequences that appeared earlier in the concatenated regular expression.
  Example: \(a\(b\)\)\2
  Matching strings: abb
  Example: s\(illy\)b\1
  Matching strings: sillybilly
  Example: \(ab\)x\1*
  Matching strings: abx, abxab, abxabab, ...
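A few of the forms above, tried out with grep, which uses simple regular expressions. This is a minimal sketch: the word list is made up, and the -x option (match the whole line) is used so that the results mirror the "Matching strings" entries.

```shell
# Hypothetical word list, one candidate string per line.
words='a
aa
aaa
abb
sillybilly
sillybully'

# Form 7: at least one and at most two occurrences of a.
interval=$(printf '%s\n' "$words" | grep -x 'a\{1,2\}')

# Form 12: \1 must repeat exactly what \(illy\) matched.
backref=$(printf '%s\n' "$words" | grep -x 's\(illy\)b\1')

# Form 12 with nested groups: \2 refers to the inner group \(b\).
nested=$(printf '%s\n' "$words" | grep -x '\(a\(b\)\)\2')

printf 'interval: %s\n' $interval   # interval: a / interval: aa
printf 'backref:  %s\n' "$backref"  # backref:  sillybilly
printf 'nested:   %s\n' "$nested"   # nested:   abb
```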

Precedence

The precedence of operators in regular expressions is as shown in the following table.

Operator              Precedence
[. .] [= =] [: :]     high precedence
\<char>               .
[ ]                   .
( )                   .
* ? + \{m,n\}         .
Concatenation         .
^ $                   .
|                     low precedence

Metacharacters

Metacharacter    The character to the left has a special meaning if

\                it is not preceded by a backslash \

.  [             it is not preceded by a backslash \ and
                 it does not appear between [ and ]

*                it is not preceded by a backslash \,
                 it does not appear between [ and ],
                 it is not the first character in a pattern and
                 it does not come after \)

$                it is the last character in a pattern

^                it is the first character in a pattern or
                 it is the first character in square brackets [ ... ]

-                it is in square brackets but not placed first or last

Regular expression delimiter such as /.../
                 it is not preceded by a backslash \

[.  [=  [:       Character pairs to the left are special characters if they occur within a bracket expression (in square brackets). They will need to be closed by the corresponding character pair .], =] or :].
                 Example: [[:upper:]] indicates all uppercase letters.
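Some of these rules can be checked with grep. The sketch below uses made-up sample lines and assumes grep follows the POSIX rules for simple regular expressions, under which * at the start of a pattern and $ in the middle of a pattern are ordinary characters.

```shell
# Hypothetical sample lines.
lines='Apple
apple
*star
a$b'

# [: ... :] inside a bracket expression: lines starting with an uppercase letter.
upper=$(printf '%s\n' "$lines" | grep '^[[:upper:]]')

# * is the first character in the pattern, so it stands for itself.
star=$(printf '%s\n' "$lines" | grep '*star')

# $ is not the last character in the pattern, so it stands for itself.
dollar=$(printf '%s\n' "$lines" | grep 'a$b')

# Prints Apple, *star and a$b, one per line.
printf '%s\n' "$upper" "$star" "$dollar"
```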

Extended regular expressions

Extended regular expressions include the simple regular expressions described above, with the following exception:

The \(...\) construction used in simple regular expressions has no special significance in extended regular expressions. For example, the extended regular expression \(ab\) represents the string (ab).
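This difference can be observed by giving the same pattern to grep (simple) and to grep -E, the POSIX equivalent of egrep (extended). A minimal sketch with two made-up input lines:

```shell
# Simple form: \( \) is a grouping construct, so the pattern matches ab.
as_simple=$(printf 'ab\n(ab)\n' | grep -x '\(ab\)')

# Extended form: \( and \) are literal parentheses, so the pattern matches (ab).
as_extended=$(printf 'ab\n(ab)\n' | grep -E -x '\(ab\)')

printf 'simple:   %s\n' "$as_simple"     # simple:   ab
printf 'extended: %s\n' "$as_extended"   # extended: (ab)
```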

Moreover, extended regular expressions provide the following syntax elements for pattern creation:

No. 7
  Regular expression: r{m,n}
  Stands for: At least m and at most n occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a{1,2}
  Matching strings: a or aa
  Regular expression: r{m}
  Stands for: Precisely m occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a{3}
  Matching strings: aaa
  Regular expression: r{m,}
  Stands for: At least m occurrences of regular expression r. r has to be of form 1-5, 12, 15 or 16.
  Example: a{3,}
  Matching strings: aaa, aaaa, aaaaa, ...

No. 13
  Regular expression: r+
  Stands for: One or more occurrences of regular expression r. r has to be of form 1-5, 15 or 16.
  Example: u+
  Matching strings: u, uu, uuu, ...

No. 14
  Regular expression: r?
  Stands for: Zero or one occurrence of regular expression r. r has to be of form 1-5, 15 or 16.
  Example: u?
  Matching strings: nothing or u

No. 15
  Regular expression: (r)
  Stands for: Strings matching regular expression r. r can be any regular expression.
  Example: (ok(abc))
  Matching strings: okabc
  Example: (au)*
  Matching strings: nothing or au, auau, ...

No. 16
  Regular expression: (r1|r2)
  Stands for: Strings matching regular expression r1 or regular expression r2.
  Example: (ok|ko)
  Matching strings: ok or ko
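The extended forms can be tried out with grep -E, the POSIX equivalent of egrep. The following is a minimal sketch; the sample strings are made up, and -x restricts matches to whole lines.

```shell
# Hypothetical sample strings, one per line.
sample='u
uu
ok
ko
auau
aaa'

# Form 13: one or more occurrences of u.
plus=$(printf '%s\n' "$sample" | grep -E -x 'u+')

# Form 16: either ok or ko.
alt=$(printf '%s\n' "$sample" | grep -E -x '(ok|ko)')

# Form 15 with *: zero or more occurrences of the group au.
group=$(printf '%s\n' "$sample" | grep -E -x '(au)*')

# Form 7: precisely three occurrences of a.
count=$(printf '%s\n' "$sample" | grep -E -x 'a{3}')

printf '%s\n' "$plus"    # u, uu
printf '%s\n' "$alt"     # ok, ko
printf '%s\n' "$group"   # auau
printf '%s\n' "$count"   # aaa
```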

Precedence

The precedence of operators in extended regular expressions is as shown in the following table.

Operator              Precedence
[. .] [= =] [: :]     high precedence
\<char>               .
[ ]                   .
( )                   .
* ? + {m,n}           .
Concatenation         .
^ $                   .
|                     low precedence
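The effect of the precedence rules can be seen with grep -E. In this minimal sketch (made-up input strings, whole-line matching with -x), concatenation binds more tightly than |, and * binds more tightly than concatenation, unless parentheses change the grouping.

```shell
# Hypothetical candidate strings.
candidates='ab
cd
acd
abd
abb
abab'

# ab|cd is read as (ab)|(cd), not as a(b|c)d.
alt=$(printf '%s\n' "$candidates" | grep -E -x 'ab|cd')
alt_grouped=$(printf '%s\n' "$candidates" | grep -E -x 'a(b|c)d')

# ab* is read as a(b*), not as (ab)*.
star=$(printf '%s\n' "$candidates" | grep -E -x 'ab*')
star_grouped=$(printf '%s\n' "$candidates" | grep -E -x '(ab)*')

printf '%s\n' "$alt"            # ab, cd
printf '%s\n' "$alt_grouped"    # acd, abd
printf '%s\n' "$star"           # ab, abb
printf '%s\n' "$star_grouped"   # ab, abab
```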

Examples

  1. Simple regular expressions

    Pattern      Meaning                                                Matching strings
    ab.d         a - b - any one character - d                          abcd, abXd, ab*d, ...
    ab.*d        a - b - any string (including the null string) - d     abd, abxd, abX*Yd, ...
    ab[xyz]d     a - b - either x or y or z - d                         abxd, abyd, abzd
    ab[^c]d      a - b - any character other than c - d                 abbd, abXd, ab*d, ...
    ^abcd$       a line containing only the string abcd

  2. Extended regular expressions

    Pattern      Meaning                                                Matching strings
    ab.+d        a - b - any sequence of one or more characters - d     abjd, abX*Yd, ...
    abc?d        a - b - c or nothing - d                               abd, abcd
    (abc|xyz)    abc or xyz                                             abc, xyz
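The patterns in these examples can be checked against candidate strings with a small helper. The function full_match below is hypothetical and not part of any command described here. It is a sketch that assumes grep for the simple form and grep -E for the extended form, and it succeeds only if the whole string matches.

```shell
# full_match MODE PATTERN STRING
# Succeeds if the whole STRING matches PATTERN.
# MODE is "simple" (grep) or "extended" (grep -E).
full_match() {
    case $1 in
        simple)   printf '%s\n' "$3" | grep -x -e "$2" >/dev/null ;;
        extended) printf '%s\n' "$3" | grep -E -x -e "$2" >/dev/null ;;
        *)        return 2 ;;
    esac
}

full_match simple 'ab.d' 'ab*d'          && echo 'ab.d matches ab*d'
full_match simple 'ab[^c]d' 'abcd'       || echo 'ab[^c]d does not match abcd'
full_match extended 'ab.+d' 'abd'        || echo 'ab.+d does not match abd'
full_match extended '(abc|xyz)' 'xyz'    && echo '(abc|xyz) matches xyz'
```

All four echo commands are executed, since the first and last calls succeed and the two in the middle fail.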