A check is made to see whether an alphanumeric value matches a specified regular expression. Regular expressions are precisely defined search patterns which go far beyond the options of the search patterns in the LIKE predicate. Regular expressions are a powerful means of searching large data sets for complex search conditions. They have long been used, for example, in the Perl programming language.
operand [NOT] LIKE_REGEX
regular_expression [FLAG
modifiers ]
operand ::=
expression
regular_expression ::=
expression
modifiers ::=
expression
operand
Alphanumeric expression which presents the operand for the comparison with the regular expression.
The value of operand may not be a multiple value with a dimension > 1.
regular_expression
Alphanumeric expression whose value is a regular expression which the value of operand should match. For information on the structure of regular expressions, see "LIKE_REGEX predicate (pattern comparison with regular expressions)".
You specify modifiers for regular_expression in the FLAG clause.
The value of regular_expression may not be a multiple value with a dimension > 1.
FLAG clause
Alphanumeric expression of the modifiers for regular_expression. You can specify the following modifiers:
flag | Meaning |
i | If this modifier is set, letters in the pattern match both upper and lower case letters. |
m | By default, SESAM/SQL treats the subject string as consisting of a single “line” of characters, even if it actually contains several NEWLINE characters (see "CSV() - Reading a BS2000 file as a table"). The “start of line” metacharacter (^) matches only at the start of the string, while the “end of line” metacharacter ($) matches only at the end of the string. |
s | If this modifier is set, a dot metacharacter in the pattern matches all characters including NEWLINE characters (see "CSV() - Reading a BS2000 file as a table"). Without it, newlines are excluded. |
x | If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class; and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern. |
flag must consist of lowercase letters. Each character can be specified multiple times. No blanks may be specified.
FLAG clause not specified:
No modifiers are defined for regular_expression.
Result
Unknown if the value of operand, regular_expression or flag is the NULL value, otherwise
Without NOT:
True if the placeholders for characters and strings in regular_expression can be replaced by characters and strings, respectively, so that the result is equal to the value of operand, and has the same length.
False in all other cases.
With NOT:
True if the placeholders for characters and strings in regular_expression cannot be replaced by characters and strings, respectively, so that the result is equal to the value of operand, and has the same length.
False in all other cases.
Examples
Select all the contact people from the CONTACTS table whose last name contains the string with meier “or something similar”:
SELECT fname, lname FROM contacts WHERE lname LIKE_REGEX '[a-z]* M [ae]? [iy] [a-z]* r' FLAG 'ix' fname lname Albert Gansmeier Berta Hintermayr Thea Mayerer Herbert Meier Anton Kusmir
In the CONTACTS table find the incorrect ZIP codes in the ZIP column:
SELECT * FROM contacts WHERE zip NOT LIKE_REGEX '\d{5}'
In the CONTACTS table find all the email contacts for Fujitsu:
SELECT address FROM contacts WHERE address LIKE_REGEX '([A-Za-z])+\.([A-Za-z]+)@fujitsu\.com' address Albert.Gansmeier@fujitsu.com Berta.Hintermayr@fujitsu.com Thea.Mayerer@fujitsu.com
Reguläre Ausdrücke in SESAM/SQL
The regular expressions in the LIKE_REGEX predicate correspond to the regular expressions in the Perl programming language with the following exceptions:
They are not enclosed in delimiters
There is no “replace” function
The modifiers are specified in the FLAG clause
Special characters
Special characters in regular expressions have special functions:
Character | Meaning | Example |
. | The period stands for any character other |
Hits e.g.: entire, entice, fence |
+ | The plus sign stands for single or multiple |
Hits e.g. speaker, feeling, veeery good |
* | The asterisk stands for no, single or |
Hits e.g. storm, very good, feeling |
? | The question mark stands for no or single |
Hits e.g. storm, seldom |
^ | The circumflex can negate a sign class or, |
Hits e.g. Hans Master, Hans Müller Hits e.g. Master |
$ | In the case of strings the dollar sign |
Hits e.g. Master Hans |
| | The vertical slash separates alternative |
Hits e.g. Master, master |
\ | The backslash masks the subsequent |
Hit with clif? |
[ ] | Square brackets limit a character class. |
Hits e.g. Malter, Manter, Master |
- | The hyphen separates the limits of a |
Hits e.g. Malter, Manter, Master |
( ) | Parentheses group partial expressions. |
Hit with Mr. Master, Ms. Müller |
{ } | Braces are a repetition specification for |
Hit with cliff, cliffffhanger |
Character repetitions
You check single character repetitions with the special characters +
, *
or ?
, see the table above.
You can also use braces to check multiple character repetitions: {m,n}. Here m specifies the minimum number and n the maximum number of repetitions.
The following specifications are permitted:
{m} | Repetition exactly m times |
f{1,3}
returns, for example, hits with life, cliff and cliffhanger.
Groupings
Groupings are formed using parentheses. The subsequent repetition character the refers to the entire expression enclosed in parentheses.
h(el)+lo
returns, for example, hits with hello, helello, helelello.
Selection of characters
A list of characters if square brackets offers a selection of characters which the regular expression can match. The expression in square brackets stands only for one character from the list.
Ma[lns]ter
returns, for example, hits with Malter, Manter and Master, but not with Maltner.
In order to specify a selection from a digit range or a section of the alphabet, use the hyphen “-”.
[A-Z][a-z]+[0-9]{2}
returns hits with words which begin with an uppercase letter followed by one or more lowercase letters and are concluded with precisely two digits, e.g. Masterson15, Smith01, but not masterson15, Smith1.
Alternatives
You can use the vertical slash “|” to specify multiple alternative strings in a regular expression which are to be searched for a string.
([M|m]r|[M|m]s] M[a-z]*
returns hits with titles of persons whose names begin with M, e.g. Mr Master, Ms Miller.
Masking special characters
You must mask special characters when you do not intend the special meaning of the character, but mean its literal, normal meaning, in other words a vertical slash as a vertical slash or a period as a period. The mask character is in all cases the backslash “\”.
([A-Z]|[a-z])+\.([A-Z]|[a-z])+@fujitsu\.com
returns hits with all email addresses in the format: first_name.last_name@fujitsu.com
.
[A-Z]+\.[a-z]+@fujitsu\.com
returns the same result if you specify 'i'
in the flag clause, in other words wish to ignore uppercase/lowercase.
Operators
Letters which are preceded by a backslash “\” indicate special characters or particular character classes:
\n | One of the NEWLINE characters, see "CSV() - Reading a BS2000 file as a table"Tabulator character |
\S | All characters except blanks, tabulator characters, NEWLINE characters, |
\d | A digit |
\B | Negative word boundary, i.e. when \b... or ...\b is specified, a pattern returns a hit |
For example, \d{3,4}
returns hits with all 3- or 4- digit numbers and \w{5}
returns hits with all 5-character words
Priority in regular expressions
The special characters in regular expressions are evaluated according to a particular priority.
1st priority: ( )
(bracketing)
2nd priority: + * ? {m,n}
(repeat operators)
3rd priority: abc ^ $ \b \B
(characters/strings, start/end of line, start/end of word)
4th priority: |
(alternatives)
This enables every regular expression to be evaluated unambiguously. However, if you want the evaluation to be different in the expression from the priority, you can insert parentheses in the expression to force a different evaluation.
For example a|bc|d
returns hits with 'a' or 'bc' or 'd'.(a|b)(c|d)
returns hits with 'ac' or 'ad' or 'bc' or 'bd'.
Notes
Leading or trailing blanks may need to be dealt with using
\s*
in the pattern. In particular when$
(end of the search area) is specified) hits that would otherwise be possible are not detected.Example
With the data type CHAR(n), for instance, the string Bertabbbb (b represents a blank) with the pattern
B.*ta$
is not recognized as blanks follow it.With the LIKE predicate a
Ber%
pattern means that a hit value also really begins withBer
, while the same pattern in the LIKE REGEX predicate may also begin at any position in the record. The^Ber.*
pattern means that the pattern is contained at the start of the record.