Your Browser is not longer supported

Please use Google Chrome, Mozilla Firefox or Microsoft Edge to view the page correctly
Loading...

{{viewport.spaceProperty.prod}}

LIKE_REGEX predicate (pattern comparison with regular expressions)

A check is made to see whether an alphanumeric value matches a specified regular expression. Regular expressions are precisely defined search patterns which go far beyond the options of the search patterns in the LIKE predicate. Regular expressions are a powerful means of searching large data sets for complex search conditions. They have long been used, for example, in the Perl programming language.


operand [NOT] LIKE_REGEX regular_expression [FLAG modifiers ]

operand ::= expression
regular_expression ::= expression
modifiers ::= expression



operand

Alphanumeric expression which presents the operand for the comparison with the regular expression.
The value of operand may not be a multiple value with a dimension > 1.


regular_expression

Alphanumeric expression whose value is a regular expression which the value of operand should match. For information on the structure of regular expressions, see "LIKE_REGEX predicate (pattern comparison with regular expressions)".
You specify modifiers for regular_expression in the FLAG clause.
The value of regular_expression may not be a multiple value with a dimension > 1.


FLAG clause

Alphanumeric expression of the modifiers for regular_expression. You can specify the following modifiers:

flag

Meaning

i
(caseless)

If this modifier is set, letters in the pattern match both upper and lower case letters.

m
(multiline)

By default, SESAM/SQL treats the subject string as consisting of a single “line” of characters, even if it actually contains several NEWLINE characters (see "CSV() - Reading a BS2000 file as a table"). The “start of line” metacharacter (^) matches only at the start of the string, while the “end of line” metacharacter ($) matches only at the end of the string.
When this modifier is set, the “start of line” and “end of line” constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.
If there are no NEWLINE characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.

s
(dotall)

If this modifier is set, a dot metacharacter in the pattern matches all characters including NEWLINE characters (see "CSV() - Reading a BS2000 file as a table"). Without it, newlines are excluded.
A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

x
(extended)

If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class; and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.

flag must consist of lowercase letters. Each character can be specified multiple times. No blanks may be specified.

FLAG clause not specified:
No modifiers are defined for regular_expression.


Result

Unknown if the value of operand, regular_expression or flag is the NULL value, otherwise

Without NOT:

True if the placeholders for characters and strings in regular_expression can be replaced by characters and strings, respectively, so that the result is equal to the value of operand, and has the same length.

False in all other cases.

With NOT:
True if the placeholders for characters and strings in regular_expression cannot be replaced by characters and strings, respectively, so that the result is equal to the value of operand, and has the same length.

False in all other cases.


Examples

Select all the contact people from the CONTACTS table whose last name contains the string with meier “or something similar”:

SELECT fname, lname FROM contacts
   WHERE lname LIKE_REGEX '[a-z]* M [ae]? [iy] [a-z]* r' FLAG 'ix'
   fname      lname
   Albert     Gansmeier
   Berta      Hintermayr
   Thea       Mayerer
   Herbert    Meier
   Anton      Kusmir


In the CONTACTS table find the incorrect ZIP codes in the ZIP column:

SELECT * FROM contacts WHERE zip NOT LIKE_REGEX '\d{5}'


In the CONTACTS table find all the email contacts for Fujitsu:

SELECT address FROM contacts
   WHERE address LIKE_REGEX '([A-Za-z])+\.([A-Za-z]+)@fujitsu\.com'
   address
   Albert.Gansmeier@fujitsu.com
   Berta.Hintermayr@fujitsu.com
   Thea.Mayerer@fujitsu.com

Reguläre Ausdrücke in SESAM/SQL

The regular expressions in the LIKE_REGEX predicate correspond to the regular expressions in the Perl programming language with the following exceptions:

  • They are not enclosed in delimiters

  • There is no “replace” function

  • The modifiers are specified in the FLAG clause


Special characters

Special characters in regular expressions have special functions:

Character

Meaning

Example

.

The period stands for any character other
than a period.

en.e

Hits e.g.: entire, entice, fence

+

The plus sign stands for single or multiple
occurrence of the character preceding it.

e+

Hits e.g. speaker, feeling, veeery good

*

The asterisk stands for no, single or
multiple occurrence of the character
preceding it.

se*

Hits e.g. storm, very good, feeling

?

The question mark stands for no or single
occurrence of the character preceding it.

se?

Hits e.g. storm, seldom
but not: seesaw

^

The circumflex can negate a sign class or,
in the case of strings, specify that the
following search pattern must occur at the
start of the search area.

^Hans

Hits e.g. Hans Master, Hans Müller
but not: Master Hans
^[^äöüÄÖÜ]*$

Hits e.g. Master
but not: Müller

$

In the case of strings the dollar sign
specifies that the preceding search pattern
must occur at the end of the search area.

Hans$

Hits e.g. Master Hans
but not: Hans Master

|

The vertical slash separates alternative
expressions.

[M|m]aster

Hits e.g. Master, master
but not: Naster, aster

\

The backslash masks the subsequent
(special) character.

clif\?

Hit with clif?
but not: cliff

[ ]

Square brackets limit a character class.

Ma[lns]ter

Hits e.g. Malter, Manter, Master
but not: Marter

-

The hyphen separates the limits of a
character class.

Ma[a-z]ter

Hits e.g. Malter, Manter, Master
but not: Mastner

( )

Parentheses group partial expressions.

(Mr.|Ms.) M[a-z]+

Hit with Mr. Master, Ms. Müller
but not: Baroness Master

{ }

Braces are a repetition specification for
preceding characters.

clif{2,5}

Hit with cliff, cliffffhanger
but not: clif


Character repetitions

You check single character repetitions with the special characters +, * or ?, see the table above.

You can also use braces to check multiple character repetitions: {m,n}. Here m specifies the minimum number and n the maximum number of repetitions.

The following specifications are permitted:

{m}
{m,}
{m,n}

Repetition exactly m times
Repetition at least m times
Repetition at least m times, but not more than n times

f{1,3} returns, for example, hits with life, cliff and cliffhanger.


Groupings

Groupings are formed using parentheses. The subsequent repetition character the refers to the entire expression enclosed in parentheses.

h(el)+lo returns, for example, hits with hello, helello, helelello.


Selection of characters

A list of characters if square brackets offers a selection of characters which the regular expression can match. The expression in square brackets stands only for one character from the list.

Ma[lns]ter returns, for example, hits with Malter, Manter and Master, but not with Maltner.

In order to specify a selection from a digit range or a section of the alphabet, use the hyphen “-”.

[A-Z][a-z]+[0-9]{2} returns hits with words which begin with an uppercase letter followed by one or more lowercase letters and are concluded with precisely two digits, e.g. Masterson15, Smith01, but not masterson15, Smith1.


Alternatives

You can use the vertical slash “|” to specify multiple alternative strings in a regular expression which are to be searched for a string.

([M|m]r|[M|m]s] M[a-z]* returns hits with titles of persons whose names begin with M, e.g. Mr Master, Ms Miller.


Masking special characters

You must mask special characters when you do not intend the special meaning of the character, but mean its literal, normal meaning, in other words a vertical slash as a vertical slash or a period as a period. The mask character is in all cases the backslash “\”.

([A-Z]|[a-z])+\.([A-Z]|[a-z])+@fujitsu\.com returns hits with all email addresses in the format: first_name.last_name@fujitsu.com.

[A-Z]+\.[a-z]+@fujitsu\.com returns the same result if you specify 'i' in the flag clause, in other words wish to ignore uppercase/lowercase.


Operators

Letters which are preceded by a backslash “\” indicate special characters or particular character classes:


\n
\t
\f
\r
\s

One of the NEWLINE characters, see "CSV() - Reading a BS2000 file as a table"Tabulator character
FORM FEED character
CARRIAGE RETURN character
Blanks, tabulator characters, NEWLINE characters, CARRIAGE RETURN
characters, FORM FEED characters

\S

All characters except blanks, tabulator characters, NEWLINE characters,
CARRIAGE RETURN characters, FORM FEED characters

\d
\D
\w
\W
\A
\Z
\b

A digit
Any character which is not a digit
A logographic character, i.e. A through Z, a through z, and the underscore “_”
Any character which is not a logographic character
Start of a string
End of a string
Word boundary, i.e. when \b... or ...\b is specified, a pattern returns a hit only if it is
at the start or end of the word.

\B

Negative word boundary, i.e. when \b... or ...\b is specified, a pattern returns a hit
only if it is not at the start or end of the word.

For example, \d{3,4} returns hits with all 3- or 4- digit numbers and \w{5} returns hits with all 5-character words


Priority in regular expressions

The special characters in regular expressions are evaluated according to a particular priority.

1st priority: ( ) (bracketing)

2nd priority: + * ? {m,n} (repeat operators)

3rd priority: abc ^ $ \b \B (characters/strings, start/end of line, start/end of word)

4th priority: | (alternatives)

This enables every regular expression to be evaluated unambiguously. However, if you want the evaluation to be different in the expression from the priority, you can insert parentheses in the expression to force a different evaluation.

For example a|bc|d returns hits with 'a' or 'bc' or 'd'.
(a|b)(c|d) returns hits with 'ac' or 'ad' or 'bc' or 'bd'.


Notes

  • Leading or trailing blanks may need to be dealt with using \s* in the pattern. In particular when $ (end of the search area) is specified) hits that would otherwise be possible are not detected.

    Example

    With the data type CHAR(n), for instance, the string Bertabbbb (b represents a blank) with the pattern B.*ta$ is not recognized as blanks follow it.

  • With the LIKE predicate a Ber% pattern means that a hit value also really begins with Ber, while the same pattern in the LIKE REGEX predicate may also begin at any position in the record. The ^Ber.* pattern means that the pattern is contained at the start of the record.