Function group: string function
The encoding of a character in Unicode is not unambiguous, i.e. more than one coding can exist for a character, see the “ Core manual”.
A typical example of this is provided by the German umlauts. For example, the character Ä has both the code point U+00C4
(composed form) and the code point combination U+0041
and U+0308
(decomposed form). In normalized presentation forms these differences do not occur. If two normalized strings differ, it is in their different code point presentations.
NORMALIZE() converts a national string with national characters which have code points in the range U+0000
through U+2FFF
to a normalized form. Other characters, e.g. surrogates, remain unchanged.
NORMALIZE (
expression [,{ NFC | NFD } [,
length ]])
length ::=
unsigned_integer
expression
National-expression. Its evaluation returns a national string (data type NCHAR or NVARCHAR) in normalized form.
expression may not be a multiple value with dimension > 1.
NFC / NFD
Normalization forms C (“Canonical Decomposition followed by Canonical Composition”) and D (“Canonical Decomposition”) of the Unicode standard.
NFC maps all code points which together result in a character to the corresponding code point. NFD breaks down each “compound” character into its component parts, to the basic characters and the diacritical characters linked to these. The order of the linked diacritical characters is strictly defined here.
length
Maximum length of the normalized presentation in code units.
Length not specified:
The result can have a length of up to 16000 code units, depending on expression.
Result
If the value of expression is the NULL value, the result is the NULL value.
Otherwise:
The normalized presentation of the value of expression.
The following applies: length of the normalized presentation (NFC) <= length of the nonnormalized presentation <= length of the normalized presentation (NFD).
If the length of the normalized presentation is greater than the specified length, the function is aborted with SQLSTATE.
Data type: NVARCHAR( MIN(2*
n,16000) ),
where n is the length of the argument data type NCHAR(n) or NVARCHAR(n). For an argument of type NCHAR the data type is NVARCHAR too.
Example
The following search condition normalizes a user name in order to detect unwanted users who can log in various presentation forms.
... WHERE NORMALIZE(:customer,NFC)
NOT IN (SELECT name FROM unwanted_customers)