Explanation for this CHECK syntax with regular expression?

One of the tables created in the musicbrainz database is called artist_ipi with the following command:

CREATE TABLE artist_ipi ( -- replicate (verbose)
    artist              INTEGER NOT NULL, -- PK, references artist.id
    ipi                 CHAR(11) NOT NULL CHECK (ipi ~ E'^\\d{11}$'), -- PK
    edits_pending       INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
    created             TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Could please someone explain the CHECK command for the column ipi with:

CHECK (ipi ~ Eā€™^\d{11}$ā€™)

I recognize the part ā€œAny digit, exactly 11 repetitions, from start of line to end of lineā€
But whatā€™s the meaning of ā€œ~ Eā€?

~ is a case sensitive regular expression match. You can also do a case-insensitive comparrisson or negated comparrisson, see https://www.postgresql.org/docs/10/static/functions-matching.html#FUNCTIONS-POSIX-TABLE

The E before a string marks an ā€œescape stringā€, in which you can use some escape sequences starting with a backslash, e.g. E'\t' would be a tab character. The \\ in this case just becomes a single \, which then allows to have the \d which represents a single digit in the regular expression.

3 Likes

Thank you for the explanation @outsidecontext

Why should a regular expression checking for digits be case sensitive?


I donā€™t fully understand the difference between escaping with E and double the backslash.
Is E'^\d{11}$' (E with one backslash before d) the same
as '^\\d{11}$' (without E but two backslashes before d)?

It doesnā€™t matter in this case of course, but you have to pick one. And case sensitive is just the default. In theory there could be a slight performance benefit from the case sensitive regex, but I donā€™t think this does matter here.

No, AFAIK by default Postgres would treat '^\\d{11}$' as E'^\\d{11}$', but this depends on settings. So itā€™s better to be explicit that one is using a string with escape characters, hence the E. Just using E'^\d{11}$' woud be wrong, as \d is not a valid escape sequence as is interpreted just as d.

1 Like