Explanation for this CHECK syntax with regular expression?

One of the tables created in the musicbrainz database is called artist_ipi with the following command:

CREATE TABLE artist_ipi ( -- replicate (verbose)
    artist              INTEGER NOT NULL, -- PK, references artist.id
    ipi                 CHAR(11) NOT NULL CHECK (ipi ~ E'^\\d{11}$'), -- PK
    edits_pending       INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
    created             TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Could please someone explain the CHECK command for the column ipi with:

CHECK (ipi ~ E’^\d{11}$’)

I recognize the part ā€œAny digit, exactly 11 repetitions, from start of line to end of lineā€
But what’s the meaning of ā€œ~ Eā€?

~ is a case sensitive regular expression match. You can also do a case-insensitive comparrisson or negated comparrisson, see https://www.postgresql.org/docs/10/static/functions-matching.html#FUNCTIONS-POSIX-TABLE

The E before a string marks an ā€œescape stringā€, in which you can use some escape sequences starting with a backslash, e.g. E'\t' would be a tab character. The \\ in this case just becomes a single \, which then allows to have the \d which represents a single digit in the regular expression.

3 Likes

Thank you for the explanation @outsidecontext

Why should a regular expression checking for digits be case sensitive?


I don’t fully understand the difference between escaping with E and double the backslash.
Is E'^\d{11}$' (E with one backslash before d) the same
as '^\\d{11}$' (without E but two backslashes before d)?

It doesn’t matter in this case of course, but you have to pick one. And case sensitive is just the default. In theory there could be a slight performance benefit from the case sensitive regex, but I don’t think this does matter here.

No, AFAIK by default Postgres would treat '^\\d{11}$' as E'^\\d{11}$', but this depends on settings. So it’s better to be explicit that one is using a string with escape characters, hence the E. Just using E'^\d{11}$' woud be wrong, as \d is not a valid escape sequence as is interpreted just as d.

1 Like