Explanation for this CHECK syntax with regular expression?

postgresql
regex
Tags: #<Tag:0x00007f2a03520948> #<Tag:0x00007f2a03520790>

#1

One of the tables created in the musicbrainz database is called artist_ipi with the following command:

CREATE TABLE artist_ipi ( -- replicate (verbose)
    artist              INTEGER NOT NULL, -- PK, references artist.id
    ipi                 CHAR(11) NOT NULL CHECK (ipi ~ E'^\\d{11}$'), -- PK
    edits_pending       INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
    created             TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Could please someone explain the CHECK command for the column ipi with:

CHECK (ipi ~ E’^\d{11}$’)

I recognize the part “Any digit, exactly 11 repetitions, from start of line to end of line”
But what’s the meaning of “~ E”?


#2

~ is a case sensitive regular expression match. You can also do a case-insensitive comparrisson or negated comparrisson, see https://www.postgresql.org/docs/10/static/functions-matching.html#FUNCTIONS-POSIX-TABLE

The E before a string marks an “escape string”, in which you can use some escape sequences starting with a backslash, e.g. E'\t' would be a tab character. The \\ in this case just becomes a single \, which then allows to have the \d which represents a single digit in the regular expression.


#3

Thank you for the explanation @outsidecontext

Why should a regular expression checking for digits be case sensitive?


I don’t fully understand the difference between escaping with E and double the backslash.
Is E'^\d{11}$' (E with one backslash before d) the same
as '^\\d{11}$' (without E but two backslashes before d)?


#4

It doesn’t matter in this case of course, but you have to pick one. And case sensitive is just the default. In theory there could be a slight performance benefit from the case sensitive regex, but I don’t think this does matter here.

No, AFAIK by default Postgres would treat '^\\d{11}$' as E'^\\d{11}$', but this depends on settings. So it’s better to be explicit that one is using a string with escape characters, hence the E. Just using E'^\d{11}$' woud be wrong, as \d is not a valid escape sequence as is interpreted just as d.