One of the tables created in the musicbrainz database is called artist_ipi
with the following command:
CREATE TABLE artist_ipi ( -- replicate (verbose)
artist INTEGER NOT NULL, -- PK, references artist.id
ipi CHAR(11) NOT NULL CHECK (ipi ~ E'^\\d{11}$'), -- PK
edits_pending INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
created TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Could please someone explain the CHECK command for the column ipi
with:
CHECK (ipi ~ Eā^\d{11}$ā)
I recognize the part āAny digit, exactly 11 repetitions, from start of line to end of lineā
But whatās the meaning of ā~ Eā?
~
is a case sensitive regular expression match. You can also do a case-insensitive comparrisson or negated comparrisson, see https://www.postgresql.org/docs/10/static/functions-matching.html#FUNCTIONS-POSIX-TABLE
The E
before a string marks an āescape stringā, in which you can use some escape sequences starting with a backslash, e.g. E'\t'
would be a tab character. The \\
in this case just becomes a single \
, which then allows to have the \d
which represents a single digit in the regular expression.
3 Likes
Thank you for the explanation @outsidecontext
Why should a regular expression checking for digits be case sensitive?
I donāt fully understand the difference between escaping with E and double the backslash.
Is E'^\d{11}$'
(E with one backslash before d) the same
as '^\\d{11}$'
(without E but two backslashes before d)?
It doesnāt matter in this case of course, but you have to pick one. And case sensitive is just the default. In theory there could be a slight performance benefit from the case sensitive regex, but I donāt think this does matter here.
No, AFAIK by default Postgres would treat '^\\d{11}$'
as E'^\\d{11}$'
, but this depends on settings. So itās better to be explicit that one is using a string with escape characters, hence the E. Just using E'^\d{11}$'
woud be wrong, as \d
is not a valid escape sequence as is interpreted just as d
.
1 Like