I’m using the live data feed in conjunction with mbslave to update a local copy of the MusicBrainz DB.
Specifically, the script downloads the following:
https://metabrainz.org/api/musicbrainz/replication-101634.tar.bz2
and it fails because it tries to insert an artist_meta row before the artist row it refers to has been created.
The file mbdump/dbmirror_pending contains the following lines:
283089582 "musicbrainz"."artist_meta" i 7973179
283089583 "musicbrainz"."artist" i 7973179
The file mbdump/dbmirror_pendingdata has the following corresponding lines:
283089582 f "id"='1462001' "rating"= "rating_count"=
283089583 f "id"='1462001' "gid"='be509b67-e2af-4f7e-a2e1-f5a60e78655a' "name"='Michael Cent' "sort_name"='Cent, Michael' "begin_date_year"= "begin_date_month"= "begin_date_day"= "end_date_year"= "end_date_month"= "end_date_day"= "type"='1' "area"= "gender"='1' "comment"='' "edits_pending"='0' "last_updated"='2017-01-21 00:15:23.414876+00' "ended"='f' "begin_area"= "end_area"=
The ‘i’ means “insert”, so it looks like it’s trying to insert an artist_meta row, followed by an artist row. The trouble is, the artist_meta row refers to the artist row before the artist row is inserted, leading to a Postgres IntegrityError.
Just to be clear, it’s trying to create a row in artist_meta with id = 1462001, but artist with id = 1462001 doesn’t exist yet!
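In case it helps anyone reproduce this, here is a minimal Python sketch for inspecting the ordering in dbmirror_pending. It assumes each line has four whitespace-separated columns: a sequence ID, the table name, the operation, and a trailing ID that I am guessing is a transaction ID (both rows above share 7973179). The function names are hypothetical and not part of mbslave.

```python
from collections import defaultdict

# The two lines copied from mbdump/dbmirror_pending above.
PENDING = '''283089582 "musicbrainz"."artist_meta" i 7973179
283089583 "musicbrainz"."artist" i 7973179'''


def parse_pending(text):
    """Yield (seq_id, table, op, group_id) for each non-empty line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: exactly four whitespace-separated columns per line.
        seq_id, table, op, group_id = line.split()
        yield int(seq_id), table.replace('"', ''), op, int(group_id)


def group_by_last_column(text):
    """Group rows by the last column, ordered by sequence ID within a group."""
    groups = defaultdict(list)
    for seq_id, table, op, group_id in parse_pending(text):
        groups[group_id].append((seq_id, table, op))
    for rows in groups.values():
        rows.sort()
    return dict(groups)


if __name__ == "__main__":
    for group_id, rows in group_by_last_column(PENDING).items():
        print(group_id, rows)
```

If the last column really is a transaction ID, then both inserts belong to the same transaction, and the artist_meta insert simply has the lower sequence ID within it.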
Have I misunderstood how replication is supposed to work?
Or is this a bug in the replication tar file?