How to solve “mbslave: command not found” error

jho88 · June 9, 2022, 6:57am

I want to find the quickest way to get a fully set-up MusicBrainz core data database.

I don’t need a server, just the core data itself.

I’ve read through all the documents on the MusicBrainz site and believe the best solution is setting up mbdata on an Ubuntu machine:

I’ve set up a fresh Ubuntu VM, carefully followed the instructions in the above link, but get stopped at the following line:

echo 'CREATE SCHEMA musicbrainz;' | mbslave psql -S

Error:
mbslave: command not found

How do I resolve this? Or is there a better way?

I’m not so familiar with Ubuntu, nor Docker. Is there a better step by step guide somewhere?
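
One possible cause of “command not found”, assuming mbdata was installed with pip for the current user, is that pip put the mbslave script in ~/.local/bin and that directory is not on PATH. The sketch below only reports where a tool is found; the function name locate_tool is made up for illustration and is not part of mbdata.

```shell
# Sketch: report where a command-line tool lives, also checking pip's
# per-user script directory (~/.local/bin), which is not always on PATH.
locate_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "on PATH: $(command -v "$tool")"
    elif [ -x "$HOME/.local/bin/$tool" ]; then
        echo "not on PATH, but found: $HOME/.local/bin/$tool"
    else
        echo "not installed: $tool"
        return 1
    fi
}

# Usage:
#   locate_tool mbslave
# If it is found only in ~/.local/bin, add that directory to PATH:
#   export PATH="$HOME/.local/bin:$PATH"
```

If the tool is reported as not installed at all, the install step itself probably failed or ran in a different Python environment.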

InvisibleMan78 · June 9, 2022, 8:57am

The quickest way is to follow these instructions:

You don’t have to use the included server; you can access the downloaded and imported data however you want. Direct access is also possible if you publish the ports as described in GitHub - metabrainz/musicbrainz-docker: Docker Compose project for the MusicBrainz Server with replication, search, and development setup

If you really only want the core data itself, there are data dumps available here:
https://musicbrainz.org/doc/MusicBrainz_Database/Download

jho88 · June 9, 2022, 9:34am

Hi, thanks for your quick reply.

Just tried the docker approach.

Before doing so I expanded my VM disk size to 50 GB, just in case.

I then ran: sudo docker-compose run --rm musicbrainz createdb.sh -fetch

It spent about two hours downloading the db and setting it up but finally failed with an error:

InitDb.pl failed

I had another error pop up saying I was out of disk space on a particular partition. I stupidly closed the dialog before getting the full details, then tried again to run:

sudo docker-compose run --rm musicbrainz createdb.sh -fetch

…to reproduce the error, but it starts downloading the 4GB again. It’s not practical for me to go through a 2 hour, 4GB download cycle for every troubleshooting iteration, so I cancelled the second operation.

Is there any way for me to get a command which recognises that the 4GB has already been downloaded and resumes from where it left off? Does the fact that I have already run the command twice mean that the original file has been overwritten?

By the way, I also tried running the command without the fetch:
sudo docker-compose run --rm musicbrainz createdb.sh

and got:
Creating musicbrainz-docker_musicbrainz_run … done
found existing dumps
2022/06/09 09:25:04 Waiting for: tcp://db:5432
2022/06/09 09:25:04 Connected to tcp://db:5432
2022/06/09 09:25:04 Command finished successfully.
Thu Jun 9 09:25:04 2022 : InitDb.pl starting
ERROR: schema “musicbrainz” already exists
ERROR: schema “cover_art_archive” already exists
ERROR: schema “documentation” already exists
ERROR: schema “event_art_archive” already exists
ERROR: schema “json_dump” already exists
ERROR: schema “report” already exists
ERROR: schema “sitemaps” already exists
ERROR: schema “statistics” already exists
ERROR: schema “wikidocs” already exists
ERROR: schema “dbmirror2” already exists
Thu Jun 9 09:25:04 2022 : Installing extensions (Extensions.sql)
Thu Jun 9 09:25:04 2022 : Creating collations … (CreateCollations.sql)
Thu Jun 9 09:25:04 2022 : psql:/musicbrainz-server/admin/sql/CreateCollations.sql:10: ERROR: collation “musicbrainz” already exists
Error during CreateCollations.sql at /musicbrainz-server/admin/InitDb.pl line 93.
Thu Jun 9 09:25:04 2022 : InitDb.pl failed
ERROR: 3

Any thoughts?

jho88 · June 9, 2022, 9:36am

By the way, this was my original error:

series_alias_type 2 100% 1748 0.00 sec
No data file found for ‘series_attribute’, skipping
No data file found for ‘series_attribute_type’, skipping
No data file found for ‘series_attribute_type_allowed_value’, skipping
Thu Jun 9 09:14:34 2022 : load series_gid_redirect
series_gid_redirect 128 100% 64321 0.00 sec
Thu Jun 9 09:14:34 2022 : load series_ordering_type
series_ordering_type 2 100% 1828 0.00 sec
Thu Jun 9 09:14:34 2022 : load series_type
series_type 13 100% 12487 0.00 sec
Thu Jun 9 09:14:34 2022 : load track
track 14127104 36% 398361
Error loading /media/dbdump/tmp/MBImport-GlHqmEGB/mbdump/track: 08000 DBD::Pg::db pg_putcopydata failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request. at /musicbrainz-server/admin/MBImport.pl line 304, line 14127664.
08006 DBI connect(‘dbname=musicbrainz_db;host=db;port=5432’,‘musicbrainz’,…) failed: connection to server at “db” (172.18.0.3), port 5432 failed: FATAL: the database system is in recovery mode (in cleanup) at /root/perl5/lib/perl5/Throwable.pm line 76, line 14127664 during global destruction.

Failed to import dataset.
Thu Jun 9 09:15:10 2022 : InitDb.pl failed
ERROR: 255

InvisibleMan78 · June 9, 2022, 9:45am

I can’t tell you exactly what’s wrong.

The instructions work fine if you follow them point by point (I use them with every schema upgrade, starting from scratch). I use VMware with a total of 300GB of hard disk space, as suggested in the instructions (your 50GB is much too small, even without indexed search), install the latest Ubuntu on it, and then start by installing Docker as listed under “Required software”:

sudo apt-get update && \
sudo apt-get install docker.io docker-compose git && \
sudo systemctl enable --now docker.service
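
Since the first failure in this thread came from running out of disk space, a quick check before starting the import might save a two-hour retry. This is a minimal sketch: the function name has_free_gb is invented for illustration, and /var/lib/docker is assumed to be where Docker keeps its data (the usual default on Ubuntu).

```shell
# Sketch: succeed only if the filesystem holding a path has at least
# the given number of gigabytes free.
has_free_gb() {
    path=$1
    required_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage (Docker's default data directory is an assumption here):
#   has_free_gb /var/lib/docker 300 || echo "Not enough free disk space"
```

The 300 in the usage line is simply the disk size mentioned above, not a measured minimum.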

Regarding your question about not downloading the dumps again if an error occurred:

yvanzo · June 9, 2022, 9:57am

If you know that the dumps have been successfully downloaded already, don’t use the flag -fetch again.

If the previous attempt at creating the database was interrupted, you can try the script recreatedb.sh instead (which just drops the database before running createdb.sh):

sudo docker-compose run --rm musicbrainz recreatedb.sh

Also, almost every script has documentation that can be read using the --help flag on the command line.

jho88 · June 9, 2022, 10:04am

Thanks, I tried this but because I ran with fetch again, it seems the files were deleted and I must start from scratch again.

I did however download the mbdump.tar.bz2 separately. Is there any way I can copy this into a folder so that the command knows it is there and can use it directly?

yvanzo · June 9, 2022, 10:22am

The script createdb.sh would fail because it is calling InitDb.pl with some other dump files too (see the list of downloaded files). It most likely explains the first failure: Your disk ran out of space, which interrupted the download, and InitDb.pl got called with broken files.

Downloaded files are stored in a docker volume which usually is on your filesystem:

sudo docker volume inspect --format '{{.Mountpoint}}' musicbrainz-docker_dbdump

If you manually download all of these files to this directory, then you should be able to recreate the database.

Edit: The Docker volume name is dbdump (not pgdata).
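
Before re-running createdb.sh with manually copied files, it may help to confirm that none of them is missing or empty. The following is only a sketch: missing_dumps is a made-up helper, and the three archive names in the usage comment are just the ones that appear in this thread, so the list is probably incomplete.

```shell
# Sketch: print the names of expected dump files that are missing
# (or empty) in a directory; return non-zero if any are missing.
missing_dumps() {
    dir=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -s "$dir/$name" ]; then   # -s: exists and is not empty
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from a root shell because Docker's volume directory is usually
# readable by root only (file list is illustrative, not complete):
#   dumps=$(docker volume inspect --format '{{.Mountpoint}}' musicbrainz-docker_dbdump)
#   missing_dumps "$dumps" mbdump.tar.bz2 mbdump-stats.tar.bz2 mbdump-wikidocs.tar.bz2
```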

jho88 · June 10, 2022, 4:46am

Hi, thanks for your reply.

I copied the downloads into musicbrainz-docker_dbdump location but forgot one file.

The script threw an error.

After then copying the missing file and re-running the command I get the error below.

Is there any way for me to re-run the command without having to uninstall Docker or recreate my entire VM?

g@ubuntu:~/musicbrainz-docker$ sudo docker-compose run --rm musicbrainz createdb.sh
Creating musicbrainz-docker_musicbrainz_run … done
found existing dumps
2022/06/10 04:44:02 Waiting for: tcp://db:5432
2022/06/10 04:44:02 Connected to tcp://db:5432
2022/06/10 04:44:02 Command finished successfully.
Fri Jun 10 04:44:02 2022 : InitDb.pl starting
ERROR: schema “musicbrainz” already exists
ERROR: schema “cover_art_archive” already exists
ERROR: schema “documentation” already exists
ERROR: schema “event_art_archive” already exists
ERROR: schema “json_dump” already exists
ERROR: schema “report” already exists
ERROR: schema “sitemaps” already exists
ERROR: schema “statistics” already exists
ERROR: schema “wikidocs” already exists
ERROR: schema “dbmirror2” already exists
Fri Jun 10 04:44:03 2022 : Installing extensions (Extensions.sql)
Fri Jun 10 04:44:03 2022 : Creating collations … (CreateCollations.sql)
Fri Jun 10 04:44:03 2022 : psql:/musicbrainz-server/admin/sql/CreateCollations.sql:10: ERROR: collation “musicbrainz” already exists
Error during CreateCollations.sql at /musicbrainz-server/admin/InitDb.pl line 93.
Fri Jun 10 04:44:03 2022 : InitDb.pl failed
ERROR: 3

jho88 · June 10, 2022, 6:12am

Hi, I reinstalled Ubuntu, and re-ran the commands with the downloaded files copied into the correct folder.

After about 30 min of execution time I got the error below. My files have been downloaded across a period of around 2 days. Would this mean they are out of sync?

Is there any way I can correct the error without having to go back to scratch and recreate my entire VM again? This is a very time-consuming and data-expensive process.

Please help if you are able! Thanks

Command:
sudo docker-compose run --rm musicbrainz createdb.sh

Error:
mbdump/medium_index
mbdump/place_annotation
mbdump/place_meta
mbdump/place_tag
mbdump/recording_annotation
mbdump/recording_meta
mbdump/recording_tag
mbdump/release_annotation
mbdump/release_group_annotation
mbdump/release_group_meta
mbdump/release_group_tag
mbdump/release_meta
mbdump/release_tag
mbdump/series_annotation
mbdump/series_tag
mbdump/tag
mbdump/work_annotation
mbdump/work_meta
mbdump/work_tag
Fri Jun 10 06:03:59 2022 : tar -C /media/dbdump/tmp/MBImport-8s3bt4Gf --bzip2 -xvf mbdump-stats.tar.bz2
TIMESTAMP
COPYING
README
REPLICATION_SEQUENCE
SCHEMA_SEQUENCE
mbdump/statistics.statistic
mbdump/statistics.statistic_event
Fri Jun 10 06:04:16 2022 : tar -C /media/dbdump/tmp/MBImport-D7pSsTev --bzip2 -xvf mbdump-wikidocs.tar.bz2
TIMESTAMP
COPYING
README
REPLICATION_SEQUENCE
SCHEMA_SEQUENCE
mbdump/wikidocs.wikidocs_index
Fri Jun 10 06:04:16 2022 : Validating snapshot
Fri Jun 10 06:04:16 2022 : Aborting import - your TIMESTAMP files don’t match!
Fri Jun 10 06:04:16 2022 : The different TIMESTAMP files follow:
2022-06-04 00:19:50.552389+00
2022-06-08 00:19:08.042802+00

Failed to import dataset.
Fri Jun 10 06:04:17 2022 : InitDb.pl failed
ERROR: 1

InvisibleMan78 · June 10, 2022, 6:13am

Most virtual machines support some kind of “snapshots”. A snapshot preserves the state and data of a virtual machine at a specific point in time.
For example: You create a new VM, install Ubuntu with all updates and install docker as mentioned above.
Then you create a snapshot.
After creating the snapshot, you continue with the instructions. If an error occurs, you can go back to your previous snapshot and repeat the steps from that point without re-creating the entire VM.
Of course you can create multiple snapshots at different points in time for your VM.

InvisibleMan78 · June 10, 2022, 6:19am

I assume that you are currently using a non-matching file called LATEST.
This file lets the replication know which files, from which date, you are using.
http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/

jho88 · June 10, 2022, 6:21am

I see, so as long as I download all my files from the same timestamped folder I should be ok?

Thanks for the advice on snapshots too. I will take note.

InvisibleMan78 · June 10, 2022, 6:25am

I’m afraid that’s not enough. AFAIK, the process will check the newest LATEST file online from your chosen source server and compare it to your manually downloaded files. I’m not sure whether there is a way to use “old” local dump files if newer dump files are available online. Maybe @yvanzo can say for sure.

yvanzo · June 10, 2022, 1:33pm

Yes, see the section “Recreate database”.

If it doesn’t work for you, you can stop and remove all containers and volumes (including the database and downloaded dumps) by running the following command from the directory musicbrainz-docker:

sudo docker-compose down --volumes

yvanzo · June 10, 2022, 1:37pm

It is expected to abort when the timestamps differ, because loading dumps made at different times would cause the database to be broken in several ways (foreign key constraints, and replication packets).
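
The same comparison can be done by hand before starting a long import. This is a rough sketch, not how InitDb.pl itself does it: check_timestamps is a hypothetical helper, and it assumes each archive has a top-level member named TIMESTAMP, as the tar listing earlier in the thread suggests.

```shell
# Sketch: print the TIMESTAMP stored in each archive and fail
# if they are not all identical.
check_timestamps() {
    first=""
    status=0
    for archive in "$@"; do
        # -O writes the extracted member to stdout instead of to disk;
        # tar detects the compression on its own when reading.
        stamp=$(tar -xOf "$archive" TIMESTAMP) || return 2
        echo "$archive: $stamp"
        if [ -z "$first" ]; then
            first=$stamp
        elif [ "$stamp" != "$first" ]; then
            status=1
        fi
    done
    return "$status"
}

# Usage, in the directory that holds the dumps:
#   check_timestamps mbdump.tar.bz2 mbdump-stats.tar.bz2 mbdump-wikidocs.tar.bz2
```

tar reads each archive through to the end, so this can take a while on the larger files.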

yvanzo · June 10, 2022, 1:40pm

No, the file LATEST is used when fetching dumps only, not when creating the database.

The “TIMESTAMP files” are extracted from the dump files.

Yes.

jho88 · June 11, 2022, 5:00am

Thanks guys, createdb.sh finally seems to have completed successfully:

sudo docker-compose run --rm musicbrainz createdb.sh -fetch

Excuse my ignorance, but I know nothing about docker, and very little about Ubuntu. Can I now run queries on the database?

If so, how can this be done? Can a select statement be run from the command line? What is the command line statement format to run queries?

Are there any GUIs that I can install on top of the database or docker to make things easier to use? (Btw, at this stage I don’t need any indexes created.)

InvisibleMan78 · June 11, 2022, 9:13am

I suggest using one of the many Windows tools (for example DBeaver) to query the database.
To access the database (running inside docker) you have to publish the ports according to

yvanzo · June 11, 2022, 10:00am

You can check if the Docker container for the database is running with:

# in musicbrainz-docker directory
sudo docker-compose ps --all db

By default, it exposes port 5432 on all IPv4 network interfaces (0.0.0.0). This means that anyone on the same network can access it.

If you want to expose the database to the remote Ubuntu host only, then run the following commands:

# in musicbrainz-docker directory
echo MUSICBRAINZ_DOCKER_HOST_IPADDRCOL=127.0.0.1: >> .env
sudo docker-compose up -d db

Then you can access the database with any tool installed on the remote Ubuntu host, and you can still access it from your local workstation by setting up an SSH tunnel.
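
As a minimal sketch of a command-line query, assuming the psql client is installed on the Ubuntu host (for example from the postgresql-client package): the database name, user, and port below are taken from the connection string in the error log earlier in this thread, the password is not shown there and has to come from your own configuration, and mb_query is just an illustrative wrapper.

```shell
# Sketch: run one SQL statement against the database published on the
# host. Database name, user and port come from the error log earlier in
# this thread; adjust them if your setup differs.
mb_query() {
    # PSQL can be overridden, e.g. PSQL=echo to print the arguments
    # instead of connecting.
    ${PSQL:-psql} -h 127.0.0.1 -p 5432 -U musicbrainz -d musicbrainz_db -c "$1"
}

# Usage (psql prompts for the password unless PGPASSWORD or ~/.pgpass is set):
#   mb_query 'SELECT count(*) FROM musicbrainz.artist;'
#
# From another machine, first forward the port over SSH
# ("user@ubuntu-host" is a placeholder):
#   ssh -N -L 5432:127.0.0.1:5432 user@ubuntu-host
```

With the tunnel open, a GUI tool on the workstation can connect to 127.0.0.1:5432 in the same way.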