Data dump missing songs from artists with special characters

Greetings! I love the MusicBrainz data dump. When I downloaded it, I noticed that most songs are missing from artists that have special characters (e.g. Guns ‘N’ Roses, A-Ha). Is this a known issue?

How are you using the dump? Are you importing it into PostgreSQL, and if so, what query are you running?

I’m just looking at a smaller sample dump, but I see Guns N’ Roses included in the artist table (ID 1440) and referenced in various places in the artist_credit_name table.

1 Like

I downloaded the dump. I saved the recording and artist files as .csv (comma delimited), then imported it into mySQL. I still have the original “recording” file from the dump (it was the Jan 28 dump). When I search the original file on Guns N Roses’ id (1440) it does nothing comes up in the column with the artist IDs. In the SQL file , “SELECT * from recordings WHERE artist_id = 1440” yields nothing.

Is artist_credit_name a better table to use? I would like artists’ names and titles of the songs they did, along with necessary IDs. Anything additional is a bonus.

Please let me know if my understanding/guessing is correct:

  • You downloaded a dump from one of the sources listed at https://musicbrainz.org/doc/MusicBrainz_Database/Download.
  • You converted the mbdump/artist and mbdump/recording files from their original format (which I think is TSV with null columns represented as \N) to CSV.
  • You created artist and recording tables in MySQL and imported the CSV data into them.

Is there a script somewhere that you used to import the dump? The data might be getting mangled somewhere in the process of getting it into MySQL.

Also, I’m not very knowledgeable about the MB database schema, but the recording table doesn’t link directly to the artist table (and doesn’t have an artist_id column), so I wouldn’t expect to see the 1440 Guns N’ Roses artist ID referenced directly in recording. See https://github.com/metabrainz/musicbrainz-server/blob/50a7a283cc3d62bedd8d391436f54c7e54ed0443/admin/sql/CreateTables.sql#L3230.

Instead, recording can be joined with the artist_credit table, which can then be joined with artist_credit_name. artist_credit_name.artist is where you’d see the 1440 ID corresponding to Guns N’ Roses.

So a query to get all recordings credited to Guns N’ Roses might might look something like this (untested):

SELECT ...
FROM recording
INNER JOIN artist_credit ON recording.artist_credit = artist_credit.id
INNER JOIN artist_credit_name ON artist_credit.id = artist_credit_name.artist_credit
WHERE artist_credit_name.artist = 1440
GROUP BY recording.id
3 Likes

Yes, those were the steps I took. I don’t recall the script or wizard I used for the conversion. But I opened the original database dump for recordings in Notepad++ and I could not find the Guns N Roses artist id in its id column.

Recordings don’t have a column with artist ID. Recordings are linked to an artist_credit, which is linked to one or more artist_credit_name, and that has an artist column containing the artist ID.

See also the schema at MusicBrainz Database / Schema - MusicBrainz and the SQL derat posted above.