mbDump.tar.xz - Associating artists with Recording

Tags: #<Tag:0x00007fef651ea238>

Hello, within the recording’s file in mbDump.tar.xz is there anything that relates each recording to the artist? If so, is it another file within mbdump.tar.xz? Essentially, how do I connect the recording’s with an artist. There are multiple ID #'s, some are 32-character codes which I believe are the MBID’s, some are 6-digit codes, etc. Is there anything that relates each recording to a specific artist? If so, what is the file or what is the ID that I should be using to connect them. The goal is to create a database that contains simply just artists & their recordings across all albums.

Did you have a look at the information in the database schema that @InvisibleMan78 provided in Mbdump.tar.bz2 Artist List? As @Freso suggested in the same thread, it is likely best if you set up set up a mirror of the MusicBrainz database. That would likely be the simplest solution to obtaining the information that you’re seeking.


So looking at this part of the schema the Foreign Key artist_credit should be in every entry in recording right? However I’m just having difficulty figuring out what that link is in recording, what ID is linking recording to artist_credit so I could pull the artist’s name out of every recording. As I was saying before I was noticing no similar ID’s between any of the recordings (under the same artist), so I’m just trying to find the common link because I can’t seem to see anything that links an artist to all of their recordings. And I am sorry if I’m not being as specific as I can or if I’m coming across as rude, this communities been incredibly helpful so far in helping me work out this issue (seriously you guys are great).

If you know regular expressions, take the ID that has this format (we call it the MBID): [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} and do not use the integer ID (it’s not fixed to 6 digits), it changes with every merges.

So I should be able to link the MBID between artist_credit and recording and link up all the recordings to their correct artist’s?

Sorry now I see we are about MB database stuff. That’s beyond my knowledge. :confused:

1 Like

Hey don’t worry man it’s vastly beyond my knowledge too!:sweat_smile:

1 Like

First off, I need to apologize for the tone of my earlier message. I was in a hurry when I wrote it, and I should know better. I didn’t mean to be condescending or imply that you hadn’t looked at the information. In fact, it has been some time since I reviewed the schema information myself, and I forgot how confusing it can be.

I think I’m going to have to take my own advice and set up a local copy of the MusicBrainz database so that I can provide more information and better advice for these sort of questions. Until then, I think you’re on the right track.


Don’t worry you didn’t come off that way at all. And though I greatly appreciate it don’t feel the need to go through that much trouble unless you truly want to, I’m gonna take a more dedicated look at the artist_credit file and see if I can find something that would be helpful, it’s entirely possible I’ve been looking in the wrong places for what I want, and a fresh look at that schema may have been exactly what I needed.

1 Like

I’ve actually been meaning to do it for some time now, but never found the time. I think I need to make the time. My hope is that, by examining the data structures directly there may be more information than that shown in the schema diagram. Armed with that information, I should be able to provide better help in the future when database questions arise.

1 Like

I don’t know if this helps at all, but the actual SQL code to create the database is available on GitHub at https://github.com/metabrainz/musicbrainz-server/tree/master/admin/sql

You might find something useful by examining the code for creating the tables and foreign keys. I’m going through it now, so I’ll let you know if I have any revelations.

Ooh okay awesome I’ll definitely try and take a look through that

I ended up figuring it out, turns out I need the artist_credit id, which is present in the recordings after the name of the song typically, there can sometimes be duplicates, and cover bands and artist_credit’s where there’s multiple bands/feat./vs. can make things a little more confusing and I think that’s where I got tripped up, gonna try to make a script that pulls the recording name & artist_credit name from each database and inputs them into a SQL DB.

1 Like

Sounds like you have it sorted out. That’s great.

I think that the SQL query to get the list of recordings for an artist from the MB database would be something like:

SELECT recording.* FROM artist LEFT JOIN artist_credit_name ON artist.id=artist_credit_name.artist LEFT JOIN artist_credit ON artist_credit_name.artist_credit=artist_credit.id LEFT JOIN recording ON artist_credit.id=recording.artist_credit WHERE artist.gid=?

I asked the developers on IRC to confirm that’s correct (or provide an alternative), but never got a response. When I finally get my local copy of the database set up, I’ll give it a try.