Mbdump.tar.bz2 Artist List


I’ll preface this by saying I’m pretty new to this website and still learning to use it effectively. Looking through mbdump.tar.bz2, there doesn’t seem to be a file dedicated to a list of artists and their associated IDs. The track_raw file seems to contain all the song titles, with artists listed for some of the songs, but for the large majority the artist is left blank. It looks as though each song has an artist ID number: for example, band x would be ID # 123123, and all songs from band x seem to be consistently labeled with ID # 123123. That is helpful, but is there a way to associate all these ID #s with the artist name written out in plain text? It would be helpful if there were a script that could do this, or a raw file that already had it done.

If I didn’t explain this clearly or need to add more detail please let me know and I’ll add in whatever I can, thank you!

Edit: Alternatively, if there is a thread that discusses this, or a database/file that already combines all the musicbrainz songs organized by the associated artist, that would be helpful too.


Welcome @DFlo195

Please have a look here:
https://beta.musicbrainz.org/doc/MusicBrainz_Database

and then you could follow
https://beta.musicbrainz.org/doc/MusicBrainz_Database/Download

Choose your preferred method to get a local database up and running.


Hello, thank you for the quick response. I’ve taken a look through those, and the only data I really need is every song and its associated artists. I figured mbdump.tar.bz2 would be the right choice on that download page, since its description is “This is the core MusicBrainz database, including the tables for Artist, Release, Recording, etc.” However, when I looked through the files in that download, the list of artists seemed incomplete: in the track_raw file (one of the 3 files found within the mbdump folder), only a small percentage of rows actually had the artist (written in plain text) listed next to the song title. So I’m wondering if there’s something else I’d need to do to associate all those artists with their songs, or if there is already a file that contains that information.

I would like to avoid downloading the full MusicBrainz database if I can, as it would contain a lot of superfluous information that I wouldn’t need. Has someone created a database that is as simple as every artist and their associated songs?

Maybe this schema can help you to connect the “songs” to the artists:
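
In case it helps, here is a minimal sketch of how the schema connects recordings to artists, as far as I understand it: a recording points to an artist credit, and `artist_credit_name` rows link that credit to one or more artists. The example uses an in-memory SQLite database with made-up rows and only a few columns; the real tables live in PostgreSQL and have many more columns, so treat this as an illustration of the join and not as the actual setup.

```python
import sqlite3

# Toy copy of the relevant tables; all rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artist_credit (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artist_credit_name (
    artist_credit INTEGER, position INTEGER, artist INTEGER, name TEXT);
CREATE TABLE recording (
    id INTEGER PRIMARY KEY, name TEXT, artist_credit INTEGER);

INSERT INTO artist VALUES (123123, 'Band X'), (456456, 'Singer Y');
INSERT INTO artist_credit VALUES (1, 'Band X'), (2, 'Band X feat. Singer Y');
INSERT INTO artist_credit_name VALUES
    (1, 0, 123123, 'Band X'),
    (2, 0, 123123, 'Band X'),
    (2, 1, 456456, 'Singer Y');
INSERT INTO recording VALUES (10, 'First Song', 1), (11, 'Second Song', 2);
""")

# One output row per (artist, recording) pair; a recording credited to
# two artists therefore appears twice.
rows = conn.execute("""
SELECT a.name, r.name
FROM recording r
JOIN artist_credit_name acn ON acn.artist_credit = r.artist_credit
JOIN artist a ON a.id = acn.artist
ORDER BY a.name, r.name
""").fetchall()

for artist_name, recording_name in rows:
    print(artist_name, "-", recording_name)
```

The same JOIN should carry over to a PostgreSQL mirror more or less unchanged, since it only relies on the `recording.artist_credit`, `artist_credit_name.artist_credit` and `artist_credit_name.artist` columns.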


Hi @DFlo195,

when looking through mbdump.tar.bz2 there doesn’t seem to be a file dedicated to a list of artists and their associated ID’s

No, mbdump.tar.bz2 contains all artists and their MBIDs. This file is made available for the purpose of loading it into a PostgreSQL database through MusicBrainz Server or mbslave.

Alternatively, if you are interested in artists only, you may prefer to download JSON dumps instead, as these have a separate file for artist (and other entity types).

Notes:

  • Sequential IDs (id) are for internal use, MBIDs (gid) are more reliable, see intro to MBID;
  • In MusicBrainz terminology, “song” is a work type, maybe you mean a recording instead?
  • It would be easier to answer appropriately if you explain what you want to achieve first.
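
If you only want to look up artist names without loading the database, a rough sketch like the following may be enough. It assumes the extracted dump has a tab-separated `artist` file in PostgreSQL COPY text format whose first three columns are `id`, `gid` (the MBID) and `name`; please check that against the schema before relying on it. The function name `load_artist_names` and the sample row are made up for illustration.

```python
def load_artist_names(lines):
    """Map internal artist id -> (MBID, name) from tab-separated dump rows.

    Assumption: columns 0, 1, 2 are id, gid, name. Backslash escapes
    used by the COPY format (e.g. \\N for NULL) are not decoded here.
    """
    artists = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        artist_id, gid, name = fields[0], fields[1], fields[2]
        artists[int(artist_id)] = (gid, name)
    return artists


# Invented sample row; a real file has one such row per artist.
sample = [
    "123123\t11111111-1111-1111-1111-111111111111\tBand X\tBand X\t\\N\n",
]
artists = load_artist_names(sample)
print(artists[123123])
```

With a real dump you would pass an open file instead of the sample list, for example `with open("mbdump/artist", encoding="utf-8") as f: artists = load_artist_names(f)` (the path is an assumption about where the archive extracts to). Reading line by line this way also avoids having to open the whole file in an editor.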

Thank you so much for the clarification on the terminology I was using incorrectly, and for the helpful response in general. I’m going to look through the website a little more so I can be more specific with what I’m asking going forward. To answer that last question, though: what I’m looking to do is create a SQL database that contains every artist and the original recordings associated with that artist, excluding remixes, remasters, live versions, etc. Likewise, album titles, cover art, song lengths, audio clips, and anything else that isn’t an artist or one of their original recordings are all irrelevant, at least for what I’m trying to achieve.


If you want my completely not‐asked for opinion, I think you should just set up a mirror of the MusicBrainz database, even if you’re not going to use 90+% of the data in it (right now). This would 1) allow you to set up replication (either now or at a later point) to keep the data updated without having to jump through a bunch of hoops yourself, and 2) mean that if you ever decide that you do want album names or maybe group or family relationships or label information, it’s already there. All in all, it’s just the more future-proof approach.

I’d recommend putting your own data(base) in the same PostgreSQL cluster (maybe even as a schema in the same database that the MB data is in) so you can directly JOIN and do other operations with the MusicBrainz data and your own data in the SQL instructions.

(This is essentially what projects such as AcoustID and AcousticBrainz and CritiqueBrainz and many other sites/projects that expand on MusicBrainz’s data do.)


Yeah the more I look through everything it seems like setting up a mirror and modifying it to my needs would ultimately be less work than trying to start with the dumps from scratch and get what I want out of the database. I’ll probably try and give that a go, thank you for the input!

Hello, out of curiosity, is it possible that the JSON files are incomplete compared with the rest of the database? I downloaded recording.tar.xz, unpacked it in Linux, and ran a script that took only the artists & songs from each line and put them into a MySQL database. Looking through it afterwards, it only had 28,831 songs. Admittedly I haven’t run through every file on there yet (release-group looks to be the next most promising file), but I’m wondering whether a) I need to download multiple files and combine them to get all the data on strictly recordings (based on the MusicBrainz terminology, recordings are what I’m looking for, but in general I just need all the songs from each band), or b) the JSON dumps are incomplete and don’t contain all the data that the MusicBrainz website has.

Edit: I’m almost wondering if going back to mbdump.tar.bz2 would be a better way to go. The reason I had issues the first time was that the file was too large to open in any of the software I had on Windows. With Linux I can avoid that issue and run a script that pulls strictly artists & songs from the file. Hopefully that would contain the same information that’s found on the MusicBrainz website.

recording.tar.xz contains standalone recordings only. You would probably want to combine it with release.tar.xz which contains releases including tracks/released recordings’ metadata.
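
As a rough sketch of pulling released recordings out of the release dump: this assumes each line of the extracted file is one JSON document shaped like the web service's release JSON, with `media`, `tracks`, `recording` and `artist-credit` keys. Those key names are my understanding of the format, so check them against an actual line from the dump. The helper name `recordings_from_release` and the sample data are invented for illustration.

```python
import json


def recordings_from_release(release):
    """Yield (artist credit text, recording title, recording MBID) per track."""
    for medium in release.get("media", []):
        for track in medium.get("tracks") or []:
            recording = track.get("recording") or {}
            # Join credited names with their join phrases, e.g. " feat. ".
            credit = "".join(
                part.get("name", "") + part.get("joinphrase", "")
                for part in track.get("artist-credit", [])
            )
            yield credit, recording.get("title"), recording.get("id")


# Invented sample standing in for one line of the release dump.
sample_line = json.dumps({
    "title": "Some Album",
    "media": [{
        "tracks": [
            {"artist-credit": [{"name": "Band X", "joinphrase": ""}],
             "recording": {"id": "rec-1", "title": "First Song"}},
            {"artist-credit": [{"name": "Band X", "joinphrase": " feat. "},
                               {"name": "Singer Y", "joinphrase": ""}],
             "recording": {"id": "rec-2", "title": "Second Song"}},
        ],
    }],
})

results = list(recordings_from_release(json.loads(sample_line)))
for credit, title, mbid in results:
    print(credit, "-", title, "-", mbid)
```

The same recording can appear on many releases, so you would probably want to deduplicate on the recording MBID before inserting into your own database.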