Question about MB tables and their relationships

Tags: #<Tag:0x00007f9adf56ef10>

This database overwhelms me, I am impressed by the wealth of info and by this new tool (love learning new things.)

That said, I set up a MB server on VirtualBox and need some help with deciphering the sea of data.
I want to create a data extract that contains all this info: Artist, Title, AlbumArtist, Album, Year and Genre, to begin with.
What tables do I need to use, under what schemas (presumably all under MB?), and what are the relationships (links) between them?

If somebody already has a Sql query to pull these details, please let me know.

Below I posted a picture of my connection in pgAdmin, so you know what I mean. Thanks much.

1 Like

https://wiki.musicbrainz.org/MusicBrainz_Database

6 Likes

Also see the wiki: https://wiki.musicbrainz.org/MusicBrainz_Database/Schema

6 Likes

That diagram is not straightforward at all.

No matter how long I look I can’t find album, genre, year. Help? Can someone dumb that down for me?

In the MusicBrainz world, “album” is actually “release” (and there is a “release” table), and the year should be part of the “release event” (found in the corresponding “release_country” or “release_unknown_country” table). I’m not sure about the genres because I’ve never used them.

1 Like

Thank you, that helps.

See, this only shows that we don’t know what we don’t know, I thought these info were missing.

Now, the goddamn server is erroring out on me, Murphy’s law, just when I was about to run a test query, this pest fails on me:

SELECT
release_group.gid AS albumid,
release_group.type AS albumPrimaryTypeId,
release_group_primary_type.name as albumPrimaryTypeName,
release_name.name AS albumName,
artist_name.name AS artistName,
artist.gid AS artistGid,
artist.type as artisTypeId,
artist_type.name as artistTypeName,
artist.begin_date_year artistBeginDateYear,
gender.name as artistGender,
area.name as artistCountryName,
artist_meta.rating artistRating,
artist_meta.rating_count artistRatingCount,
release_group_meta.first_release_date_year albumYear,
release_group_meta.rating albumRating,
release_group_meta.rating_count albumRatingCount
FROM
artist
INNER JOIN artist_credit_name
ON artist_credit_name.artist = artist.id
INNER JOIN artist_credit
ON artist_credit.id = artist_credit_name.artist_credit
INNER JOIN release_group
ON release_group.artist_credit = artist_credit.id
INNER JOIN release_name
ON release_name.id = release_group.name
INNER JOIN artist_name
ON artist.name = artist_name.id
INNER JOIN artist_type
ON artist.type = artist_type.id
INNER JOIN area
ON artist.area = area.id
INNER JOIN release_group_primary_type
ON release_group_primary_type.id = release_group.type
LEFT OUTER JOIN release_group_secondary_type_join
ON release_group_secondary_type_join.release_group = release_group.id
LEFT OUTER JOIN gender
ON artist.gender = gender.id
LEFT OUTER JOIN artist_meta
ON artist.id = artist_meta.id
LEFT OUTER JOIN release_group_meta
ON release_group_meta.id = release_group.id
WHERE
release_group.type = ‘1’
AND release_group_secondary_type_join.secondary_type IS NULL
AND upper(artist_name.name) = ‘U2’;

1 Like

Sorry, I can’t help you there. I’ve never gotten around to setting up a local copy of the database to experiment with. Keep meaning to, but I always seem to end up getting sidetracked on something else. One of the devs should be able to help, but they’re all at the MetaBrainz Summit conference this weekend. With any luck, one of them will stop by and point you in the right direction.

1 Like

I’m trying to restart everything, to see if it helps. I did all the steps as yesterday, but today this pest decided to give me a hard time.
My goal is to create a single table that I can export to SAS and play with it from there.

Just after I wrote this, it went back to working. For that, I had to shut down and rerun the VM.
See why it upsets me, because sometimes these errors seem to be totally random, at their whim.

I’m guessing that either @Bitmap or @yvanzo would be the devs that are most likely to be able to help. Guys, if you read this, also have a look at:

You can also usually catch them on IRC (#metabrainz on freenode.net).

1 Like

I was able to fix that issue, now it’s my query that’s failing.

I mean, it’s not my query, I borrowed it from somer user.

It’s complaining that table release_name doesn’t exist. Maybe this query is out of date, and that table was replaced.

Any idea how to fix the query, maybe it just needs that table to be replaced with its current equivalent.

SELECT
release_group.gid AS albumid,
release_group.type AS albumPrimaryTypeId,
release_group_primary_type.name as albumPrimaryTypeName,
release_name.name AS albumName,
artist_name.name AS artistName,
artist.gid AS artistGid,
artist.type as artisTypeId,
artist_type.name as artistTypeName,
artist.begin_date_year artistBeginDateYear,
gender.name as artistGender,
area.name as artistCountryName,
artist_meta.rating artistRating,
artist_meta.rating_count artistRatingCount,
release_group_meta.first_release_date_year albumYear,
release_group_meta.rating albumRating,
release_group_meta.rating_count albumRatingCount
FROM
artist
INNER JOIN artist_credit_name
ON artist_credit_name.artist = artist.id
INNER JOIN artist_credit
ON artist_credit.id = artist_credit_name.artist_credit
INNER JOIN release_group
ON release_group.artist_credit = artist_credit.id
INNER JOIN release_name
ON release_name.id = release_group.name
INNER JOIN artist_name
ON artist.name = artist_name.id
INNER JOIN artist_type
ON artist.type = artist_type.id
INNER JOIN area
ON artist.area = area.id
INNER JOIN release_group_primary_type
ON release_group_primary_type.id = release_group.type
LEFT OUTER JOIN release_group_secondary_type_join
ON release_group_secondary_type_join.release_group = release_group.id
LEFT OUTER JOIN gender
ON artist.gender = gender.id
LEFT OUTER JOIN artist_meta
ON artist.id = artist_meta.id
LEFT OUTER JOIN release_group_meta
ON release_group_meta.id = release_group.id
WHERE
release_group.type = ‘1’
AND release_group_secondary_type_join.secondary_type IS NULL
AND upper(artist_name.name) = ‘U2’;

1 Like

Try just “release.name” rather than “release_name.name”. I believe that you can get the name directly from the release table rather than trying to join the non-existent “release_name” table.

Yes, that’s what I’m doing now, fixing the query replacing non existent tables with their possible current name.

However, even the links here seem to be wrong, I will have to fix the whole thing.

1 Like

I was thinking that as I started looking a bit more closely. In any event, I think you’re on the right track now with a little better understanding of the schema and where you might find the information you want to extract. Good luck with it.

1 Like

Thanks, the query now runs, but maybe it’s not 100% correct.
I’m not seeing all the U2 albums in this example.

SELECT
artist.name AS Artist
,artist.gid AS artistGid
,artist.type as artisTypeId
,artist_type.name as ArtistTypeName
,release_group_meta.first_release_date_year as Year
,release.name AS Album
,artist.begin_date_year ArtistBeginDateYear
,area.name as ArtistCountryName
,release_group.gid AS albumid
,release_group.type AS albumPrimaryTypeId
,release_group_primary_type.name as albumPrimaryTypeName
FROM artist
INNER JOIN artist_credit_name ON artist_credit_name.artist = artist.id
INNER JOIN artist_credit ON artist_credit.id = artist_credit_name.artist_credit
INNER JOIN release_group ON release_group.artist_credit = artist_credit.id
INNER JOIN release ON release.id = release_group.id
INNER JOIN artist_type ON artist.type = artist_type.id
INNER JOIN area ON artist.area = area.id
INNER JOIN release_group_primary_type ON release_group_primary_type.id = release_group.type
LEFT OUTER JOIN release_group_secondary_type_join ON release_group_secondary_type_join.release_group = release_group.id
LEFT OUTER JOIN release_group_meta ON release_group_meta.id = release_group.id
WHERE
release_group.type = ‘1’
AND release_group_secondary_type_join.secondary_type IS NULL
AND upper(artist.name) = ‘U2’;

1 Like

Try removing the condition:

AND release_group_secondary_type_join.secondary_type IS NULL

In some cases, a release may have secondary types of “collection” (e.g.: “Best of”) or “live”, which your current query will exclude because you’re only looking for releases with no secondary types.

2 Likes

I did, it’s not that. Removing the condition gives me ~920 rows, but none of the famous albums of U2 - joshua tree, with or without you, etc.

I think one/some of the joins are wrong. Maybe I can try making some of the inner joins left joins.

3 Likes

So, the issue was much more simple: you were joining release on release.id = release_group.id, meaning you were just getting releases whose row IDs in the release table happened to match the U2 release group row IDs. Judging from what fields you’re actually selecting, you don’t seem to need the releases at all, so I just got rid of that join and changed it to return the release group name instead.

    SELECT
    artist.name AS Artist
    ,artist.gid AS artistGid
    ,artist.type as artisTypeId
    ,artist_type.name as ArtistTypeName
    ,release_group_meta.first_release_date_year as Year
    ,release_group.name AS Album
    ,artist.begin_date_year ArtistBeginDateYear
    ,area.name as ArtistCountryName
    ,release_group.gid AS albumid
    ,release_group.type AS albumPrimaryTypeId
    ,release_group_primary_type.name as albumPrimaryTypeName
    FROM artist
    INNER JOIN artist_credit_name ON artist_credit_name.artist = artist.id
    INNER JOIN artist_credit ON artist_credit.id = artist_credit_name.artist_credit
    INNER JOIN release_group ON release_group.artist_credit = artist_credit.id
    INNER JOIN release_group_primary_type ON release_group_primary_type.id = release_group.type
    LEFT OUTER JOIN artist_type ON artist.type = artist_type.id
    LEFT OUTER JOIN area ON artist.area = area.id
    LEFT OUTER JOIN release_group_meta ON release_group_meta.id = release_group.id
    WHERE
    release_group.type = '1'
    AND NOT EXISTS (SELECT 1 FROM release_group_secondary_type_join WHERE release_group = release_group.id)
    AND artist.gid = 'a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432';

I also made it so that area and type are optional, because while U2 has them, other artists might not. I also make it use artist.gid instead of name, because there’s also a Japanese band named U2 you probably don’t want.

5 Likes

Perfect. Thanks much.

Btw, Nicolas, if I want to add track names and complete artist (such as, with feat.'s, e.g. Madonna feat. Iggy Azalea), how would I do that? Also, genre is a nice to have.

It seems nobody will spoon feed this to me. Guess I’ll have to read. :frowning:

Ta-da, I finally I was able to come up with this query. It should be right.

Yes, this query runs and is producing circa 24k rows, just for U2. It’s kinda slow to run, I wonder what’s gonna happen when I run it for the whole thing.

SELECT
artist.Id as ArtId
,artist.name as Art
,artist.begin_date_year as ArtBegYear
,area.name as ArtCountry
,artist.type as ArtTypeId
,artist_type.Name as ArtTypeName

,artist_credit_name.artist_credit as ArtCredNameId
,artist_credit_name.Name as ArtCredNameName
,artist_credit.Name as ArtCredName
,artist_credit.artist_count as ArtCredArtCount

,release_group.Id as AlbumGroupId
,release_group.Name as AlbumGroupName
,release_group.artist_credit as AlbumGroupArtCred
,release_group_primary_type.name as AlbumGroupPrimTypeName
,release_group_meta.first_release_date_year as AlbumGroupFirstYear

,release.id as AlbumId
,release.Name as AlbumName
,release.Quality as AlbumDQ
,area2.Name as AlbumCountry
,language.Name as AlbumLang

,medium.Id as MediuId
,medium.Name as MediuName
,medium.track_count as MediuTracks
,medium_format.Name as MediuFmtName
,medium_format.Year as MediuFmtYear

,track.Id as TrackId
,track.Position as TrackPos
,track.Number as TrackNo
,track.Name as TrackName
,track.length as TrackLen

from artist

inner join artist_credit_name ON artist_credit_name.artist = artist.id

inner join artist_credit ON artist_credit.id = artist_credit_name.artist_credit

inner join release_group ON release_group.artist_credit = artist_credit.id

inner join release on release.release_group = release_group.id
left join release_country on release_country.release = release.id
left join area as area2 ON area2.id = release_country.country
left join language on language.id = release.language

inner join medium on medium.release = release.id

left join medium_format on medium_format.id = medium.id

inner join track on track.medium = medium.id

left join release_group_primary_type on release_group_primary_type.id = release_group.type

left join area ON artist.area = area.id

left join artist_type ON artist_type.id = artist.type

left join release_group_meta ON release_group_meta.id = release_group.id

where artist.name = ‘U2’;

2 Likes