It’s great to see more academics using the MusicBrainz data; there is definitely a lot of interesting data to work with.
It looks like you’ve run into one of the first issues people encounter when they start working with MusicBrainz: the structure of the database is quite complex. This is because (as I’m sure you’re aware) recorded music is itself complex. Some of your queries are easy to answer (e.g. a list of all artists); others are more involved.
It’s not clear here whether you want just a list of all album names in MusicBrainz (easy), or that same list linked with each album’s artist, which is a bit more difficult:
- Albums can have more than one artist, so “just putting it in a CSV” requires a bit of forethought.
- If an artist appears on just one track of an album, do you want them to appear as an album artist?
- Many albums have their artist listed as “Various Artists” or “Soundtrack”. How do you want to deal with these?
- Do you want only officially released albums/singles/EPs, or also promo material, compilations, live bootlegs, etc.?
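To make the CSV question concrete, here is a minimal sketch over hypothetical, already-fetched album records (the dict layout with `title`, `status` and `artists` is invented for illustration and is not a MusicBrainz format). It shows two possible answers: one row per album with the artist names joined by "; ", or one row per album/artist pair. It also keeps only releases with the status "Official" (MusicBrainz also uses "Promotion", "Bootleg" and "Pseudo-Release") and drops albums credited only to the special "Various Artists" artist, identified here by what I believe is its MBID.

```python
import csv
import io

# MBID of MusicBrainz's special-purpose "Various Artists" artist.
VARIOUS_ARTISTS = "89ad4ac3-39f7-470e-963a-56509c546377"


def album_rows(albums, one_row_per_artist=False, statuses=("Official",)):
    """Flatten album records into CSV rows.

    Each album is a hypothetical dict:
    {"title": str, "status": str, "artists": [(mbid, name), ...]}
    """
    for album in albums:
        if album["status"] not in statuses:
            continue  # e.g. skip promos and bootlegs
        artists = [(m, n) for m, n in album["artists"] if m != VARIOUS_ARTISTS]
        if not artists:
            continue  # credited only to "Various Artists"
        if one_row_per_artist:
            for _, name in artists:
                yield [album["title"], name]
        else:
            yield [album["title"], "; ".join(name for _, name in artists)]


def to_csv(rows):
    """Render rows with the csv module's default dialect (\\r\\n line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical sample data.
albums = [
    {"title": "Duets", "status": "Official",
     "artists": [("a1", "Artist A"), ("b2", "Artist B")]},
    {"title": "Hits", "status": "Official",
     "artists": [(VARIOUS_ARTISTS, "Various Artists")]},
    {"title": "Promo CD", "status": "Promotion",
     "artists": [("a1", "Artist A")]},
]
print(to_csv(album_rows(albums)), end="")
print(to_csv(album_rows(albums, one_row_per_artist=True)), end="")
```

Neither layout is "right"; the first is easier to read, the second is easier to join against an artist table later.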
There is a similar set of questions for songs. This is why we recommended first getting to know the MusicBrainz data model or web service a little better: it will help you formulate your questions more specifically, which in turn will help us answer them.
I’m not sure which tools and programming languages you are familiar with. If you use Python, there is a great library called mbdata which abstracts away much of the SQL of the MusicBrainz database and lets you deal with more abstract concepts. Here is an example of using mbdata to get artists from the database and write them to a CSV file:
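```python
import csv
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from mbdata.models import Artist

# Placeholder connection string: adjust user, password and host for your database
engine = create_engine('postgresql://musicbrainz:email@example.com/musicbrainz')
Session = sessionmaker(bind=engine)
session = Session()

artists = session.query(Artist).limit(100).all()
with open("some-artists.csv", "w", newline="", encoding="utf-8") as fp:
    writer = csv.writer(fp)
    for a in artists:
        # One row per artist: its MBID (gid) and name
        writer.writerow([a.gid, a.name])
```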
As an alternative, we provide dumps of the MusicBrainz database in JSON format, which might be easier for you to get your head around: ftp://ftp.musicbrainz.org/pub/musicbrainz/data/json-dumps/20170610-001001/
I had a look at the artist dump: it is 370 MB compressed but uncompresses to about 4 GB. I use a program called jq to play with the JSON data:
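```
head -100 artist | jq -r '[.id, .name] | @csv'
```
This takes the first 100 lines of the file `artist` (one JSON document per line) and pipes them to jq, which picks out the `.id` and `.name` fields of each one and formats them as a CSV row. The `-r` flag makes jq print the raw CSV text; without it, each row comes out wrapped in quotes as a JSON string.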
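If you would rather stay in Python, here is a rough equivalent of that jq pipeline which also reads straight from the compressed archive, so the 4 GB file never has to be extracted to disk. It is a sketch under assumptions: the archive name `artist.tar.xz` and the member path `mbdump/artist` are my guesses at the dump layout (list the archive to check), and the output only approximates jq's `@csv` by quoting every string field.

```python
import csv
import io
import itertools
import json
import tarfile


def iter_dump_lines(fileobj, member_name="mbdump/artist"):
    """Yield the raw lines of one file inside a .tar.xz dump without extracting it.

    `member_name` is an assumed path; check it with e.g. `tar tJf artist.tar.xz`.
    """
    # "r|xz" reads the archive as a forward-only stream, keeping memory use small
    with tarfile.open(fileobj=fileobj, mode="r|xz") as tar:
        for member in tar:
            if member.name == member_name:
                yield from tar.extractfile(member)  # bytes, one JSON document per line
                return
    raise KeyError(f"{member_name!r} not found in archive")


def dump_to_csv(lines, out, fields=("id", "name"), limit=100):
    """Roughly `head -<limit> artist | jq -r '[.id, .name] | @csv'`."""
    # QUOTE_NONNUMERIC quotes every string field, similar to jq's @csv
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for line in itertools.islice(lines, limit):
        record = json.loads(line)
        # Missing fields become empty values
        writer.writerow([record.get(f) for f in fields])


# Usage, assuming the archive was downloaded as "artist.tar.xz":
# with open("artist.tar.xz", "rb") as src, \
#         open("some-artists.csv", "w", newline="", encoding="utf-8") as dst:
#     dump_to_csv(iter_dump_lines(src), dst)
```

Because `dump_to_csv` stops after `limit` lines, the archive is only decompressed as far as needed, much like `head` stops reading early.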
Hope this is useful!