Hello everybody
First of all, thank you for this new forum.
My question may be naive, but I would like to know if it's possible to get the whole MusicBrainz database in a common format (CSV, JSON, or anything else)? I would like the whole dataset to do a big data analysis exercise.
great question!!
Did your project work out? I'm currently working on a similar project, and for this purpose it is essential to work with a CSV file of the database. Is there any way you could share your CSV file?
Thank you for replying!
P.
I think we’re a few people looking for a CSV file we can import into other tools.
In my case I want to load Musicbrainz data into neo4j graph database.
If I’m able to get a CSV file I will post a link back here!
Sharing and collaboration do not seem to be very common…
MusicBrainz has a web service which provides data as XML or JSON. With proper queries you can gather the data you’re interested in and convert it to whatever format you need.
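Roughly, a web-service lookup looks like this. The URL shape follows the documented WS/2 API (`fmt=json` requests JSON instead of the default XML), but the MBID and the response below are placeholders — in real use you'd fetch the URL and parse the live response:

```python
import json
from urllib.parse import urlencode

# Build a WS/2 lookup URL for an artist. The MBID here is a dummy value;
# substitute the real MBID of the entity you want.
mbid = "00000000-0000-0000-0000-000000000000"
url = "https://musicbrainz.org/ws/2/artist/%s?%s" % (mbid, urlencode({"fmt": "json"}))
print(url)

# Parsing the response is ordinary JSON handling. A canned sample stands in
# for a live HTTP response to keep this sketch self-contained.
sample_response = '{"id": "%s", "name": "Some Artist", "type": "Group"}' % mbid
artist = json.loads(sample_response)
print(artist["name"])  # -> Some Artist
```

From there, writing the fields you care about to CSV is a couple of lines with the `csv` module.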
Isn’t the MB database dump just a bunch of TSVs anyway? You can just load them into any decent CSV parser and tell it to use a tab instead of a comma as a delimiter.
Python:
import csv

with open('<some db dump file>', newline='') as f:
    # dump files are tab-separated, so override the default comma delimiter
    reader = csv.reader(f, delimiter='\t')
    data = list(reader)
Thanks for the insight Zas,
Since I'm not a developer, extracting MusicBrainz data as XML or JSON means I still have to write code to ETL it into something usable.
Not lazy, just not as talented as you are.
This is not user friendly enough, lol
Thanks for the suggestion!
Again, your idea means installing Python and using some code like the one you suggested…
Seems simple, I may give it a try.
Otherwise, could I import the MusicBrainz dump straight into a CSV parser?
Could you suggest one please?
Regards,
Simon
That's a very simplistic view of a modern relational database consisting of millions of entities and gigabytes of data. It's a bit like thinking a modern car is just a bunch of metal and plastic parts… and that you can disassemble and reassemble it without deep knowledge of mechanics… possible, but not "easy".
Because it's more complicated than just dumping a whole complex database to simple CSVs. The thing is that you're completely missing the meaning of "relational". Even though each table could be exported as a CSV/TSV somehow (and even that isn't simple), those tables are linked together. I can tell you you're on the wrong track if you think you can interact with MB data this way. At best, if someone manages to make it work, it will be extremely slow and probably unusable.
Since you’re not a developer, you’d better hire one and focus on the use of the data you want to make.
Well, yes, each table could be exported then imported into a CSV parser, if you have enough resources you could even be able to do something with it, but, to be frank, that’s not something you want to do.
If you're only interested in a handful of fields from one entity, then I think such a simplistic view is perfectly acceptable. It's completely viable to just load the TSVs in that case; I've done it before. IMHO, loading all of the data into an actual RDBMS, or making a bunch of requests to the MB API to eventually get the same data, is much more complicated for that specific use case.
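For that one-table case, the stdlib csv module really is enough. A minimal sketch with made-up data (the column layout here is invented for illustration — check the dump's schema for the real positions; the dumps have no header row and use `\N` where the database had a NULL):

```python
import csv
import io

# A tiny stand-in for one dump file: tab-separated, no header row,
# '\N' marking NULLs. Column 2 plays the role of a "name" column here.
sample = "1\tmbid-aaa\tArtist One\n2\t\\N\tArtist Two\n"

reader = csv.reader(io.StringIO(sample), delimiter='\t')
names = [row[2] for row in reader if row[2] != '\\N']
print(names)  # ['Artist One', 'Artist Two']
```

Swap the `StringIO` for `open('<dump file>', newline='')` and you have the whole "parser" for one entity's worth of fields.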
If you need data from multiple entities, then yes, I agree that it’s easier to import the data into an RDBMS and work with it from there, or use the API. In that case, it doesn’t even make much sense to talk about a single CSV file, as it would be heavily denormalised and barely usable.