I figured I should be a bit more public and responsive if I’m going to continue editing, and perhaps explaining more of what I’m doing will help make sense of some of the mistakes I make.
I have a handful of apps that I use which depend on MusicBrainz data to work properly. Over time, pressure has grown on these apps to support Discogs instead of MusicBrainz because of a perceived better quality of data, especially around missing releases from electronic artists who publish a lot on SoundCloud. In my opinion, that is a bad pivot and is going to lead to (and is already leading to) some instability in those projects. I’m not against Discogs at all, and there is a use for aggregating services, but there are also a lot of advantages to improving an existing ecosystem.
The project I’m working on takes a list of artists and runs a range of audits on their Digital Media releases, with as much automation as I can manage without making huge errors. Automation is handled by n8n and a range of userscripts (some “controllers”, some “workers”, and some “action panels”). Data comes from a local MusicBrainz mirror and a local Harmony server with an API I built on top.
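As a rough sketch, the core of each audit pass is just deciding which checks a given release needs. Everything below (field names, audit labels, the injected fetch function) is illustrative only, not my actual implementation and not any real MusicBrainz or Harmony API:

```python
# Toy sketch of the audit dispatch loop. Field names and audit labels
# are made up for illustration; they are not real MB/Harmony fields.

def pick_audits(release):
    """Decide which audits a digital-media release needs."""
    audits = []
    if release.get("urls") and not release.get("barcode"):
        audits.append("fill-barcode")     # pull barcode via Harmony
    if release.get("barcode") and release.get("urls") and not release.get("isrcs"):
        audits.append("import-isrcs")     # send to Harmony Actions
    if release.get("urls") and not release.get("cover_art"):
        audits.append("fill-cover-art")
    return audits

def run_pipeline(artists, fetch_releases):
    """fetch_releases is injected, e.g. a query against a local MB mirror."""
    plan = {}
    for artist in artists:
        for rel in fetch_releases(artist):
            needed = pick_audits(rel)
            if needed:
                plan[(artist, rel["id"])] = needed
    return plan
```

The point of injecting `fetch_releases` is that the same decision logic can sit behind an n8n workflow node or an in-browser userscript; only the data source changes.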
- Try to identify releases that have links but are missing barcodes. I use those links to pull release info from Harmony, then use that to fill in the gaps.
- Try to identify releases with links that have conflicting barcodes. I’m still working on improving automation here, as it’s more tedious.
- Identify releases with missing ISRCs but a valid barcode and link, and send those to Harmony Actions to import ISRCs.
- Audit the recordings page and merge matching recordings, relying as much as possible on the newly filled ISRCs.
- Use any existing links to fill in missing cover art.
- Scan Deezer and Spotify via their APIs for releases that do not exist in MusicBrainz, send those releases to Harmony, and start an import flow. This can be done by n8n workflows or by in-browser userscripts. Beatport and SoundCloud will be next after that, but the Spotify and Deezer APIs are easier to work with.
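The Deezer/Spotify scan boils down to a set difference on barcodes/UPCs: anything a store knows about that the mirror doesn’t becomes an import candidate. A simplified sketch, where the payload shape and the leading-zero normalization are my own assumptions rather than anything from the real store APIs or MB policy:

```python
def normalize_upc(code):
    # Barcodes sometimes appear with or without leading zeros, so
    # compare on a stripped form. (Assumption: a crude heuristic,
    # not an official MusicBrainz normalization rule.)
    return (code or "").strip().lstrip("0")

def find_missing_releases(store_releases, mb_barcodes):
    """Return store releases whose UPC is absent from the MB mirror.

    store_releases: dicts in a made-up shape, e.g. {"title": ..., "upc": ...}
    mb_barcodes: iterable of barcodes already present in MusicBrainz.
    """
    known = {normalize_upc(b) for b in mb_barcodes}
    missing = []
    for rel in store_releases:
        upc = normalize_upc(rel.get("upc"))
        if upc and upc not in known:
            missing.append(rel)
    return missing
```

Releases with no barcode at all are skipped here; those need a different matching strategy (title/artist fuzzing) before they can be trusted for import.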
As I’ve gone through this process, I’ve made incremental improvements and learned a lot more about how MusicBrainz “works”. I still have a lot of gaps in my understanding and processes, but I appreciate those who have helped me along, particularly afrocat and chaban.
The biggest snags in my processes:
- Trusting existing data too much: if existing links, barcodes, or artist attributions are wrong and Harmony isn’t able to detect that, I will often kick off a flow that makes some bad assumptions.
- Artist collisions: I’m working on an n8n flow that I think will improve artist disambiguation, but after a few weeks of looking at it, I see this is really a problem the music industry as a whole is struggling with.
- MB rate limiting: my best run so far was about 21,000 edits in 18 hours, and I hope to go much further than that, but I spend a lot of time just working out which requests route where (I created a “MusicBrainz gateway” to help make some of the routing decisions).
- Flooding review/approval queues: if I introduce errors, it may be much harder for other editors to catch them unless they subscribe to a fairly narrow range of entities, and even then it is hard to review hundreds of changes at a time. While this is somewhat backwards, I plan to turn to the edit approval process once more of the automated adding is working, so that I can spend all of my human time reviewing the information being added or edited.
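On the rate-limiting point: most of the gateway’s job is routing, since lookups can hit the local mirror freely while edit submissions have to go to musicbrainz.org, whose public API allows roughly one request per second. A toy version of that routing-plus-throttling decision (addresses, request shape, and the interval are my assumptions, not the real gateway):

```python
import time

LOCAL_MIRROR = "http://localhost:5000"   # assumption: mirror address
PUBLIC_MB = "https://musicbrainz.org"

class MbGateway:
    """Toy routing gateway: reads go to the local mirror; writes
    (edit submissions) go to musicbrainz.org, throttled to about
    one request per second."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_write = None

    def route(self, request):
        # Only edit submissions need the public server; everything
        # else is answered by the mirror with no rate limit.
        return PUBLIC_MB if request.get("kind") == "edit" else LOCAL_MIRROR

    def wait_time(self, now=None):
        """Seconds to sleep before the next write is allowed."""
        now = self.clock() if now is None else now
        if self._last_write is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self._last_write))

    def record_write(self, now=None):
        self._last_write = self.clock() if now is None else now
```

Injecting the clock makes the throttle testable without actually sleeping, which matters when a run is pushing tens of thousands of edits.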
If you notice other patterns of mistakes, you should see from my edit notes that I’m working on them, but I’m very open to feedback and hope to learn more. If there is something you struggle with, we might be able to collaborate on it. I’m still getting up to speed with GitHub, but I intend to contribute to the existing projects and userscripts so that, if you find my flows helpful, we can all benefit.