Whilst automating the enrichment, standardisation and cleanup of the metadata that matters to me and directly affects how I interact with my music, I’ve noticed some data issues I’m curious about.
To add MBIDs to my existing music collection, I pulled a dump of the MusicBrainz database about two weeks back and then extracted the MBID and artist name from the artists table.
Doing some basic housekeeping, I found 19 entries in the artists table where an MBID has no associated name/text.
After eliminating namesakes there are 2,080,637 unique names in the artists table, and there are 328,326 entries where a name appears more than once. These are clearly namesakes, each with its own MBID, which is what enables the music server to tell one from another when presenting an artist’s discography, their appearances on VA albums, and albums where they’re not the albumartist.
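For anyone wanting to reproduce counts like these, here is a minimal sketch of one way to do it. It assumes the extract has already been loaded as (MBID, name) pairs; the function name `namesake_stats` and the sample rows are made up for illustration and are not taken from the dump.

```python
from collections import Counter


def namesake_stats(rows):
    """Summarise name reuse in an iterable of (mbid, name) pairs.

    Returns a tuple of:
      - the number of distinct names,
      - the number of names used by more than one MBID,
      - the number of rows whose name is shared with another row.
    """
    counts = Counter(name for _mbid, name in rows)
    unique_names = len(counts)
    shared_names = sum(1 for c in counts.values() if c > 1)
    rows_with_shared_name = sum(c for c in counts.values() if c > 1)
    return unique_names, shared_names, rows_with_shared_name


# Hypothetical sample rows: two artists share the name "Artist A".
sample = [
    ("mbid-1", "Artist A"),
    ("mbid-2", "Artist A"),
    ("mbid-3", "Artist B"),
]
print(namesake_stats(sample))
```

"Names used more than once" and "rows carrying a shared name" are different numbers, so the sketch returns both rather than guessing which one a given figure refers to.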
Here’s another example of what looks like a data quality issue: c69b34a4-3082-4a2f-b063-bd6f177e025f Unwound: A Tribute to George Strait appears in the artists table. That looks like an album/release rather than an artist.
No MusicBrainz entities match the MBID c69b34a4-3082-4a2f-b063-bd6f177e025f. Either it’s incorrect, it was for an entity that has since been removed, or it is an ID for something else than an entity (for example, a relationship type).
And the MBIDs from your screenshot all lead to artists whose names start with | (U+007C: VERTICAL LINE). That doesn’t look like a coincidence.
That’s no coincidence, I should’ve known better than to presume nobody is stupid enough to use a vertical line in their band name, let alone start with one. Wonder how many have started their names with or included U+0009 then.
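A minimal sketch of how a name can go missing this way, assuming the extract was written as `mbid|name` lines with | as the field separator (the thread doesn’t show the actual export step, so that is a guess). The MBID and the name below are hypothetical, and `parse_naive` / `parse_safe` are illustrative helpers, not part of any MusicBrainz tooling.

```python
def parse_naive(line):
    """Split on every '|' and take the first two fields.

    A name that starts with '|' produces an empty second field,
    so the name appears to be missing.
    """
    fields = line.rstrip("\n").split("|")
    return fields[0], fields[1]


def parse_safe(line):
    """Split only on the first '|', so the rest of the line,
    including any further '|' characters, is kept as the name."""
    mbid, name = line.rstrip("\n").split("|", 1)
    return mbid, name


# Hypothetical row: an artist whose name starts and ends with '|'.
line = "00000000-0000-0000-0000-000000000001||example|\n"

print(parse_naive(line))  # the name comes back as an empty string
print(parse_safe(line))   # the name comes back as '|example|'
```

The same problem would apply to any separator that can legitimately occur inside a name, a tab included, unless the export quotes or escapes it.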