Does anyone use the data quality system anymore?

There’s not a lot of documentation about it anymore, and some of the only ways I’ve seen it used in recent years are self-congratulatory “high” edits that are undeserved (data was deliberately left out) and retaliatory “low” edits as a bullying method against editors one doesn’t like. So I created MBS-14294 to abolish this outdated and often-abused system once and for all, but I would like to hear some backstory about it and/or opinions on sunsetting it.

2 Likes

I don’t recall ever using the ‘high’ quality type, but I do use the ‘low’ quality type. When I come across releases with no data attached, and I can’t find any corroborating evidence on Discogs or Google (eBay, Allmusic, RYM, etc.), I read its history and decide to either mark the quality as ‘low’ or propose its removal. I think the ‘low quality’ tag is useful to me in this case. Isn’t there a similar ‘needs improvement’ tag on Discogs?

Alternatively, we could have some automatically applied tag or could report on low quality releases by checking the count of present optional attributes (cover art, discid, external urls, release date, country, label, etc.).
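A rough sketch of what such a report could look like, assuming each release is available as a plain dict. The field names, the `min_present` threshold, and both function names are made up for illustration and are not taken from the MusicBrainz schema or API:

```python
# Hypothetical optional attributes, named after the examples above.
OPTIONAL_FIELDS = (
    "cover_art", "discid", "external_urls",
    "release_date", "country", "label",
)


def present_count(release: dict) -> int:
    """Count how many optional attributes are filled in (non-empty)."""
    return sum(1 for field in OPTIONAL_FIELDS if release.get(field))


def sparse_release_report(releases: list[dict], min_present: int = 1) -> list[str]:
    """Return titles of releases with fewer than `min_present` optional attributes."""
    return [r["title"] for r in releases if present_count(r) < min_present]


releases = [
    {"title": "Tracklist only"},
    {"title": "Well documented", "cover_art": True, "discid": "abc", "country": "GB"},
]
print(sparse_release_report(releases))
```

A report like this only flags releases for a human to look at; it cannot tell whether a sparse release is actually bogus.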

If the quality type were removed, I would likely be more prone to proposing removal edits for releases which seem suspect and don’t have corroborating evidence on the internet. If the community started voting against these edits, I may find myself annoyed I don’t have a way to filter/categorize “junk” releases - those with suspicious history, no info besides a tracklist, and no other evidence of existence.

And if you’re concerned reading this, I take destructive removals seriously, and would not propose to remove anything without thorough investigation. I always prefer to keep data where value is added.

9 Likes

I use the Low quality tag like @agatzk does: to mark something that is clearly lacking data and/or quality. I’d have read the edit page and attempted to make sense of the release, often finding bad track times or obvious errors. Something that just looks out of place and can’t be linked up with any real release.

Something that is just a track list is fine. Maybe it is out there somewhere, but sometimes errors are clear due to only a couple of actual edits to the release.

I want to say “Don’t trust this Release”. Or “Garbage Quality”

I agree with the comment that “High” is pointless and could be dropped. As someone who loves adding every scrap of data from a CD, and then all the scans, I find some uses of “High” bizarre, with way too much missing. Yet I would never mark my own releases as “High” even though there is usually not much more data one can add. There are too many different opinions as to what is a “Complete” release.

Having a “Low” or “Junk” tag is very useful. I too don’t like deleting a bad release, or merging it. In some cases a bad release is so bad it needs removing, but having a way of flagging this is better than a delete, as often no one will actually see that the delete is happening.

I have never seen it used as a bullying tag. If that happens then report the editor as I can’t see there being many editors around like this. If they are bullying with the Data Quality, then no doubt there are other issues to look into.

6 Likes

Agreed very much with this.

More generally, agreed as well with “Low” being a lot more useful than “High”, although High can be useful as something to aim for when cleaning and improving data.

7 Likes

I use data quality occasionally.

For releases where I’m the artist, I use high because I’m reasonably confident that I captured all the relevant data and I know for sure which parts are artist intent. I don’t really care about this though, and I’m not sure it’s really useful to set these releases to high.

For one release that was time-consuming to enter properly, I entered a pretty bad tracklist with low data quality at first, then fixed it up in groups of a dozen or so tracks at a time. Once I finished fixing it up, I made the data quality normal. Getting all 555 tracks up to normal data quality in one go would have been too difficult, I think.

I can’t remember if I’ve ever done this, but I’ve thought about using low data quality when I get a used CD that I’m pretty sure is missing some of the packaging. E.g., enter the discid and tracklist from the CD label, but set data quality to low because I’m missing the packaging with perhaps a better tracklist.

2 Likes

MusicBrainz Sound Team - MusicBrainz might actually be a better example of when to use high, because it had multiple MBz editors involved rather than just one. Still a pretty small and probably not useful niche though.

I’d treat that as “normal”. Many editors will just add a CD and tracklist. As long as there are catalogue numbers, barcodes and data that actually makes it a real release then this is useful data. Add a discID and it is decent data.

If you literally just have a CD in your hand and added the best data possible, and left a note to say so, then this is fine. Someone else can add some scans another time.

Look at the average Digital Release imported from Spotify et al. Not much more than a track list and a catalogue number. Often with the wrong data. What you have done with the above CD is higher quality than those.

1 Like

I don’t care much about “high quality” marks, but I definitely pay attention when a release is marked as “low quality”. It is useful as a warning that a release may not actually exist or have some serious issues. And I did lower data quality a few times for that reason.

So if you ask me: I’m neutral about removing “high” quality, but I’m against removing the “low” quality. As @IvanDobsky said, if this system is abused then we need to raise this to the moderators. Other MusicBrainz features can also be used in an abusive way, but it’s not a reason to remove them.

1 Like

I usually set releases to high quality after adding all* available metadata as release, recording and work relationships. E.g., from a physical release I have on hand or for a Korean digital release (those streaming services often have very complete credits, similar to a CD booklet).

Generally a good way to mark releases I worked on as “completed” from my point of view.

*There are some exceptions, since a few relationships are really difficult to research, confirm, or deduplicate, but everything directly related to the music is usually there.

2 Likes

I don’t think we should get rid of them, though I have always found the quality markers eye-bleedingly strong. They could be more subtle imo.

I have not used the ‘high quality’ marker much, but not because I don’t find it useful - just that when it comes to adding full works and relationships for a release my spirit is willing but my free time is lacking.

2 Likes

I use both high and low quality a fair bit, similar to others in this thread; high if all scans and info from said scans is present (or the artist provided good digital credits and those are all added) and low if a release is known to exist in some form but there’s very little detail about it (usually for unknown or incomplete tracklists and similar)

I would be against removing both high and low quality personally, as I think they’re both quite useful

1 Like

I rarely use it. Maybe calling it ‘completeness’ would be more useful. After all, that’s the only thing the system measures and bad data should be removed anyway.

The term “data quality” might confuse casual users too.

2 Likes

And it could then be some automatic indicator.

2 Likes

“Completeness” is hard to judge. Not all CD booklets have the individual musicians to credit for the recordings. And an online Digital Release rarely even has the writers for the works.

I add details like CD manufacturer, distributors, rights societies - information most people don’t care about.

We all have a different level of “complete” when we add a Release.

2 Likes

If it was automatic, it would be objective:

  • Has release events
  • Has Release-label relationships
  • Has recording-artist relationships
  • Has recording relationships
  • Has works with work-artist relationships
  • Has cover art

Each would add points, and there would be 3 levels of completeness, with none being equivalent to 100% complete, of course:

  • Quite incomplete
  • Fairly complete
  • Highly complete
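As a minimal sketch of that idea, assuming one point per criterion and arbitrary cut-offs between the three levels. The class, the field names, and the thresholds are all hypothetical, not an existing MusicBrainz feature:

```python
from dataclasses import dataclass, astuple


@dataclass
class ReleaseFacts:
    """One boolean per criterion from the list above (hypothetical names)."""
    has_release_events: bool = False
    has_release_label_rels: bool = False
    has_recording_artist_rels: bool = False
    has_recording_rels: bool = False
    has_work_artist_rels: bool = False
    has_cover_art: bool = False


def completeness_points(facts: ReleaseFacts) -> int:
    """Assumed scoring: each satisfied criterion is worth one point."""
    return sum(astuple(facts))


def completeness_level(facts: ReleaseFacts) -> str:
    """Map points to one of three levels; the cut-offs are assumptions."""
    points = completeness_points(facts)
    if points <= 2:
        return "Quite incomplete"
    if points <= 4:
        return "Fairly complete"
    return "Highly complete"


bare = ReleaseFacts(has_release_events=True)
print(completeness_points(bare), completeness_level(bare))
```

Weighting the criteria differently, or moving the cut-offs, would only change the two small functions.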

I often suggest handling such minor features automatically, because I’d like editors, myself included, not to waste time on them and to reserve it for real database improvement edits.

2 Likes

Personally I like quality more than completeness for this. Marking something as low quality can be a good indication that the data might not be correct even if it is complete. E.g., changing a track artist from [unknown] to the right artist doesn’t increase completeness but it does increase quality.

There is no automatic way to judge the “completeness” of an MB Recording’s or MB Release’s data with these criteria. Counterpoints to each point:

  • A GB release event might be missing the IE release event.
  • A release could have one copyright statement but omit the rest of the copyright statements.
  • A recording could be linked to a guitarist credit because the guitarist announced they participated, but the MB Recording could be missing the other instrument performers.
  • Same as above.
  • A work could be incorrectly using the “writer” relationship instead of separate “composer” and “lyricist” relationships, and vice-versa.
  • The cover art could be fake.

3 Likes