New metrics to accompany the new editing system

dukeja · July 31, 2016, 12:31am

While developing VideoBrainz concepts I came up with an idea that I think will have broader appeal than just VideoBrainz, so I’m sharing it here. Perhaps this, or something like it, has been proposed before, but I haven’t been able to find it.

I propose that we add two metrics to all entities. These would be best done in combination with NES:

Completeness. This is a measure of how complete the data on the entity is. Not every attribute of an entity should necessarily contribute to the completeness metric. Think of it this way: if the absence of a piece of data means the entity is considered incomplete, then that data should count. Attributes also don’t all need to be weighted the same. This metric would be fairly objective, and it could be very useful for focusing on what needs the most work. The idea is that the lower the completeness score, the more folks should try to work on the entity.
Quality. This is a subjective score most of the time. How it would be calculated is still fuzzy in my mind, but I would tie it to reviews. The basic idea is that reviewers would provide a quality rating, but the final quality metric wouldn’t simply be an average of those ratings. I think reviewers should also carry a weight based on how respected they are in the community, and the number of times the data for an entity has been reviewed should be factored in. So, for example, an entity with an average rating of 8 (out of 10) from a single review should have a lower quality metric than an entity with an average rating of 8 (out of 10) from hundreds of reviews.
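
To make the two metrics a bit more concrete, here is a rough Python sketch. Nothing in it is an existing MusicBrainz or NES API: the attribute names, the weights, the `Review` type and the prior values are all made up for illustration. Completeness is taken as the weighted share of counted attributes that are filled in. Quality is a reputation-weighted mean that is pulled toward a neutral prior, which is just one possible way to make a single review of 8 count for less than hundreds of reviews averaging 8.

```python
from dataclasses import dataclass

# Hypothetical attribute weights; the proposal does not fix which
# attributes count or how much each one is worth.
WEIGHTS = {"title": 3.0, "release_date": 2.0, "barcode": 1.0}


def completeness(entity: dict, weights: dict = WEIGHTS) -> float:
    """Weighted share of the counted attributes that are filled in (0.0 to 1.0)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = sum(
        weight
        for name, weight in weights.items()
        if entity.get(name) not in (None, "")
    )
    return present / total


@dataclass
class Review:
    rating: float           # reviewer's score for the data, 0 to 10
    reviewer_weight: float  # hypothetical measure of reviewer standing


def quality(reviews, prior_rating: float = 5.0, prior_weight: float = 5.0) -> float:
    """Reputation-weighted mean rating, shrunk toward a neutral prior.

    The prior acts like a few imaginary neutral reviews, so an entity
    needs many (or highly weighted) reviews before its score approaches
    its raw average. Both prior values are assumptions.
    """
    weight_sum = sum(r.reviewer_weight for r in reviews)
    weighted_ratings = sum(r.rating * r.reviewer_weight for r in reviews)
    return (prior_rating * prior_weight + weighted_ratings) / (prior_weight + weight_sum)
```

With these made-up numbers, an entity with one review of 8 scores well below an entity with a hundred reviews of 8, and a review from a more heavily weighted reviewer moves the score further than one from a newcomer.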

InvisibleMan78 · July 31, 2016, 8:59am

+1 for Completeness (including a report to find entities within a specific score range)
-1 for Quality, because you can’t measure quality. Everyone’s opinion about quality is different, especially for music or video and reviews about it

dukeja · July 31, 2016, 11:02am

Just to be clear: these are measures of the data, not of the item the data is about. Or perhaps, instead of quality, the metric could simply be the number of real reviews the data has received.

chirlu · July 31, 2016, 12:58pm

BTW, MusicBrainz has high/normal/low quality markings, they just aren’t used by anyone (more or less).

dukeja · July 31, 2016, 7:48pm

One difference between what I’m proposing and the existing release quality marking is that the Quality metric I’m proposing is an aggregation of data derived during the editing process, while the existing marking is a single attribute reflecting the opinion of the few individuals who were involved in setting it and approving it. That said, I’m not really a fan of the idea in its current form.

Perhaps there is another way to get at the problem. What if we had a process similar to, but separate from, the editing process, through which data could be verified? It could be done as part of the review step of the editing process, or it could be done separately. Essentially, we would allow people to annotate an entity with validation records: annotations indicating that they have examined certain attributes of the entity and verified that the data is correct. Numerous validation records would indicate that an entity is of high quality, at least in the opinion of those who have taken the time to look into it.
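
As a rough illustration of what a validation record might look like, here is a small Python sketch. The `ValidationRecord` type, its fields and the helper function are hypothetical and not part of any existing schema. It counts how many distinct editors have verified each attribute of an entity, so repeated records from the same person don’t inflate the number.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationRecord:
    """Hypothetical record: one editor vouches for some attributes of one entity."""
    entity_id: str
    editor: str
    attributes: frozenset  # names of the attributes the editor checked


def verifiers_per_attribute(records, entity_id: str) -> dict:
    """Count the distinct editors who have verified each attribute of an entity."""
    editors_by_attribute = defaultdict(set)
    for record in records:
        if record.entity_id != entity_id:
            continue
        for attribute in record.attributes:
            # A set ignores repeat validations by the same editor.
            editors_by_attribute[attribute].add(record.editor)
    return {attr: len(editors) for attr, editors in editors_by_attribute.items()}
```

An attribute that nobody has checked simply doesn’t appear in the result, which would make it easy to report on what still needs a second pair of eyes.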

yvanzo · July 31, 2016, 8:50pm

This is certainly a better name. While AcoustID relies on taggers for recording fingerprints, I don’t think that MusicBrainz can rely on complete automation for release quality. To me, your proposed metrics are complementary rather than opposed to the existing data quality field.