Selection of preferred disc id among discs of same content

outsidecontext · June 7, 2021, 8:09am

Maybe we need to get a bit more concrete. Do you have something specific in mind? The example you started the discussion with is, I think, a pretty clear case: https://musicbrainz.org/release/c9c23b1d-6a3c-45bd-9798-4c2311ddab72 and Release “Journey to the Amazon” by Sharon Isbin - MusicBrainz are really distinct.

In general I think MB’s differentiation of different releases gives a nice balance. The majority of different releases I think is fairly distinguishable, especially for new releases. Things like different formats (Vinyl, CD, digital download, …) and differences in track listing (e.g. with and without bonus tracks) are rather obvious I think.

It can get a bit more tricky with what is overall the same release, packaged up for different markets (such as European vs. Japanese vs. US). But often there are differences in the packaging, especially the back cover, with different barcode, catalogue number and company logos (due to different labels and/or distributors). Here it depends: for some it’s important when tagging their files, for others not. Both work: if you don’t care, you can pick any of those otherwise similar releases.

Re-releases are also often not that easy, as most releases give no clear indication of the year in which they were released. But usually re-releases come with different catalogue numbers and barcodes.

And then there are of course some of the really tricky ones. Things like releases with identical packages, but for some reason significantly different running times; otherwise identical track lists and barcodes but a different logo on the back cover or even more subtle packaging differences. But in my experience those aren’t the rule.

What doesn’t warrant a separate release on MB are slight production related differences, e.g. different CD matrix code or minor variations in CD TOC. Some people would like to be able to record those, but the general consensus seems to be that it does not make sense to duplicate all the other identical data just for this and that information like different matrix codes needs a separate mechanism.

brainchild0 · June 7, 2021, 11:13am

Let’s use a very simple example. Suppose I am an American consumer, and purchase a CD of an album by an American artist. Because the artist is American, his name is represented in Latin script, and all the lyrics, song titles, and album titles are written in American English.

Now suppose I query MusicBrainz for the disc TOC, and notice two hits for releases.

As expected, both show all the information in English, with American spelling.

In fact, the two releases look identical with respect to the important textual data.

Now I have a dilemma, whether to choose one at random, or to investigate the difference.

Suppose the difference is that one is a Swedish release and the other is an Australian one. If I choose one at random, then I incorrectly tag the rip of my American-released CD as being from one of the other countries, without being aware of having done so. If I investigate the difference, then I will discover that neither data set matches my media, because mine is neither Swedish nor Australian, but rather American. Then, I have another dilemma: whether to choose one of the two available, perhaps the one from Australia, because it, like the US, is English speaking, or rather to create a new release in the system. If I create a new release, then I have even more choices to make, including whether to commit the effort to finding and entering catalog and barcode information, or to leave the fields blank. If I leave them blank, then I save time, and might hope someone else would enter them later. The plain expectation, however, is that once I have entered the release, someone later filling in the missing information becomes less likely, even if that same person would have entered it had he found the release missing and decided to take responsibility for creating a complete representation of the release in the system.

Now, consider an alternative.

An Australian happens to be the first to enter the album data into the system, and everyone around the world instantly benefits.

jesus2099 · June 7, 2021, 11:21am

You mean, using Picard?
You should see the release country, the catalogue number, the barcode.
So you can choose your specific release.
And, if you don’t see it, there is the Add to MB button, or something like that.

aerozol · June 8, 2021, 5:24am

I don’t understand how you can be concerned with being matched to the wrong release (the 50CD release) and want there not to be separate releases?

brainchild0 · June 9, 2021, 12:49am

@aerozol: I’m just exploring the issues, not adopting a position.

brainchild0 · June 9, 2021, 1:26am

Yes, I certainly accept the observation that the interface of Picard better supports the data model of MusicBrainz compared to other applications. This observation surely features into the broader discussion.

Yet, I continue to have doubts about whether distinctions according to items such as catalog number and release country are specifically what concerns many users as they manage their libraries.

As I gain a deeper understanding of entity types, I wonder whether the distinction that interests most users is the one represented by a release group.

In the documentation appears the following comment:

Both release groups and releases are “albums” in a general sense, but with an important difference: a release is something you can buy as media such as a CD or a vinyl record, while a release group embraces the overall concept of an album – it doesn’t matter how many CDs or editions/versions it had.

The explanation continues with the following essential representation of what most consumers consider, I think, in their relationship to musical publications:

When an artist says “We’ve released our new album”, they’re talking about a release group. When their publisher says “This new album gets released next week in Japan and next month in Europe”, they’re referring to the different releases that belong in the above mentioned release group.
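
The distinction can be sketched as a small data structure. This is only an illustration, assuming simplified, hypothetical field names and made-up placeholder values; it is not the actual MusicBrainz schema or API. It shows two lookup hits that differ only in release-level details collapsing into a single release group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    # Hypothetical, simplified fields; not the actual MusicBrainz schema.
    release_id: str
    release_group_id: str
    title: str
    country: str
    catalog_number: str = ""
    barcode: str = ""


def group_by_release_group(releases):
    """Map each release group id to the releases that belong to it."""
    groups = defaultdict(list)
    for release in releases:
        groups[release.release_group_id].append(release)
    return dict(groups)


# Two lookup hits that differ only in release-level details (placeholder data).
hits = [
    Release("rel-se", "rg-1", "Example Album", "SE", "CAT-001"),
    Release("rel-au", "rg-1", "Example Album", "AU", "CAT-002"),
]

groups = group_by_release_group(hits)
print(len(groups))                          # number of release groups among the hits
print([r.country for r in groups["rg-1"]])  # countries of the releases in that group
```

A user who only cares about "the album" would stop at the group level; a user who cares about the edition would go on to compare country, catalog number and barcode.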

Perhaps one way to deal with this difference is by adding support in Picard for switching among releases within a group, for files already tagged, without requiring the user again to resolve which release group is associated with the medium, based on the CD TOC.

In the context of the earlier example, someone might choose the foreign release at a time when it is the best available match, though not the correct one. Later, that person could reorganize the tags using a feature that facilitates reviewing the other releases in the previously chosen group, and possibly switch to one of them, which may have become available only in the interim.

Another case is a user entering a catalog number, which is correct for the media owned, a re-released publication, but having a preference for the cover art from the original.

It seems, however, releases within a group share much of the same data, in most cases, such as title, performer, track lengths, and so on, such that a release tends to capture a much smaller set of data fields than represented in the model.

outsidecontext · June 9, 2021, 5:15am

That already exists: In the context menu of a loaded release you can switch between other releases of the same release group (“similar releases” menu)

That probably happens a lot, and many people indeed don’t care. If you have, let’s say, distinct British and French releases, but the only difference is some numbers printed on the back cover, while track list, disc IDs and recordings are exactly the same, it doesn’t matter much for most. And that’s fine, no need to select the exact edition if you don’t need these details.

Still as a database of released music it makes sense to record this data.

The answer is that often they do, but often they don’t. The title is usually the same, but there can be exceptions. Same for artist credits: I think we had a discussion about an ABBA album recently, where artist credits on the front cover differed substantially between releases.

Durations and track lists often vary. E.g. vinyl releases sometimes have shorter durations, tracks are distributed over a different number of mediums depending on format, limited editions come with bonus tracks (sometimes different ones based on country), etc.

jesus2099 · June 9, 2021, 5:26am

Maybe it’s the case for people who rip and tag, but.

I manage my collection with MusicBrainz. I don’t rip CD and tag files.
What matters to me is what edition do I have: what is my catalogue number and specific tracklist of recordings in my edition.

The recordings I have are marked. So when I look at another edition or at a compilation, the system shows me what recordings I already have or don’t have yet.
It lets me know if it’s worth seeking to acquire this new release (many interesting recordings missing in my collection) or not.
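
As a rough sketch of that check (the recording identifiers and the function name are hypothetical placeholders, not how MusicBrainz or any userscript actually implements it), comparing a candidate release against the recordings already owned comes down to a membership test:

```python
def missing_recordings(owned, candidate_tracklist):
    """Return the recordings on a candidate release that are not owned yet, in track order."""
    owned = set(owned)
    return [rec for rec in candidate_tracklist if rec not in owned]


# Placeholder identifiers standing in for recording IDs.
owned = {"rec-a", "rec-b", "rec-c"}
other_edition = ["rec-a", "rec-b", "rec-c", "rec-bonus"]

print(missing_recordings(owned, other_edition))  # ['rec-bonus']
```

This only works at the edition level: the track list of recordings belongs to a specific release, so a release group alone would not be enough to answer the question.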

Example of another edition of an album I own (another medium type and catalogue number), in which I already have all recordings, except the last one (a bonus track):

So, the distinction that interests [me] is [not] the one represented by a release group.

brainchild0 · June 15, 2021, 1:39am

Thank you for pointing out the “similar releases” feature, which I had not noticed. I think this feature does help users manage many of the concerns expressed in my earlier comments. Nevertheless, taking the subject a step further, and considering the underlying issues that inspired my comments, I feel that what may be most helpful is to view the association of release metadata with files extracted from a disc as a process of three steps.

The steps are the following:

1. Identify a Disc ID based on a sequence of track lengths. This step is generally automatic.
2. Identify a release group matching a Disc ID. This step similarly generally requires minimal user intervention.
3. Identify a release within a release group.

Under this model, step (3) might be repeated multiple times, to account for changes in preference, or of greater availability of metadata in the system at some later time. In contrast, steps (1) and (2) are unlikely to be repeated for the same audio file.
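
To make the proposal concrete, here is a minimal sketch of the three steps as a tag record. Everything in it is an assumption for illustration: the `FileTag` type, its field names, the `choose_release` helper and the placeholder identifiers are hypothetical, and this is not how Picard stores its tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileTag:
    # Hypothetical tag record; the field names are illustrative only.
    disc_id: str                      # result of step 1
    release_group_id: str             # result of step 2
    release_id: Optional[str] = None  # result of step 3, optional and repeatable


def choose_release(tag, release_id, releases_in_group):
    """Step 3: pick (or re-pick) a release without redoing steps 1 and 2."""
    if release_id not in releases_in_group:
        raise ValueError("release is not in the tagged release group")
    tag.release_id = release_id
    return tag


# Steps 1 and 2 are done once; no release has been chosen yet.
tag = FileTag(disc_id="disc-123", release_group_id="rg-1")

# At first only foreign releases exist; the matching one is added later.
choose_release(tag, "rel-au", {"rel-au", "rel-se"})
choose_release(tag, "rel-us", {"rel-au", "rel-se", "rel-us"})

print(tag.release_group_id, tag.release_id)
```

The release group stays fixed across both calls, while the release can be left empty or changed as often as preferences or the available data change.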

Under the current model, step (2) is implicitly captured in step (3). While this choice may seem to make matters more straightforward, I am inclined to consider whether a more explicit separation of choices would ultimately be superior, especially given differences in concerns for various users, as well as the evolving degree of completeness of the data repository.

A design in which a file is tagged with a release group before it is tagged with a release may be beneficial in several important ways. An obvious benefit is the possibility of doing the former without the latter, if that is preferred. A further benefit, though not the only other one, is an explicit marking of the release group to which a file has been committed, independent of how the association of releases to groups may later evolve in the data repository.

brainchild0 · June 15, 2021, 1:42am

Yes, I respect the difference. My idea is for finding a way of fully accommodating both cases with minimal obstacles encountered by any specific user due to preferences of another.