Metadata is the biggest little problem plaguing the music industry

Well, there are a few points. I would like to start at mastering. MB tends to disregard mastering at the recording level and apply it to the release. This is very problematic. Mastering has many levels: you can master a recording, then you can master a release (like iTunes, CD, etc.), and there is also a remaster, which could involve the recording level, the release level, or both.

Second, I would like to touch on the platform portion. There are items like iTunes. You have iTunes original, iTunes Plus and MFiT (Mastered for iTunes). The list of contributors will likely be different if a recording is released on more than one of the standards. Regarding digital platforms, there is a whole can of worms here. If one wants to consider a Spotify stream different from a Google Music stream, for example, then we must also consider all digital download containers and methods of encoding (bitrate, codec, sample rate, etc.) to be different as well.

MB is missing a level of distinction between the recording and release. That is where the ISRC really fits in, at least in my opinion. Then you can have numerous “releases” of streaming when in reality they are all the same release, if that makes sense. Kind of a distinction between the release and the delivery of the release.

Please note that my idea has nothing to do with MusicBrainz. Metadata isn’t going to solve the original problem, it’s a red herring. The only way to solve this mess is through clear contracts.

Each of these mastering jobs is done (or should be done) on a contract. The only legal obligation is between the mastering engineer or company and the copyright holder of the mastered release or recording. The only payments should be between those two parties. Streaming platforms and stores are not in the picture here (unless they are the ones contracting out the mastering).

Streaming platforms shouldn’t just be ripping CDs and putting them online for streaming, and I really doubt any of them do that. They are in contact with a copyright holder to add recordings or releases to their platform. They agree on a contract, and pay the copyright holder and only the copyright holder. The copyright holder should take care of its own contracts further down the chain.

And this is where it goes wrong in the current system. The music platform is supposed to pay not only the copyright holder they have a contract with, but also a number of contributors like songwriters. But it doesn’t really know these songwriters. There are no contracts between the streaming platform and the songwriters. There is supposed to be metadata, but we all know that is not reliable and never will be. So many of these contributors just don’t get paid and all the companies between the consumer and the songwriter, engineer, producer etc. become extra rich.

It does make sense, but there is a trade-off I think. Adding this extra level also adds complexity for users, and MusicBrainz’ database schema already is very complex. Would the benefits of this extra information outweigh the added difficulty for users? A database with millions of different users and many different use cases is always going to be a compromise.

I don’t know if all copyright holders are trustworthy or reliable…
I think artists feel more at ease with current royalty collectors/distributors that are JASRAC, SACEM, GEMA, etc. I think it’s better with third parties, with no “direct interest” in this or that song.

Yes, I did go in a different direction than you intended I can see now.

To the last portion on the trade-off, all I can say is the article posted did say the following:
“ATTEMPTS TO CREATE A GLOBAL CENTRALIZED DATABASE FOR SONG METADATA HAVE ALWAYS ENDED IN FAILURE”
My thought would be how to change that, but there is the problem. It takes effort and detailed work, and no one has stepped up to the plate as of yet.

To be fair, that is absolutely intentional: our guidelines call for merging remasters. It’s not bad editing.

As witnessed often, most recently: New editors: a guided, staged, self-training approach - LONG - #15 by rvb

And nobody will, so why keep looking for a solution there?

Hi all!

A lot of good points have been raised here and the intractability of this problem is quite clear. I’ve spent years thinking about how to make an impact or otherwise improve this mess. It really feels that the incumbent industry doesn’t want any change; they don’t care if artists get paid or not.

I care.

Could something like this allow us to build a parallel-universe music industry with proper values and proper rewards for people who do real work?

I’m not convinced it could work, but I am interested in starting a conversation.

Thoughts?

I think we may be still on different angles here, so please understand I speak with the primary effort of MB. This is a long post, so if you have no interest, don’t read it. I make an effort to explain my thoughts vs just listing them with no explanation.

First off, I think the best MB can ever do is to most accurately represent a release. To say no one will ever get a good database so why try is not how things work, or you would not be posting to a forum on the internet, as no one would have bothered to invent it. But as you have mentioned, this comes at a price, and one must consider the trade-off as diminishing returns come into play. The music industry has its own issues dealing with royalties and how they are paid, but I will add that I believe other high-profile systems (like MB) do not help. Using iTunes as an example, and I do this because of the digital sales that I am aware of, iTunes is at the top, being the general winner in the areas of volume, consistency of metadata, and correctness of metadata. Please note that I use correctness cautiously.

The ISRC, in my opinion, is one of the most important pieces of metadata for digital releases. Let me explain. With vinyl, cassette, CD, etc., you have things like a barcode which is visible on the release. That is a MAJOR identifier of a release. Please note that I am aware that nothing is ever perfect and even the same barcode can and does appear on different releases. Some CDs do embed ISRCs, but that metadata is not as easy to extract and is not always even there. Additionally, you have a physical medium. There is printing and, well, the look of the medium as a whole. If that is all captured, you can with good confidence identify a release, and thus also follow the chain to who did what and/or who gets paid for what.

Now we live in different times. Rather than royalties coming from radio and media sales, streaming is a major portion of that revenue, and physical media sales continue to diminish. This complicates things because if a song is listened to, how do we know how to properly identify it? As I see it, at a radio station, this is easy. The station is provided the music directly with the identifiers, so when they play a recording they should know exactly what they are playing. When it comes to streaming sites, I sort of agree with your angle and your points. As it relates to MB, that is where I disagree. You are correct, I believe, that with a source like Spotify it is easy to trace the play back to the royalty. Spotify was given the recording, and the organization that provided them that recording and metadata should know what they provided. The same applies to Apple Music on streaming. Where this gets muddy for MB is that we do not know what we do or do not know. What? I mean that Spotify and Apple disclose to us a set of metadata that can be extracted from the sites using things like JSON. We know that, but what we do not know is whether there is more data, not disclosed to us users, that can identify those recordings. Now realistically, we know that not all is disclosed to us.

Please stay with me, I am trying to explain in enough detail to create the conversation, as I agree it can be helpful… Now we get to digital download sources, like iTunes, Amazon, etc. We have the same issue here. The releases all contain metadata put in place by the vendor/distributor. That data has the same issue as with streaming: what is it and what is it not? You may have one barcode on the iTunes file and a different barcode on the Amazon file, and in reality they could be the same product, meaning that it is all from the same source. We simply do not know. Although we cannot be sure there, we can use the metadata and attributes like container, encoder, encoder settings, etc. to identify it. That is the same logic as a physical release… we use all we can to identify something. On a CD, the color of the case matters in MB, but an MP3 is considered no different from an M4A, and that makes no sense to me at all given the scope of physical releases.

It is my opinion that in order to become a leader in music metadata, you need to be more accurate in representing a release. I speak not of physical media, as that is fairly well done. But over the last 20 years or so, it no longer applies to current releases as it once did, and no one is adapting. What I do is represent a release to start with. So I have a vendor (what MB sometimes does and sometimes does not consider to be a release label), ISRC, store ID(s), etc. If any of those attributes change, I do not consider it a duplicate recording. So if I have release ABC from iTunes, Amazon, and other companies, I absolutely consider those different releases and different recordings. This is where that layer between the release and recording is missing in MB. As a user, those are certainly different. But as it relates to the source / master, they may be all the same. Oftentimes, we really do not know. That also opens up again the point I had on mastering and where it does and does not apply.
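
To make that rule concrete, here is a minimal Python sketch of "if any identifying attribute changes, it is not a duplicate". The class name, the field names and the sample values are all made up for illustration; they are not MB fields or real store IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalCopy:
    """One downloaded file as the vendor delivered it (hypothetical fields)."""
    vendor: str        # who sold/delivered the file
    store_id: str      # vendor-specific product ID (placeholder values below)
    isrc: str          # ISRC found in the file's metadata (placeholder values below)
    container: str     # e.g. "m4a", "mp3", "flac"
    codec: str         # e.g. "aac", "mp3"
    bitrate_kbps: int


def is_duplicate(a: DigitalCopy, b: DigitalCopy) -> bool:
    """Duplicates only if every identifying attribute matches."""
    return a == b  # dataclass equality compares all fields


# Placeholder data, not taken from any real release.
from_itunes = DigitalCopy("iTunes", "store-id-1", "ISRC-PLACEHOLDER", "m4a", "aac", 256)
from_amazon = DigitalCopy("Amazon", "store-id-2", "ISRC-PLACEHOLDER", "mp3", "mp3", 256)
same_again = DigitalCopy("iTunes", "store-id-1", "ISRC-PLACEHOLDER", "m4a", "aac", 256)

print(is_duplicate(from_itunes, from_amazon))  # different vendor/container
print(is_duplicate(from_itunes, same_again))   # identical attributes
```

The point of comparing every field is that the same ISRC alone does not make two files the same item in my collection; the vendor and encoding are part of the identity too.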

How can I relate this vs just posting a ton of words? Let’s say that I am a “personal Spotify” and I broadcast from my music collection. Now, I need to pay royalties for each recording I stream. I know who to add a play to because I can identify what I played. I might have song 123 and play it, but I also know what release it came from, what vendor/distributor it came from, etc. From there, it is up to that source to do the same and know where their product comes from. On my end, the best I can do is accurately represent what I have. Does this mean that I end up with a lot of duplicate recordings? Sure. As an end user with 10+ terabytes of music, am I OK with those duplicates? Absolutely. I am happy to go into why, but that is not so relevant here, so I will not go into that. So for me, to use MB to identify and index my music is a downgrade. If I were to tag my files from MB, I would lose data and accuracy. This makes MB useless to me as a tagging source. I am able to use MB to get data on the release side, but other than that, I can only use it for things like lookups (what bands did this artist perform with, what releases include this recording, etc.). But for further detail, MB drops the ball, and my metadata as provided by the vendor is far more accurate for thoroughly identifying what the recording actually is.

So to my concern, which is MB: it cannot be targeted to me as a tagging source. It cannot be marketed to record labels to identify their royalties and all that mess. I cannot use it to identify a recording directly, Shazam-style. It cannot properly identify remasters, in the sense that one CD can have a piss-poor master and another a great master, but MB marks them as a duplicate recording (which they most certainly are not from the perspective of the listener), etc. What is MB trying to accomplish in the modern day? It is amazing from a historical perspective (cassettes, vinyl, CDs, etc.), that is for sure. So my angle on this is that MB does nothing to contribute to the modern era of music. This then also indirectly contributes to the problem of royalties. Although the music industry has its own issues with it, MB does not even accurately account for what they do have. I hope that explains my thoughts vs just tossing my opinion out there on this. It is one of the main reasons I do not spend my time on edits anymore, as I feel I just dry the top of the tire as it continues to roll through the water. A lot of work and nothing real gets accomplished. I do take the time to discuss here though, as I believe there is good to be done, and if I can help shape that for the better, then it is time well spent.

@rob - I did not see your post until now. I care in the sense that a few years back I looked for a quality metadata source, joined MB, and although I am not a top contributor, I have contributed, in my opinion, more than just a casual user would, and with reasonable detail.

I am happy to provide input, if desired, to the cause.

Thanks for this long and constructive post, it raises some interesting questions.
I’m not sure I understand what you are suggesting to improve the situation, though.

Can you elaborate on this missing layer? How would it relate to releases/tracks/recordings exactly? Would you attach ISRCs to it instead of recordings?

It seems to me that both of the questions you present are related. I will attempt to explain that layer:
We have a recording; there is an original master and a remaster. They are not the same, although generally speaking, MB considers them as such. Now, at the release level, those recordings can be reused. This differs from MB as it puts the mastering back on the recording.

Trying to keep this short… in today’s messed-up era, the ISRC plays a critical role as it relates to the digital aspect of songs. In my opinion, meaning one suggestion, two ISRCs should never be combined into the same recording. It is like when I started here and I was told no recording can have more than one ASIN, which is actually false, but this is the same premise as it relates to the origins and royalties.

The missing layer, to be more direct, is the mastering of the recording. See, the original recording is what it is. Now, you do in fact have a second layer of mastering … MFiT, Amazon, FLAC distributors, CDs, etc. That layer is missing. MFiT is a sore point with me, and I wish to avoid that debate here.

In short, MFiT only means that the master they gave to produce “another master” is fine-tuned with the intended second master as the focus. See, with a release, you will master an already mastered recording to the release. Please note that I use the term master/mastering in a certain way. On one side, you master when the different mics are combined into a recording; this is a mastering process. You then take those and master them to a release. In older days, that was all the same. In modern days, MFiT is proof it now differs. I can recall proof of CDs being sold as nothing but MP3s pressed onto CD; I did not say burned, but pressed to CD. We have the same today with “fake FLACs”.

Digital releases NEED specificity. I personally have MP3 releases and a “duplicate” M4A release. Are they the same? NO. They serve different purposes and are like having a release on CD and cassette. The same can be said on FLAC files of 16 bit and 24 bit, but again, not going there. I mean just to raise the point, not to debate it.

I feel like I am rambling … A recording that is listed as a recording on more than one release should sound identical on all releases it is used on, have the same meta attributes, etc. If it is unknown, it should be a unique recording, assuming we still have no intermediate layer.
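
As a rough illustration of the intermediate layer, here is a small Python sketch. It is only one possible reading of the idea: the entity names (`Recording`, `Master`, `Track`), their fields and the sample data are assumptions for the sake of the example, not the MB schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recording:
    """The original recording, before any final mastering."""
    title: str


@dataclass(frozen=True)
class Master:
    """Hypothetical middle layer: one mastering of a recording."""
    recording: Recording
    description: str             # e.g. "original master", "remaster"
    isrc: Optional[str] = None   # the ISRC would live here, if known


@dataclass(frozen=True)
class Track:
    """A track on some release, pointing at the master it carries."""
    release: str
    master: Master


def same_recording(a: Track, b: Track) -> bool:
    return a.master.recording == b.master.recording


def same_audio(a: Track, b: Track) -> bool:
    # Stricter test: the tracks must come from the very same master.
    return a.master == b.master


# Placeholder data for illustration only.
song = Recording("Example Song")
original = Master(song, "original master")
remaster = Master(song, "remaster")

first_cd = Track("First CD", original)
reissue_cd = Track("Reissue CD", remaster)
download = Track("Download", remaster)

print(same_recording(first_cd, reissue_cd))  # same recording underneath
print(same_audio(first_cd, reissue_cd))      # but different masters
print(same_audio(reissue_cd, download))      # reissue and download share a master
```

With a layer like this, a lookup can still answer "which releases include this recording" while a tagger can tell the remaster apart from the original.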

I wanted to address this separately. The ISRC is applied to the recording master, not the medium master. The same royalties can apply to 10 CDs all with different barcodes as an example. What is critical is to know what recording, or better what rendition of a recording, is used where.

So you would attach exactly one ISRC to one “master of a recording” entity, right?
And “original recording” would have metadata like today (but not ISRC), and be linked to one or more “master of a recording”?

Different compression algorithms or even audio format settings = different “master of a recording”?

Current definition of a recording is:

A recording is an entity in MusicBrainz which can be linked to tracks on releases. Each track must always be associated with a single recording, but a recording can be linked to any number of tracks.

A recording represents distinct audio that has been used to produce at least one released track through copying or mastering. A recording itself is never produced solely through copying or mastering.

Generally, the audio represented by a recording corresponds to the audio at a stage in the production process before any final mastering but after any editing or mixing.

What would you change in this definition, and how would you define the “master of a recording” entity?

Sorry for asking too many questions, but I’m trying to understand which changes would be required and how they would fit in the current MB database structure.

Let’s get some official ISRC documents in here.

4.1.3 No re-use
[…]
A new ISRC should be assigned whenever a recording has been re-issued in a revised or fully remastered form. Also see Sections 4.9.1 Re-mixes/ Edits / Session Takes and 4.9.10 Re-mastering

4.1.4 Format Independence
A single ISRC is used for each unchanged recording regardless of the format in which it is released.

4.9.10 Re-mastering
When a track is re-mastered for the purpose of reproduction on a new carrier without restoration of sound quality (also see Section 4.9.1 Re-mixes/ Edits / Takes), then no new ISRC is required. It is nevertheless the Registrant’s responsibility to decide where to draw the line between sound restoration (full re-mastering) and simple re-mastering.

FAQ:

7. Our company uses in-house code for identifying our sound and music video recordings. We then use this in the designation code of the ISRC. Sometimes an in-house code may apply to two versions of the same recording because we have remastered some of our backstock for re-issue. Can we use the same ISRC for the new remastered version?

No. Re-use of an ISRC that has already been allocated to another recording or to another version of a recording is not permitted. This is in order to guarantee the unique and unambiguous identification provided by an ISRC.

A new ISRC should be assigned whenever a recording has been re-issued in a revised or re-mastered form, even if both items have the same in-house code.
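
For anyone wondering where the "designation code" mentioned in that FAQ sits: an ISRC is twelve characters, made up of a two-letter country/prefix code, a three-character registrant code, a two-digit year of reference and a five-digit designation code. A minimal Python sketch that splits one apart follows; the function and class names are my own, and it only checks the shape of the code, not whether it was ever actually assigned.

```python
import re
from typing import NamedTuple


class ISRC(NamedTuple):
    country: str      # two letters
    registrant: str   # three letters or digits
    year: str         # two digits (year of reference)
    designation: str  # five digits, chosen by the registrant


_ISRC_RE = re.compile(r"([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})")


def parse_isrc(text: str) -> ISRC:
    """Split an ISRC into its parts; hyphens and case are ignored."""
    compact = text.strip().replace("-", "").upper()
    match = _ISRC_RE.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a well-formed ISRC: {text!r}")
    return ISRC(*match.groups())


parts = parse_isrc("GB-AFL-18-00302")
print(parts.registrant, parts.year, parts.designation)
```

Since only the last five digits are up to the registrant, an in-house code that covers two versions of a recording cannot simply be copied into it for both.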

More stuff:
ISRC bulletin archive

GRid (Release identification, BTW has anyone seen these in the wild?)

What a bummer:

The primary function of the GRid is to support machine-to-machine communication through system-to-system messaging. It is therefore intended to be largely invisible in use. However, there may be circumstances in which GRid is displayed to a human user, in which case some rules for presentation are in place.

That’s all nice and fine, but the specs say how it should be. Doesn’t change the fact that the exact same recording often gets assigned new ISRCs in reality. I have seen plenty of examples for this. And I bet there are enough cases where the same ISRC was used for different recordings. I don’t know any myself, but one of you can probably provide an example.

In the end MB needs to deal with the often messy reality.

I think we are doing fine with our definition. Just wanted to give all of this a bit more context. :slight_smile:

@Zas - give me a bit to put this together. This was brought up in the past, so I want to collect all the data on this before just posting info. The concept remained with the three tiers (work, recording, release) but changed where and what data is stored in which object, and the creation of a new object off of the recording with a one-to-many relationship to the recording.

@outsidecontext and @chaban-
Thanks for posting that, it is helpful for all to see the specification. Although there are times when a new ISRC is assigned to the same recording, I would like to see some data supporting the frequency of this. I believe (meaning that it is my educated guess) that this is more of an issue from the past. That said, I could be wrong on that; I can only speak from my own personal experience.
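
If someone has a dump of (recording, ISRC) pairs, the frequency question could be checked with something like the Python sketch below. The input shape, the function name and the sample IDs are assumptions for illustration; this is not an MB query or API.

```python
from collections import defaultdict


def isrc_anomalies(pairs):
    """Given (recording_id, isrc) pairs, return two dicts:
    recordings carrying more than one ISRC, and ISRCs used by
    more than one recording."""
    isrcs_by_recording = defaultdict(set)
    recordings_by_isrc = defaultdict(set)
    for recording_id, isrc in pairs:
        isrcs_by_recording[recording_id].add(isrc)
        recordings_by_isrc[isrc].add(recording_id)

    multi_isrc = {r: sorted(s) for r, s in isrcs_by_recording.items() if len(s) > 1}
    shared_isrc = {i: sorted(s) for i, s in recordings_by_isrc.items() if len(s) > 1}
    return multi_isrc, shared_isrc


# Placeholder IDs, not real data.
sample = [
    ("rec-A", "isrc-1"),
    ("rec-A", "isrc-2"),   # same recording, second ISRC
    ("rec-B", "isrc-3"),
    ("rec-C", "isrc-3"),   # same ISRC on a different recording
    ("rec-D", "isrc-4"),
]

multi, shared = isrc_anomalies(sample)
print(f"{len(multi)} recording(s) with several ISRCs: {multi}")
print(f"{len(shared)} ISRC(s) on several recordings: {shared}")
```

Dividing those counts by the total number of recordings with at least one ISRC would give the kind of frequency I am asking about.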

I believe the issue will be a wash with the barcode issue. A barcode is sometimes used on two different releases; a new ISRC is sometimes assigned to the same recording. The barcode is a bit more destructive, since having a duplicate recording does not really provide bad/conflicting data, whereas a barcode on two different releases can cause some brain malfunction while you try to figure it out.

Although I am 100% on the side of a reform here, I will also be the first to admit that nothing will ever be perfect. Even if MB could design a system that itself is perfect, junk in = junk out. So any errors from the music industry itself, or any other source for that matter, are simply out of the control of MB. Although I hate saying it from a data standpoint, it is just something we need to accept, and we should focus on matching the metadata, not trying to change it to what we think it should be. There is a different place for stuff like that, as it is also important data.

Speaking using an example of iTunes music files via download … there is a difference in data importance compared to MB. This is also the case, but a bit different, with other stores like Amazon, but I have a large base of samples for iTunes, so I speak to that with facts in hand to support it. One great identifier is the “vendor” atom. Currently this is formatted as :isrc:. The ISRC is the ISRC, simple. The vendor is the label (and sometimes not a label) that is tied to the ISRC. There is also a “copyright” atom which supplies the c and/or p holders, depending. That is one big conflict I see when trying to locate the proper label to use in the MB label field, as it is not really disclosed to us in the same way as on physical media, where you match logo/picture to name and you have it.

As I mentioned before, I believe our job should be first to properly identify music as it is and can be identified, not what we think should be there. Second is to make some sense of the crap and make it usable.

Release is an aggregate and stands beside the other two. On the atomic level MB goes like this: work → recording → track (which not coincidentally mirrors FRBR’s/LRM’s work → expression → manifestation, a model that NGS was to a certain extent based on). But ok, I’m arguing semantics here.

Now that’s something I can’t agree with at all! While all those years I’ve been using MB for my personal collection, from some idealistic point of view I still genuinely believe in MB striving to be the perfect/ultimate music database (and eventually a cultural one, indexing other art forms too). Which means it shouldn’t repeat the industry’s cataloguing mistakes.

However, I do agree with you that, ideally, it should fully map the industry’s standards.

Multiple ISRCs jumbled into one MB recording is simply lost data. So, if our own definition of a recording doesn’t match the industry’s (for whatever reason: remasters, erroneous code assignment by different registrants, etc.), then we need to go down to a more granular level (recording → track) to reflect all the exceptions.

Now I don’t think I’m in a position to discuss the technical side since that’s not my cup of tea. I can only present the idea for a specific case:

  • ISRCs remain attached to MB recordings
  • Two separate masters of a recording are represented as different MB tracks
  • each ISRC can be optionally mapped to different sets of tracks

I have actually encountered a situation like this not long ago, and it’s already been discussed in another thread:

  • The exact same recording, Godmother by Holly Herndon and Jlin, was first released as a single in 2018 and assigned the code GBAFL1800302.
  • It was reissued as part of the 2019 album, spanning multiple digital, CD and vinyl issues, with a new ISRC, GBAFL1800329.

I don’t think there’s any need to question or ponder on this decision, only to reflect it properly. That is, the [MB recording 1] contains two ISRC codes, where [ISRC 1] maps to [MB track 1] and [ISRC 2] maps to [MB track 2], [MB track 3] and [MB track 4].
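
Just to show the shape of that mapping (and not as a technical proposal, since that’s not my area), here is a tiny Python sketch. The dictionary layout, the track labels and the helper name are invented for illustration; only the title and the two ISRCs come from the case above.

```python
# One MB recording keeps both ISRCs; each ISRC points at the tracks it was used on.
godmother = {
    "title": "Godmother",
    "isrc_to_tracks": {
        "GBAFL1800302": ["MB track 1"],                              # 2018 single
        "GBAFL1800329": ["MB track 2", "MB track 3", "MB track 4"],  # 2019 album issues
    },
}


def isrc_for_track(recording, track):
    """Return the ISRC mapped to a given track, or None if it has no mapping."""
    for isrc, tracks in recording["isrc_to_tracks"].items():
        if track in tracks:
            return isrc
    return None


print(isrc_for_track(godmother, "MB track 1"))
print(isrc_for_track(godmother, "MB track 3"))
```

The mapping stays optional: a track that appears under no ISRC simply returns nothing, which matches the idea that each ISRC can be optionally mapped to a set of tracks.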
