TAG format varies from time to time / how to set standard for tagging


#1

Hi everybody,

I last updated the metadata of my audio files some months ago, and Picard 1.3.2 wrote tags in the format “MUSICBRAINZ ALBUM ARTIST ID”. Then last week, after I bought some high-res albums, I used it again and Picard wrote tags in the format “MUSICBRAINZ_ALBUMARTISTID”.
I thought there had been an update somewhere because someone noticed that spaces in tags are not that good.
Just now, after I bought some songs and used Picard again, it wrote tags in the format “MUSICBRAINZ ALBUM ARTIST ID” again.

First I use Picard to get ISRC, ACOUSTID_ID and the minimum required MUSICBRAINZ_* tags, then I use foobar2000 to get ReplayGain, and finally I use Mp3tag to correct spelling, remove unwanted tags and so on. In Mp3tag I have a filter that I apply to files to do that. It not only identifies e.g. TRACK filled with 2/18 and rewrites it as TRACK 2 and TRACKTOTAL 18, but also removes unwanted tags like MUSICBRAINZ ALBUM RELEASE COUNTRY.

I now have two questions:

  1. Why the hell are there two formats in Picard for tagging, and how do I set it to the “no space format”? (MUSICBRAINZ_ALBUMARTISTID instead of MUSICBRAINZ ALBUM ARTIST ID)

  2. How can I set a standard on what tags I want to keep original, what tags I don’t want at all and what tags to overwrite?

Thanks!


#2

I forgot to mention that sometimes Picard writes 2/18 into TRACK and sometimes it just writes 2, so I have to check manually how many tracks the album has.
Where does this inconsistency come from?


#3

Given that the way Picard stores the tags has not really changed since its beginnings, I suspect something else is doing things wrong here. One possibility is that you are dealing with different tag formats (ID3, Vorbis, FLAC, …). AFAIK foobar2000 does not know about MB-specific tags and displays them rather arbitrarily.

Different formats could also explain the different track number formats. In some formats you have separate fields for track number and total number of tracks. In other formats those two are stored in a single field separated by a slash. It’s the responsibility of the player software to parse this correctly.
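As a minimal sketch of that parsing step (the function name `parse_track_field` is made up for illustration and is not part of Picard or any player), a reader could split a combined value like this:

```python
from typing import Optional, Tuple

def parse_track_field(value: str) -> Tuple[int, Optional[int]]:
    """Split a "number" or "number/total" value into (number, total)."""
    number, _, total = value.strip().partition("/")
    # The total is optional: a plain "3" carries no total at all.
    return int(number), (int(total) if total.strip() else None)

print(parse_track_field("3/13"))
print(parse_track_field("3"))
```

A field holding only the track number gives `None` for the total, so the caller can tell "unknown" apart from an actual count.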

EDIT: I am currently answering from my phone and cannot look into this more deeply. But if you can look into the tag format issue and maybe provide some details about two files that differ, I can give you a more in-depth answer later.


#4

Thanks for pointing me towards the different tag formats. I use Mp3tag to correct and verify everything is fine, foobar2000 is just for adding ReplayGain information.

I copied three complete albums to a new directory. One is FLAC, one is MP4, one is MP3. I completely removed the tags themselves, not just the information in them. Then I scanned the three albums (all are known to MusicBrainz) and saved all the data Picard found.

The following tags are written respectively:

FLAC:
DISCNUMBER = 1
DISCTOTAL = 1
MUSICBRAINZ_ALBUMARTISTID = 3d2b98e5-556f-4451-a3ff-c50ea18d57cb
TOTALDISCS = 1
TOTALTRACKS = 10
TRACKNUMBER = 3
TRACKTOTAL = 10

MP4:
DISCNUMBER = 1
MUSICBRAINZ ALBUM ARTIST ID = 8ef1df30-ae4f-4dbd-9351-1a32b208a01e
TOTALDISCS = 1
TOTALTRACKS = 21
TRACK = 3

MP3:
DISCNUMBER = 1/1
MUSICBRAINZ ALBUM ARTIST ID = f6acd23a-56fd-439e-a69a-99af379d4411
TRACK = 3/13

I know this is the wrong place to propose changing any tags other than the MUSICBRAINZ ones, but regarding MusicBrainz’s own tags, can we please use just one format? Preferably the one used for FLAC, because spaces are evil! :slight_smile:


#5

I agree, MB tags should be consistent across formats.


#7

Let me reply for all tags, “standard” tags and MB-specific tags, here. For this discussion, by standard tags I mean tags that are either formally standardized or are pseudo-standard because they are commonly used by players and tagging software.

Standard tags: Looking at the output above, there seems to be a TRACK tag. For some reason MusicBrainz Picard just can’t make up its mind and writes it differently every time, the developers must be crazy, right? Except that there is no single TRACK tag at all. There is TRCK for ID3, trkn for MP4, WM/TrackNumber for WMA, Track for APE and TRACKNUMBER for Vorbis. To make it more complicated, for some formats this tag is also supposed to contain the total number of tracks, e.g. in ID3 or APE tags it is supposed to be written as “3/13”. The MP4 trkn tag also contains the total number of tracks, just in a more specific binary format.

So to sum up: Different tagging standards are different; they use different tags and have a very different tag structure. There is no point in using “the same” tags for all the formats; what you have to do is use the standard tags for each format. And it is the responsibility of the software reading those tags to present them to the user in a meaningful and harmonized way. The output above is a nice example of this: the software used for it obviously makes some effort to present the tags for one format in a readable way, but it makes no effort to harmonize the displayed data between different tagging formats. Blame that software; Picard is doing everything correctly.
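A rough sketch of what such harmonization on the reading side could look like. It assumes the tags have already been decoded into a plain dict per file, that an MP4 `trkn` value arrives as a `(number, total)` tuple, and that Vorbis files may carry the total in TRACKTOTAL or TOTALTRACKS as in the FLAC listing above. `read_track` and the format keys are hypothetical names, not any real player's API.

```python
# Track-number tag per format, as listed in this post.
TRACK_TAG = {
    "id3": "TRCK",
    "mp4": "trkn",
    "wma": "WM/TrackNumber",
    "ape": "Track",
    "vorbis": "TRACKNUMBER",
}

def read_track(fmt, tags):
    """Return (number, total) from an already-decoded tag dict (assumption)."""
    raw = tags.get(TRACK_TAG[fmt])
    if raw is None:
        return None, None
    if isinstance(raw, tuple):
        # Assumed shape for MP4: the binary trkn value decoded to (number, total).
        number, total = raw
        return number, total or None
    # Text formats: either "3" or the combined "3/13" form.
    number, _, total = str(raw).partition("/")
    total = int(total) if total else None
    if total is None and fmt == "vorbis":
        # Vorbis keeps the total in a separate field.
        extra = tags.get("TRACKTOTAL") or tags.get("TOTALTRACKS")
        total = int(extra) if extra else None
    return int(number), total

print(read_track("id3", {"TRCK": "3/13"}))
print(read_track("vorbis", {"TRACKNUMBER": "3", "TRACKTOTAL": "10"}))
print(read_track("mp4", {"trkn": (3, 21)}))
```

Whatever the storage format, the caller gets the same pair back, which is the kind of harmonized view a tag viewer could display.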

MusicBrainz specific tags: Those are a bit more complicated. Apart from a set of standardized tags, all the commonly used tagging formats also provide some kind of freeform tags. Basically this allows you to store any data you want, and you can assign that data any name (with some format-specific limitations) you want. Now again, there is not really a “MUSICBRAINZ ALBUM ARTIST ID” tag. There is “TXXX:MusicBrainz Album Artist Id” for MP3, “----:com.apple.iTunes:MusicBrainz Album Artist Id” for MP4, “MusicBrainz/Album Artist Id” for WMA and “MUSICBRAINZ_ALBUMARTISTID” for Vorbis and APE. Please note how the actual tags used for ID3 and MP4 are not all uppercase, as the output posted by @an3k suggests. Uppercasing them is just another arbitrary thing the software outputting the tags does (even though ID3 tags are case sensitive).
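To make the per-format names concrete, here is a small lookup sketch built only from the names given in this post. The dict-of-tags input, the format keys and the function name `find_album_artist_id` are assumptions for illustration; a real reader would get its tags from a tagging library. The case-insensitive match is applied only to Vorbis comments, whose field names are case-insensitive.

```python
# Freeform tag carrying the MusicBrainz album artist ID, per format.
ALBUM_ARTIST_ID_TAG = {
    "id3": "TXXX:MusicBrainz Album Artist Id",
    "mp4": "----:com.apple.iTunes:MusicBrainz Album Artist Id",
    "wma": "MusicBrainz/Album Artist Id",
    "vorbis": "MUSICBRAINZ_ALBUMARTISTID",
    "ape": "MUSICBRAINZ_ALBUMARTISTID",
}

def find_album_artist_id(fmt, tags):
    """Look up the album artist ID in a decoded tag dict (hypothetical input)."""
    wanted = ALBUM_ARTIST_ID_TAG[fmt]
    if fmt == "vorbis":
        # Vorbis field names are case-insensitive, so compare in uppercase.
        for name, value in tags.items():
            if name.upper() == wanted:
                return value
        return None
    # Other formats: exact, case-sensitive match.
    return tags.get(wanted)

print(find_album_artist_id(
    "id3",
    {"TXXX:MusicBrainz Album Artist Id": "f6acd23a-56fd-439e-a69a-99af379d4411"},
))
```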

The MusicBrainz tags were chosen in a way that suits the tag formats at hand and follows common practices used for tags inside the format in question. For example, Vorbis tags can contain most ASCII characters and are case-insensitive, but are commonly written in uppercase. They could contain spaces, but I have actually never seen a tag using spaces there. Now you could just use “MUSICBRAINZ_ALBUMARTISTID” for all the tags (even if it would maybe look funny for a specific format), storing e.g. “TXXX:MUSICBRAINZ_ALBUMARTISTID” in ID3 (consider the TXXX just as a namespace for freeform tags here). And there are for sure some arguments for it, but as a matter of fact, when the mapping of those MB tags to the tagging formats was specified, more care was given to making the tags fit in with the format than to unifying them across formats.

Can this be changed? I don’t think so. The MB tags have been around for a very long time, they are well specified in the Picard tag mapping and a lot of software follows these MB standards. We have gone to some effort to keep the tags stable over all this time. Changing them now would not do much good and would just lead to inconsistencies and software not working as expected.

Furthermore it is usually not a really big deal. Any software reading or writing different tag formats needs to have some way to map between format specific tags and an internal representation of those tags. Doing this for MB tags if needed is not really difficult.


#8

I’m afraid they’re pretty much set in stone – as other software will rely on the exact spelling.

BTW, here is an overview of the tags used for the different formats.

I don’t know the rationale for how these were chosen, but I guess whoever did it preferred spaces, and only used underscores where necessary.


#9

That is exactly the same as with the different charger connectors of the various phone manufacturers. Now we have microUSB and Lightning and that is much better for everyone. Imagine having to keep every single charger for your smartwatch, smartphone, gaming controller, etc.

And I’m afraid this is exactly the thinking that explains why we still have a huge mess in nearly every standard. While other standards need to be revised by the responsible committee, the metatagging of audio files can be done by those who use it the most: the developers of tagging software in cooperation with the users.

Why I think so is simply explained: You stick to the “standard” others defined even when they defined complete nonsense. When ordered to jump you don’t ask why but how high. But back to the standard: it just causes more trouble for users and also for the developers of music players and tagging software.
Nobody stops you from writing 3 into TRCK and 23 into TRACKSTOTAL in an MP3 header. For sure most players won’t be able to read TRACKSTOTAL (yet), but at least it’s already stored in the file. And if the “to be founded” committee publishes the standard online, free for everybody and easily accessible, music player developers will update their software quite fast to keep their users pleased.

I’m not saying that TRCK has to be renamed to TRACK or even TRACKNUMBER, only that it should contain just the number of the track, thus a simple “3”.

Because the whole tagging thing is not officially standardized by e.g. ITU, ISO or whoever else, we can all more or less make our own standard.

If MB is so much community-focused why don’t they/you/we get into contact with developers of the other well-known tagging software and developers of widely used music players as well and propose a new standard that makes it easier for developers to integrate writing/reading tags and for users to tag the correct information?
It means a bit of work, yes, but compared to the work we all have right now it would save a huge amount of time and resources.

Let’s assume you bought an album from one of the large online retailers. These files are not correctly tagged; for example, the data that should be written into DISCSUBTITLE or SUBTITLE is nearly always stored in ALBUM or TITLE, and even there it’s not correctly formatted. Instead of using square brackets they use parentheses, which are normally intended for the title itself, e.g. “Alicia Keys - Powerless (Say What You Want) [Live]” and not “Alicia Keys - Powerless (Say What You Want) (Live)”.

Yes, very easily. The next version of Picard includes a special “tag standard conversion” function. If you use it, Picard reads the data stored in the old tags, writes that data into the new tags and finally deletes the old tags.
It is an extremely simple function in terms of programming. I could do that with Mp3tag in 5 minutes.

Ok, then tell me how I can add this mapping to the music player I use on my Android phone. It’s closed-source and made by Sony. Oh, what a surprise: I cannot change this myself. But anyway, the MB tags aren’t the real problem, because not a single application I have ever used except Picard supports these tags at all. The main point is to simplify and harmonize the various formats.

Simply put: just because one boy in kindergarten wears his pants the other way around doesn’t automatically mean he’s smart and the others have to do so too.


#10

This somewhat misses the point of why tags are different. A common misconception is that media file tags are just a list of key/value entries. This is true for some, e.g. Vorbis comments are done that way, and they define just a pretty short list of “standard” tag names. At the other end of the scale you have something like ID3, which defines a very extensive list of tags, some of them specified specifically to store the intended data in an optimized way. Probably one big driver for the development of different metadata formats is that people were unhappy with other formats from a technical point of view. And since your new format is not worth much if everybody uses it differently, such a format specification usually comes with a set of standards.

In this area “standards” are largely made by usage. You noticed all the MP4 tags containing “----:com.apple.iTunes”? Well, Apple needed some additional tags and put them in this namespace. Others adopted this and reused that namespace for their own usage. Or take TPE2, which is used for “Albumartist”, but that’s not how it is defined in the ID3 spec. Players like iTunes and Windows Media Player used it that way and everybody follows.

Of course Picard could just write the tags in any way it wanted. But what’s the point of a tagger if the tagged files are not understood by any player in existence? In the past Picard was even more conservative about the tags it supported. It did not write all possible tags it understood to all formats, only if there was a standardized or commonly used tag for the purpose available in that format. In recent years this has been relaxed and Picard now writes most data to the tags, so people having flexible software that can handle any ID3 tag can make use of non-standard tags as well.

Ok, you have noticed that retailers and labels are often quite lazy with how they treat the tags. That’s definitely true, but any New Tagging Standard won’t change that.

It is not really worth replying to this comment, but hey: For sure having a function that converts the tags in your files is not any big deal. If you want to impress me, use those 5 minutes to write some magic that updates all the software using MB tags and also updates existing files (since otherwise all the software needs to support both old and new tags, which for sure does not improve the situation). And yes, there is software using MB tags, and online services, too.

How exactly did you plan to convince them to use the New Tagging Standard? Also, this player already does some kind of mapping. Does it display the track titles and numbers for different formats? Yes? Then it needs to be able to read and understand the various tags.

This is the harmonization done by Picard, and I have seen this table used for reference in many discussions for other software, too.

No, it is the other way round: All the kids wear their pants the same way and you suggest we turn ours around.


#11

From what you write I can see that you are not a developer. That is not a problem at all but then you shouldn’t use arguments only developers can make use of.

In fact even in ID3 it is a simple variable/value list. It may use an “optimized” way to store the variables/values, but that optimization is not that important nowadays, especially since it doesn’t reduce the file size by megabytes. It is however useful for streaming audio, but then again you don’t need a huge amount of metadata there since you don’t keep that data, you just listen to it, so at minimum ARTIST and TITLE are required. In the streamer’s library, however, they may want to have much more metadata, but they don’t have to stream it to the users.

That is exactly what I’m saying, but you are rejecting it.

Instead of hardcoding “TXXX:MusicBrainz Album Artist Id” for MP3, “----:com.apple.iTunes:MusicBrainz Album Artist Id” for MP4, “MusicBrainz/Album Artist Id” for WMA and “MUSICBRAINZ_ALBUMARTISTID” for Vorbis and APE, the developer could have saved a huge amount of time by simply using “MUSICBRAINZ_ALBUMARTISTID” for all formats and adding the corresponding prefix for the given format.

You don’t have to check the file format and then write to the correct tag, instead you write directly to $format + MUSICBRAINZ_ALBUMARTISTID.

If it’s MP4, $format contains “----:com.apple.iTunes:”, if it’s MP3, $format contains “TXXX:” and so on. This way is much easier in terms of development than how it’s currently done. By the way: the same applies not only to tagging software but also to music players! And if you want to support a new format you simply have to change the $format-specific prefix (and for sure where to write the tags), but you don’t have to implement the whole set of format-specific tag names again.
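The proposal above can be sketched as follows. This illustrates the suggestion only and is not how Picard names its tags. The format keys and `freeform_tag` are hypothetical, the two non-empty prefixes are the ones quoted in this post, and WMA is left out because no prefix for it is given here.

```python
# Per-format prefix for freeform tags under the proposed scheme.
FREEFORM_PREFIX = {
    "mp3": "TXXX:",
    "mp4": "----:com.apple.iTunes:",
    "vorbis": "",  # Vorbis and APE use the bare name
    "ape": "",
}

def freeform_tag(fmt, name):
    """Build the full tag name as $format + name."""
    return FREEFORM_PREFIX[fmt] + name

for fmt in FREEFORM_PREFIX:
    print(fmt, "->", freeform_tag(fmt, "MUSICBRAINZ_ALBUMARTISTID"))
```

Supporting another format would then mean adding one prefix entry, although files already tagged with the existing names would not match the names this produces.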

You still didn’t get the point. My suggestion was and still is not about using a completely different format for Picard only and then letting the others adopt it or not. It’s about getting in contact with them to 1) agree to work together, 2) publish a survey on each of their websites so that you get data back from as many users as possible, 3) think about how to get the best out of what the users voted and 4) work out a plan that everybody currently part of the “committee” can stick to about how and when these changes are going to get implemented.

That is wrong for A/C Chargers, Character Encoding and Instant Messaging. Instead of having plenty of chargers I now have two that have the same connector. Instead of plenty of different encodings we have UTF-8. And since Instant Messaging Protocols are proprietary and very often closed-source you have to build your own if you want to offer an Instant Messaging App. Yes, there is Jabber but guess why “nobody” uses it. It has too many flaws and is missing functions and the “main developer” doesn’t want to update it.

Well, if you publish a simple “How to use” paper that gives examples they will stick to it, because the better they represent their data the more likely users will buy it. And I’ve never seen any of such papers issued from ID3, Apple and so on, thus people are looking for their own way.

If you give me links to the source code of each of the software and online services I’ll do it within one week. Because it is more or less just changing e.g. “TPE2” to “TXXX:ALBUMARTIST” or “TXXX:MusicBrainz Album Artist Id” to “TXXX:MUSICBRAINZ_ALBUMARTISTID”.

Think Big. If the majority of tagging and playback software uses the new format and there is just you not using it, guess what happens: You can heck off and kill your product because nearly nobody wants to use it anymore.
Yeah, for sure this player does some kind of mapping, I never said it does no mapping at all. But instead of using MB data (if present) it rescans the individual songs and tries to match them against Gracenote, which in my eyes turned from awesome to crappy within ~10 years. I guess it happened when Sony bought Gracenote. Dunno.

This is no harmonization but simply mapping. Harmonization means you name as many tags as possible the same while still keeping some kind of backwards compatibility.


#12

While I see your point of how it would be great if everybody used the same tagging format (yeah, I get the AC charger example, really, but different situation) and there were only one format to rule them all, I still think you are extremely naive about how this can be done. You overestimate the burden and effort it takes to support multiple tagging formats in software, and at the same time you underestimate the trouble and effort it takes to break old standards and especially how difficult it would be to get everybody to agree on a new tagging standard (just see how people still need to use ID3 v2.3, even though 2.4 has many advantages).

So before you go off and start your New Tagging Standard Initiative maybe think about the real world issues you want to solve. From what was discussed so far I see the following points made:

  1. Supporting multiple formats in any software is a pain in the ass and leads to a lot of duplication
    Ok, agreed in general. And no doubt the situation has caused friction. But adding a new format does not ease this pain, it increases it. You would need to get Apple and other big players to use your new standard, and even then I doubt the other formats would just go away. And players not supporting all the tags you might want them to support is not only format related, either. And again I think you overestimate the technical impact that has on a piece of software. In your arguments above you stress how easy it is to implement things. Trust me: Having some mapping between tags isn’t too hard, either.

  2. Labels use crappy tags
    True, but a new tag format does not solve it.

  3. Your player uses Gracenote, not MB
    Yeah, it would be great if Sony supported MB. But again, a new tagging format does not solve this. I have a feeling that this is not because some Sony engineer failed to use the proper tags…

Yep, that’s actually my argument. No use for a tagger that writes tags no player understands.

“give me links to the sourcecode” is the hard problem here.


#13

“Community-focused” doesn’t really mean you can come in and say “Why haven’t you done…”

outsidecontext is very comprehensibly explaining why he (and other very very busy people around MB) hasn’t tackled this problem personally, but that’s no reason to start pointing a finger at him as if it’s somehow his responsibility.

If you need a specific hand with a part of the code/plugin you’re going to write for Picard, or want input into the new standard that you’re spearheading, I’m sure you’ll find plenty of interested parties around here! It sounds exciting and you make a strong case (and everybody hates pointless competing standards, no doubt about it).
But accusing an open-source and non-profit music database and tagging software of also not solving the world’s tagging standard problems is a bit off the mark… so let’s keep things in perspective :confused:

[quote=“outsidecontext, post:10, topic:86600”]
How exactly did you plan to convince [Apple & Sony] to use the New Tagging Standard?[/quote]
My personal opinion… ^ this is what it’s going to come down to.
Politicking over common sense and coding skill :frowning:


#14

Here’s a start:

https://wiki.videolan.org/Git


https://github.com/tomahawk-player

There’s also a bunch of proprietary software used by some of MetaBrainz’ supporters: https://metabrainz.org/supporters - you’ll have to ask them for their source code yourself.

And then there are the people who use Gracenote or other non-MetaBrainz data sources - I don’t really have any contact with those, so you’ll have to look up those on your own.

Note that if you wish to change the MusicBrainz specific tags, you’ll have to make sure that future software will still be able to read old style tags, as there are so many files out there (in personal archives, on the internet, etc.) that have been tagged with the current format, and they’re not going to switch anytime soon (if at all), so you’ll have to support both current and new tag format indefinitely. Also, there’s a lot of old software out there which will not get updated - we still have people using both our FreeDB gateway and I think we also still have some software using our web service version 1 (@zas?). So software should probably give some sunset time to write both current and new style tags to files for at least a couple of years (not everybody will be able to “just” switch to something else/newer). A period equivalent to at least two full Ubuntu LTS cycles (= 10 years) I’d say.
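A minimal sketch of such a transition period, assuming a hypothetical new-style name “TXXX:MUSICBRAINZ_ALBUMARTISTID” (the rename floated earlier in this thread) next to the current ID3 name. The plain dict stands in for a file's tags, and both function names are made up.

```python
CURRENT_NAME = "TXXX:MusicBrainz Album Artist Id"  # what files carry today
PROPOSED_NAME = "TXXX:MUSICBRAINZ_ALBUMARTISTID"   # hypothetical new-style name

def read_album_artist_id(tags):
    """Prefer the new-style tag, fall back to the current one."""
    return tags.get(PROPOSED_NAME) or tags.get(CURRENT_NAME)

def write_album_artist_id(tags, value, transition=True):
    """Write the new-style tag, plus the current one while in transition."""
    tags[PROPOSED_NAME] = value
    if transition:
        # Old software that is never updated keeps finding the tag it expects.
        tags[CURRENT_NAME] = value

old_file = {CURRENT_NAME: "f6acd23a-56fd-439e-a69a-99af379d4411"}
print(read_album_artist_id(old_file))

new_file = {}
write_album_artist_id(new_file, "3d2b98e5-556f-4451-a3ff-c50ea18d57cb")
print(sorted(new_file))
```

Readers would need the fallback indefinitely, while the `transition` flag on the writer could be switched off once the sunset period ends.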

Good luck!


#15

I recently added support for some new classical music fields to my tagger software, see https://docs.google.com/spreadsheets/d/1afugW3R1FRDN-mwt5SQLY4R7aLAu3RqzjN3pR1497Ok/edit?usp=sharing. Originally I tried to fit in with the naming conventions used for MusicBrainz fields, where ID3 TXXX frames use title-case words separated with spaces, like TXXX:MusicBrainz Album Id, whereas VorbisComments use uppercase with underscores, such as MUSICBRAINZ_ALBUMID.

But then I was using the very popular dBpoweramp to convert from one format to another and I found it was only doing a partial job of converting the metadata between formats. It understood most of the standard ID3 text frames (but ignored UFID) and converted them to the de facto VorbisComment field name. With TXXX:fieldname it converted this to a field with the name fieldname, but it doesn’t know about converting the name, so TXXX:MusicBrainz Album Id becomes a VorbisComment field MusicBrainz Album Id, which unfortunately is invisible to any software using the correct VorbisComment field. Now you can create a mapping within dBpoweramp, but it’s not there by default and you have to know what you are doing and know all the fields your file may contain.
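To illustrate the kind of mapping a converter would need, here is a sketch of a rename table applied when copying TXXX descriptions to VorbisComment field names. This is not dBpoweramp's configuration format or API. The function name is made up and the table lists only names mentioned in this thread.

```python
# TXXX description -> VorbisComment field name, for names from this thread.
TXXX_TO_VORBIS = {
    "MusicBrainz Album Id": "MUSICBRAINZ_ALBUMID",
    "MusicBrainz Artist Id": "MUSICBRAINZ_ARTISTID",
    "MusicBrainz Album Artist Id": "MUSICBRAINZ_ALBUMARTISTID",
}

def txxx_to_vorbis(description):
    """Return the Vorbis field name for a TXXX description.

    Unknown descriptions pass through unchanged, which is the
    behaviour described above for a converter with no mapping.
    """
    return TXXX_TO_VORBIS.get(description, description)

print(txxx_to_vorbis("MusicBrainz Album Id"))
print(txxx_to_vorbis("Movement"))
```

A field that uses the same name in every format needs no table entry at all.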

So on balance I think it’s better when adding new fields to use the same field name for all formats where possible, unless a field is already defined for a particular format, and this is what I have now done. This then allows audio converters to do a better job without knowing about the particulars of a field.

But of course there is no way we change de facto standards that have been around for some time, such as the various MusicBrainz ID fields.

Do you agree with this approach?

In the spirit of cooperation these fields I have newly added could still be changed if you believe they are badly named.


#16

I’d prefer if we’d specify how to normalize tag keys and keep the tag naming format consistent.

e.g.

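def normalize_id(key):
    # Strip the separators that differ between formats, then uppercase.
    return key.replace("/", "").replace(" ", "").replace("_", "").upper()

key = "MusicBrainz Artist Id"  # example tag name as read from a file
is_artist_id = (normalize_id(key) == "MUSICBRAINZARTISTID")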

If everyone does that, then it doesn’t matter if some third party software does not translate it properly across tagging formats. And every MB supporting software can write the value back using the “official” format afterwards.
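As a quick check of the idea, the sketch below runs the same normalization over the album artist ID names quoted earlier in this thread. It assumes that container prefixes such as “TXXX:” or “----:com.apple.iTunes:” have already been stripped by whatever reads the file.

```python
def normalize_id(key):
    # Remove the separators that vary between formats, then uppercase.
    return key.replace("/", "").replace(" ", "").replace("_", "").upper()

# Format-specific spellings of the same tag, prefixes already removed.
spellings = [
    "MusicBrainz Album Artist Id",  # ID3 TXXX description / MP4 freeform name
    "MusicBrainz/Album Artist Id",  # WMA
    "MUSICBRAINZ_ALBUMARTISTID",    # Vorbis and APE
]

normalized = {normalize_id(name) for name in spellings}
print(normalized)
```

All three spellings collapse to a single key, so a reader comparing normalized names does not have to care which format or converter produced the file.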


#17

I think it does matter if 3rd party software doesn’t convert properly, because then the user has to use other software to fix it, and they would probably have to fix it by retagging all over again, as they wouldn’t know how the 3rd party s/w had done it incorrectly. But I wasn’t for one minute suggesting we change existing MusicBrainz-specific tags; I was talking more about how to add new non-MusicBrainz-specific tags such as ‘Movement’.

BTW your translation doesn’t work because MusicBrainz Artist Id is stored as MUSICBRAINZ_ARTISTID in VorbisComments; note an underscore instead of a space in one case but not the other. In reality the existing tags only followed a very loose naming convention and it hasn’t been done totally consistently.