Insufficient documentation on pseudo-releases

Arete01 · September 14, 2020, 7:21pm

After using MusicBrainz for 9 months, some things are still not clear to me in the docs. I have some limited professional experience with technical writing and documentation, and I think the docs could be a lot clearer on this, but that requires feedback. So here it is.

After reading about Pseudo-Releases, there are some things I don’t understand:

As far as I understand, translation pseudo-releases (let’s call them TPRs here to make it easier) are unofficial alternative track lists and titles. That means they can be styled however the user feels like, as long as it is reasonable. In effect, this can result in many different TPR versions, which brings me to the questions.

Is there any formatting guide for TPRs? For example in this release, the original track name is イオンの森, but the release is written:

イオンの森 (Forest Of Ion)

But shouldn’t it just be Forest Of Ion if it is a translation?

This brings me to the next point: If two or more TPRs exist for a release, how do we determine which one is correct and which one is redundant? Or do we simply let them all exist?

Personally, I prefer to have the original-language title followed by the translation in parentheses. The reason is that if I search for one or the other in my media player, I will get a match either way. This is still a problem in MusicBrainz online.

So here I am talking about many things, but my main point here is that I am missing directions on how to format the translation titles and that the docs could be a lot more clear on this.

wlhlm · September 14, 2020, 8:54pm

As a general statement, I have to agree with you on this. The MusicBrainz Documentation is pretty sparse. I find myself quite frequently searching in forum discussions due to the documentation being insufficient. As an example of something I’m currently encountering, it looks like conventions around digital releases are still in debate. My general feeling around MB is that in cases where the rules are not clear and there are multiple ways to do it, people should just choose the way they prefer, and we can always come back later and update the entity. Personally, I prefer to get things right from the beginning to avoid unnecessary editing work later on.

My experience with TPRs is limited, but I think just using the translated track name is better as it is cleaner. Using both the original title and the translated title as you prefer can be solved by the Picard/Tagger side of things by looking up the original track title via the referenced recording and the transliteration relationship. Of course, it might be quicker for you to just add a new pseudo-release with “Original title (translated title)” tracks than to wait until Picard gets updated.

draconx · September 14, 2020, 10:24pm

The only release in this RG has its status set to “Official”, so pseudo-release guidelines don’t apply. If the current tracklist does not match what is printed on the cover then it should simply be changed.

I mostly ignore pseudo-releases but if I find two pseudo-releases that have substantially similar tracklists then I will normally merge them.

It seems OK to me if someone cares enough to enter a multiple-language, multiple-script pseudo-release with each track title being the same thing in two different languages. This would be sufficiently different from a pseudo-release with tracklist in only one language to warrant keeping both of them.

I think we all agree that Pseudo-Releases are a bit of a kludge and not an ideal solution to the problem of transliterated/translated tracklists.

aerozol · September 14, 2020, 11:06pm

I see no problem with letting them all exist. Which answers the first question as well - there isn’t a formatting guide.

I don’t really think there should be a concrete guide (not to say that the docs couldn’t be better ofc) - different people have different translation wants and needs. I can imagine a guide that lays out a single ‘correct’ translation might stop you from entering your ‘both languages’ TPRs.

thwaller · September 15, 2020, 4:29am

I think this is a dangerous approach to the issue. It is my opinion that I would rather have incomplete data than inconsistent data. The reason is that if the corrections needed in the database are inconsistent, more manual work is required. If the correction is consistent (like removing the feat from a title and placing it in the credits), a script can be created to assist in the changes.
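To illustrate why a consistent pattern lends itself to scripting, here is a minimal sketch of the "feat" example. It assumes the featured artist always appears as a trailing "(feat. X)" or "[featuring X]" suffix; the pattern and the function name `split_featured_artist` are hypothetical and are not part of any MusicBrainz tool.

```python
import re

# Assumed convention: the featured artist is a trailing "(feat. X)" or
# "[featuring X]" suffix. Titles that deviate from this are left untouched.
FEAT_PATTERN = re.compile(
    r"\s*[\(\[](?:feat\.?|featuring)\s+(?P<artist>[^\)\]]+)[\)\]]\s*$",
    re.IGNORECASE,
)


def split_featured_artist(title):
    """Return (clean_title, featured_artist), or (title, None) if no suffix matches."""
    match = FEAT_PATTERN.search(title)
    if match is None:
        return title, None
    return title[: match.start()], match.group("artist").strip()
```

A title that does not follow the assumed pattern comes back unchanged with `None`, which is exactly the inconsistent case that would still need manual work.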

The bad part is the longer this issue is open, the more difficult recovery and correction is going to be.

This I 100% agree with, but there needs to be some sort of at least minimal decisive guidance to accomplish this.

Kripsy · September 15, 2020, 6:23am

I think the formatting of transliterations is an interesting case to have some definitive guidance on. Here I’m talking about if transliterated tracks should be formatted as “[original language title] ([transliterated title])”; “[transliterated title] ([original language title])”; or just “[transliterated title]”. Part of why I think this is important is because some official transliterated releases take opinions on this formatting and we should know how to approach it in all cases the same as we do for “feat” and “remix” formatting.

Personally, I think we should enforce only one transliteration per language per release, with the formatting being “[transliterated title]” and a transliteration relationship to the original work. This should provide all the necessary information to reconstruct the different formattings people would want in Picard or whatever app is pulling the data.
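As a rough sketch of that reconstruction, assume the consuming app already has both the original title and the title from the pseudo-release in hand. The `TrackTitle` class, the `format_title` function, and the style names below are made up for illustration; this is not Picard's API.

```python
from dataclasses import dataclass


@dataclass
class TrackTitle:
    original: str     # title as entered on the official release
    alternative: str  # translated or transliterated title from the pseudo-release


def format_title(track, style):
    """Build a display title in one of three hypothetical styles."""
    if style == "alternative_only":
        return track.alternative
    if style == "original_first":
        return f"{track.original} ({track.alternative})"
    if style == "alternative_first":
        return f"{track.alternative} ({track.original})"
    raise ValueError(f"unknown style: {style}")
```

With the track from the first post, `format_title(TrackTitle("イオンの森", "Forest Of Ion"), "original_first")` gives the combined form, so only the plain alternative title would need to be stored.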

However, if we are going this far we might as well get rid of the pseudo-release entirely and just make transliteration fields part of a work/release/recording/etc., with multiple fields for different languages, similar to how we do with aliases.

Arete01 · September 15, 2020, 6:39am

First of all, it seems like this is an issue a lot of people are interested in. So it would be good to have some executive action here.

I will refrain from going into the details of every opinion, though I agree with the general direction here. It seems people have their own subjective opinions (including me). But when it comes to deciding, if we are to change anything at all, I really hope that the resolution is based on two things:

1. The programmatic value of the decision. As @thwaller mentioned, the most important thing is consistent data. That means as little “free text” as possible. It will certainly make it easier for the programmers.
2. The convenience and compatibility of media players and other databases. Long titles can be a problem. But it can also be a problem to find a track called 上を向いて歩こう if you search for “Ue wo muite aruko” or, even worse, “Sukiyaki”…
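The search problem in the second point can be sketched as follows, assuming a player keeps every known variant of a title (original, transliteration, translation) and matches a query against all of them. The function names and the normalization choice (NFKD, dropping combining marks, case folding) are assumptions for illustration, not how any particular media player works.

```python
import unicodedata


def normalize(text):
    """Decompose, drop combining marks (so a macron vowel matches a plain one), and casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query, title_variants):
    """True if the normalized query is a substring of any normalized title variant."""
    needle = normalize(query)
    return any(needle in normalize(variant) for variant in title_variants)


# Hypothetical set of variants stored for one track.
variants = ["上を向いて歩こう", "Ue wo muite aruko", "Sukiyaki"]
```

Under this sketch a query in any of the three forms finds the track, which is the effect the combined "original (translation)" title achieves today with a single text field.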

draconx · September 15, 2020, 6:48am

Transliteration means a change of script, not a change of language. If we transliterate a release into more than one different script all those transliterated releases would necessarily have the same language.

Quite honestly other than obvious duplicates I do not believe we currently have a problem with “too many different pseudo-release tracklists”, so I don’t think we need new rules to deal with problems that probably don’t exist.

Yes I think we can all agree that a method to directly associate transliterated and translated tracklists with releases could be better than the current pseudo-release system. But currently this is not possible. So we must use pseudo-releases today.

Arete01 · September 15, 2020, 6:55am

Quite honestly other than obvious duplicates I do not believe we currently have a problem with “too many different pseudo-release tracklists”…

I agree with this. If this is the general consensus there is no need to over-complicate things.

Except of course, if one would like to improve the data quality.

So at least I am glad I am not the only one who feels like this is slightly confusing.

So what we can conclude, it seems, is that whether we change anything or not, the documentation could use some improvement.

If there are no guidelines for formatting translations, this should be stated clearly in the documentation, right?

draconx · September 15, 2020, 7:12am

The thing about translated and transliterated pseudo-releases is that they are inherently about personal preference and there is no obvious way to judge “quality”.

For example, there are many different ways that Japanese can be transliterated into Latin script. I suspect most native English speakers would prefer to use one of the Hepburn variants but a native Japanese speaker might prefer Kunrei-shiki romanization as this method is taught in Japanese schools.
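To make the difference concrete, here is a small sketch covering only a handful of kana where the two systems disagree. The tables are deliberately partial and the `romanize` helper is hypothetical; real romanization also has to handle digraphs, long vowels, and particles, which this ignores.

```python
# Partial tables: only kana where Hepburn and Kunrei-shiki differ.
HEPBURN = {"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji"}
KUNREI = {"し": "si", "ち": "ti", "つ": "tu", "ふ": "hu", "じ": "zi"}


def romanize(text, table):
    """Replace each kana found in the table; leave any other character as-is."""
    return "".join(table.get(ch, ch) for ch in text)


print(romanize("ふじ", HEPBURN))  # fuji
print(romanize("ふじ", KUNREI))   # huzi
```

Both outputs are valid romanizations of the same word, so neither pseudo-release tracklist would be "wrong".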

To say one transliteration is “higher quality” than the other I think misses the point of these pseudo-release tracklists.

Edited to add: We do have language-specific style guidelines for tracklists that of course apply to pseudo-releases just like regular releases. So for example, a translation into English should normally follow the capitalization standards for English releases.

Arete01 · September 15, 2020, 11:20am

Sure, I agree.

I think you misunderstood what I meant by data quality. As you say, there are almost endless ways to translate a title. In data science we tend to talk about quality in terms of how structured it is, not how correct it is. So by quality I meant that rules have to be clearly defined first, so that one way to translate can be categorized as, for example type XYZ and another way to translate can be categorized as ABC. If the rules (or guidelines) are followed, at least one would have a data set that is categorized and flexible. I assume that this would add value to the developers, but this is obviously not for me to determine, and a little off topic.

aerozol · September 15, 2020, 10:21pm

Yes! Can you propose a specific wording change that would have made things clearer for you/allowed you to edit with confidence?

I see pseudo-releases as a lot like informal/non-genre tags. Messy data, uncharacteristic for MB, but still offering benefits for users.