A script to write a song's full language instead of its abbreviation?

hiccup · February 7, 2019, 4:32pm

Picard will write the language tag of a recording as a three letter code.
So e.g. a song in English will get the code ‘eng’
But I would like to have my tags showing the full word ‘English’.
And ‘Portuguese’ instead of ‘por’, ‘French’ instead of ‘fre’ etc. etc.

Creating a script that will do that for just ‘eng’ is rather easy:
$set(language,$replace($get(language),eng,English))

But I quickly get stuck when I want to add more languages.
I attempted some nesting with ‘$replace’, but that gave me odd results, probably because every ‘replace’ will also look at other nested ‘replaces’.

What would be the best and most elegant approach to set up a script for this?
The $if function is probably also necessary? But I am making a complete mess of it all when trying to concoct something that works.

I am not aiming to include all hundreds of existing language codes, but it would be nice if it were not too difficult to add a new language code to the script when I run into a language that wasn’t present in my library before.

T.I.A.!

rdswift · February 7, 2019, 11:34pm

Rather than a script, how about a plugin that adds a new scripting function?

I’ve seen requests for this sort of thing before, so I thought I’d whip up a plugin to provide the functionality. It’s a first release at version 0.01, but it is simple and seems to work reasonably well. Unfortunately, one current limitation is that the full language names returned are pretty much in English, because the list I’ve used is from the SIL Website and it is primarily in English.

If you try it, please let me know if it does what you’re looking for. I especially want to hear if there are any bugs! I haven’t yet submitted this to the official Picard plugins repository.

Language Name [Download]

This plugin provides a new scripting function $language_name() that allows the user to retrieve the full name for a three-character language code for use within scripts.

Usage

The function is used as $language_name(code), where code is the three-character language code.

It will typically be used to expand the %language% and %_releaselanguage% tags.

For example, $language_name(%language%) will return English if the language code is eng, or Gwichʼin if the language code is gwi, or Klingon if the code is tlh. (Yes, Klingon is actually officially recognized in the ISO 639-3 list of languages.)

If an unknown code (or no code) is entered, the function will return an empty string.

Technical Notes

This function uses the ISO 639-3 (Part 2B) language codes, provided by the www.iso639-3.sil.org website.

outsidecontext · February 8, 2019, 12:20pm

If you want to do it per script this is also possible, but you have to list every country. I actually use a similar script to give proper names to some of the custom ~~language~~ country codes MusicBrainz uses:

$noop(Nice names for some special countries)
$if($eq(%releasecountry%,XE),$set(releasecountry,Europe),)
$if($eq(%releasecountry%,XG),$set(releasecountry,DDR),)
$if($eq(%releasecountry%,XU),$set(releasecountry,[Unknown]),)
$if($eq(%releasecountry%,XW),$set(releasecountry,[Worldwide]),)

I think you get the idea how to extend this to ~~other~~ languages.
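Applied to language codes, the same pattern might look like the sketch below. It assumes %language% holds a single three-letter code, and the three codes and names shown are only examples. Because $eq compares the whole value, a code never matches inside a name that an earlier line has already written, which is the risk with chained $replace calls.

```
$noop(Full names for a few language codes; add one line per new code)
$if($eq(%language%,eng),$set(language,English),)
$if($eq(%language%,por),$set(language,Portuguese),)
$if($eq(%language%,spa),$set(language,Spanish),)
```

Adding a language is then a matter of copying one line and changing the code and the name.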

rdswift · February 8, 2019, 3:54pm

Thanks to some additional testing by @hiccup, it seems that I should have used the ISO 639-3 (Part 2T) codes rather than the Part 2B codes. I’ll make that change, along with changing the returns to “unknown” and “missing” for those two conditions (to avoid a Picard crash if you try to reassign an unknown response to the language tag), and issue an update.

hiccup · February 8, 2019, 4:04pm

Thanks @outsidecontext,
So trying to do it by using ‘replace’ probably wasn’t a good idea.
But thanks to @rdswift I won’t have to figure out creating a working script anymore, since the plugin he is creating will be much more efficient and elegant.

Trying this all out raises a new question about something that is currently not completely clear to me, namely whether ‘language’ pertains to the language a song (recording) is sung in, or whether MusicBrainz uses it as a more general description for releases, or maybe just for the titles.
Some more investigating to do…

outsidecontext · February 8, 2019, 4:40pm

MusicBrainz does both. The language you set on the release level indicates only the language of the track listing, not of the lyrics. But if you set the language for a work it indicates the lyrics language.

Picard uses the language on the work for the %language% tag, but it also provides the language on the release as a hidden tag %_releaselanguage%. Picard before version 1.1 actually used the release language for the language tag, which was semantically wrong but in many cases provided the correct data. But it also often was just wrong.

hiccup · February 8, 2019, 4:49pm

So theoretically would that mean that if you source from %language%, you should only get a result for songs that have lyrics, and sourcing from %_releaselanguage% would also give results for tracks that just have the title in a specific language?

culinko · February 8, 2019, 5:02pm

I also use a similar approach for the script values, however my code looks a bit different than yours:

$set(script,$replace(%script%,Latn,Latin))
$set(script,$replace(%script%,Cyrl,Cyrillic))
$set(script,$replace(%script%,Jpan,Japanese))
$set(script,$replace(%script%,Kore,Korean))

Idk which scripting approach is the “cleaner” one, though.

outsidecontext · February 8, 2019, 6:31pm

Basically yes. But keep in mind that the release language is on the release level, so it occasionally might also apply to tracks with a different language or no language.

Freso · February 9, 2019, 5:52pm

A post was split to a new topic: What ISO standard does MB use for language codes?

rdswift · February 9, 2019, 3:22am

Okay, changes made and new version available. I’ve also got a start on some basic multi-language support.

The language used for the return values will be based on the user’s interface language set in the Picard options. Languages currently supported are English, French, German, Spanish, Dutch, Russian and Chinese. Note that some translations may be incorrect or incomplete, and any help correcting / completing the current translations or translation to other languages would be appreciated. English is the language used for all user interface languages not currently supported.

For example, assuming the user interface is set to English, $language_name(%language%) will return English if the language code is eng, or Gwichʼin if the language code is gwi, or Klingon if the code is tlh. (Yes, Klingon is actually officially recognized in the ISO 639-3 list of languages.)

If an unknown code is entered, the function will return unknown. If there is no code entered, missing will be returned.

The update can be downloaded from my GitHub repository along with the source code.

I am considering adding an option in the Picard settings to override the current UI language setting and specify the language used for the plugin’s responses. This is to accommodate the use case where the user wants to tag in a different language than they use for Picard. Is there anyone that would use that feature, or should I not bother with it? Thanks.

hiccup · February 9, 2019, 8:16am

This is great work, thanks again.

A few comments:

If a release has no language information, the plugin will still write a language tag with the content ‘missing’.
Personally I would prefer that in these cases no language tag would be created at all.
There is no information, so I don’t need a tag.

edit:
Giving this some thought, it looks like it can be achieved on the scripting side.
This seems to work well:

$set(language_temp,$language_name(%language%))
$if($in(%language_temp%,missing),,$set(language,%language_temp%))
$delete(language_temp)
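A slightly stricter variant of the script above, as a sketch only: it compares the whole returned value with $eq instead of searching it with $in, and it also leaves the tag untouched when the plugin returns unknown. It assumes the plugin's return values are exactly missing and unknown, as described for this version.

```
$noop(Keep the original tag unless the plugin returned a real language name)
$set(language_temp,$language_name(%language%))
$if($eq(%language_temp%,missing),,$if($eq(%language_temp%,unknown),,$set(language,%language_temp%)))
$delete(language_temp)
```

With this variant, a code the plugin does not recognize stays in the tag as the original three-letter code rather than being overwritten with ‘unknown’.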

If the release language is ‘mul’, the plugin writes ‘unknown’ instead of ‘multiple languages’.

The code zxx is intended to indicate that a recording has no linguistic content. So it’s useful to e.g. mark instrumental tracks.
But the plugin will currently write ‘unknown’, instead of something like ‘no linguistic content’.
In my opinion ‘Instrumental’ might be an even better option, certainly for the purposes of Picard.
At first thought I feared that that might pose problems for songs with yodeling or scat, but for those I believe there is the intended code ‘und’. (undefined)
I am not sure how yodeling or scat singing is entered and stored in MB’s database though?

rdswift · February 9, 2019, 4:44pm

I think your scripting approach to resolving this is actually the best approach. If I try to set the %language% tag to an empty string, it causes a crash.

Oh yeah, the ‘mul’ case. I’ll fix that.

I’ll fix that as well, and have it return “no linguistic content”, which seems to be the official terminology used for the “zxx” code. Somehow both the “mul” and “zxx” codes got dropped when I was preparing the lists.

Thanks for the testing and feedback (and for providing the Dutch translations).

rdswift · February 11, 2019, 9:02pm

The new version (v0.3) is now available for download from the GitHub repository. In addition to updating the language code files, it has changed a bit from the previous versions. You can now override the automatic output language, and manually select the output language file used (independently from the selected UI language for Picard). It currently supports output in English, French, Spanish, German, Dutch, Russian and Chinese. The full User Guide is available for viewing on GitHub.