Is $title broken or am I misunderstanding it?

shamboni · April 26, 2021, 1:29am

I’m formatting an album with ALLCAPS titles and this script doesn’t work:

$set(title,$title(%title%))

The titles are UNCHANGED. However this script does work:

$set(title,$lower(%title%))
$set(title,$title(%title%))

Now They Look Like This. So unless I’ve missed something about how scripting works (very possible, I’ve been doing it for one whole weekend) it seems $title is only doing half of its job?

elomatreb · April 26, 2021, 4:30am

This is just how the title case function is implemented: https://github.com/metabrainz/picard/blob/master/picard/script/functions.py#L1024. It only ever touches the first letter of a word, which has the advantage of preserving uppercase acronyms.
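To make that behaviour concrete, here is a rough Python approximation of "only touch the first letter of each word". It is a hypothetical sketch, not Picard's actual code: the name `upper_first` and the word-boundary rule (non-alphanumeric characters start a word, except an apostrophe directly after a letter) are assumptions.

```python
def upper_first(text: str) -> str:
    """Hypothetical approximation of the current $title behaviour:
    capitalise the first letter of each word and leave all other letters alone."""
    result = []
    at_word_start = True
    prev = ""
    for char in text:
        if char.isalpha():
            # Only the first letter of a word is changed; the rest keep their case.
            result.append(char.title() if at_word_start else char)
            at_word_start = False
        else:
            result.append(char)
            # Whitespace and punctuation start a new word, except an apostrophe
            # directly after a letter (as in "shamboni's"); digits do not.
            in_word_apostrophe = char in "'’" and prev.isalpha()
            at_word_start = not char.isalnum() and not in_word_apostrophe
        prev = char
    return "".join(result)


print(upper_first("LIVE AT THE BBC"))          # unchanged: LIVE AT THE BBC
print(upper_first("live at the BBC"))          # Live At The BBC
print(upper_first("LIVE AT THE BBC".lower()))  # Live At The Bbc
```

Lower-casing the input first, as the two-step script in the original post does, is what produces the fully title-cased form.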

If you want to change the entire word, $set(title,$title($lower(%title%))) is not a bad way to do it.

Sophist · April 26, 2021, 9:35am

Yes - this is how it is implemented, but IMO this is a bug.

The description clearly states that it should return title-cased text, i.e. mixed case, and not just capitalise the first letter of each word.

I wasn’t sure why it does not just use the Python3 string title() method (which handles Unicode) until I tested it with “shamboni’s”.
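For reference, the problem with Python's built-in `str.title()` is that any character without case, including an apostrophe, starts a new "word", so possessives and contractions get a capital letter after the apostrophe:

```python
# str.title() capitalises the first cased character after any uncased one,
# and apostrophes are uncased, so the letter after them is capitalised too.
print("shamboni's".title())              # Shamboni'S
print("they're bill's friends".title())  # They'Re Bill'S Friends
```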

I am also unsure whether the iswbound function is correct, though at present I suspect it is not. You shouldn’t normally get characters such as paragraph separators in a Picard variable (though they could appear in e.g. lyrics), but if you do, they should be treated as word boundaries. However, iswbound checks for the Unicode modifier-symbol category and treats it as a word boundary, which IMO it shouldn’t. In fact I wonder how modifier symbols should be treated at all, and whether a character-by-character algorithm can work properly with Unicode when what looks like a single character can actually be several code points (a base character plus modifiers).
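A sketch of the boundary rule suggested here, using Python's `unicodedata.category()`. It does not claim to reproduce Picard's iswbound; the function name `is_word_boundary` and the exact set of categories are assumptions based on the points above. The last lines show why a per-character loop is sensitive to normalisation.

```python
import unicodedata


def is_word_boundary(char: str) -> bool:
    """Hypothetical boundary test following the suggestion above: space
    separators (Zs), line and paragraph separators (Zl, Zp) and punctuation
    (P*) end a word; modifier symbols (Sk) such as ^ or ` do not."""
    category = unicodedata.category(char)
    return category in ("Zs", "Zl", "Zp") or category.startswith("P")


print(is_word_boundary(" "))       # True  (Zs)
print(is_word_boundary("\u2029"))  # True  (Zp, paragraph separator)
print(is_word_boundary("^"))       # False (Sk, modifier symbol)

# "é" can be one code point (NFC) or two (NFD: "e" plus a combining acute
# accent, category Mn), so a character-by-character loop sees different input
# depending on how the text was normalised.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```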

If I were coding this from scratch, I would use the Python 3 title() method and then use a regex replacement to find every quote character (apostrophe) that is preceded by a Unicode letter and followed by an uppercase or titlecase letter (titlecase meaning digraphs whose first part is uppercase, e.g. ǅ), and replace that letter with its lowercase equivalent.
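A minimal sketch of that approach, assuming only straight (') and curly (’) apostrophes need handling. The name `strict_title` is hypothetical. Python's `re` has no `\p{Lu}` class, so the pattern matches any letter and the callback checks for upper- or titlecase.

```python
import re

# An apostrophe preceded by a letter, followed by a word character.
# [^\W\d_] matches any Unicode letter in Python 3's re module.
_AFTER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])(['’])(\w)")


def _lower_after_apostrophe(match: re.Match) -> str:
    quote, letter = match.group(1), match.group(2)
    # Only uppercase or titlecase letters are lowered (e.g. the "S" in "Shamboni'S").
    if letter.isupper() or letter.istitle():
        letter = letter.lower()
    return quote + letter


def strict_title(text: str) -> str:
    """Hypothetical 'real' title case: str.title(), then undo capitals after apostrophes."""
    return _AFTER_APOSTROPHE.sub(_lower_after_apostrophe, text.title())


print(strict_title("shamboni's album"))  # Shamboni's Album
print(strict_title("LIVE AT THE BBC"))   # Live At The Bbc
```

One trade-off of this approach: a name like "o'neil" comes out as "O'neil", because the regex also lowercases the N that str.title() capitalised.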

Picard does have some unit tests for $title, but clearly they are not good enough. IMO Picard also needs a slew of unit tests for string functions like $title, checking that they handle Unicode correctly for all languages, including obscure features such as digraphs and modifiers.

Apologies, but I don’t have time to fix this at present.

outsidecontext · April 26, 2021, 10:58am

I don’t think it is a bug, this function was intentionally implemented to preserve all upper case words, see also https://github.com/metabrainz/picard/commit/b677d094d7cec0e86727b281a14ea27c4bd9d635#diff-2abc6dc91bfdf6ba1d7a0af3db0fc182ac3438ec3a76ac1a0778a5046071be45R34

Also, I don’t think it would be a good idea to change it after it has worked this way for over 10 years. I doubt people would be happy with their titles changing to “Live At The Bbc” or “Ac/Dc Live Usa”.

But tests for this case are definitely missing.

Sophist · April 26, 2021, 11:44am

Philipp - I do understand the issues of backwards compatibility, and your examples show why, but:

A. The function description says letters other than first in words should be lower case;
B. The accepted usage of the function in all other software is letters other than first in words should be lower case;
C. There are IMO likely to be some unicode issues with the current implementation as well;
D. The current implementation is not written in efficient Python (though this is not serious enough to warrant a change in its own right).

Could we change existing $title to $upper1st (and have an upgrade function to change all existing scripts) and reimplement $title properly?

elomatreb · April 26, 2021, 11:50am

An alternative would be to introduce a title-case function that works as you expect as something like $strictTitle. Given the backwards compatibility problem of causing a change in tags for people who potentially have used this for a decade I don’t think changing it is appropriate.

Regarding point B, that feels like a very bold claim. Given the origin of the concept of title casing, I don’t think any sane newspaper editor would write a headline about the “Bbc”.

Sophist · April 26, 2021, 12:37pm

I said software implementation - not how title case is used for abbreviations.

As for $stricttitle, Picard has the ability to update existing scripts, so it could change $title to $upper1st when you install a release with this change. Existing scripts would then continue to work the same, and only new scripts containing $title would behave differently.
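As an illustration of what such an upgrade step might do to the script text, here is a hypothetical sketch. It does not use Picard's actual upgrade machinery; `$upper1st` is only the name proposed in this thread, and the assumption that `\$` escapes a literal dollar sign in Picard script is why escaped occurrences are skipped.

```python
import re

# Matches "$title(" but not an escaped "\$title(" (assumed to be a literal
# dollar sign in Picard script). "$upper1st" is only a proposed name.
_TITLE_CALL = re.compile(r"(?<!\\)\$title\(")


def migrate_title_calls(script: str) -> str:
    """Hypothetical upgrade step: rename every $title( call to $upper1st(."""
    return _TITLE_CALL.sub("$upper1st(", script)


print(migrate_title_calls("$set(title,$title(%title%))"))
# $set(title,$upper1st(%title%))
```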

outsidecontext · April 26, 2021, 1:53pm

I just fail to see how $title(some band at the BBC) → Some Band At The Bbc would be better than the current behavior. IMHO the current behavior is more useful than a simple implementation as suggested. Of course such a function will always be limited, and the current implementation is still very simple and does not try to fully support grammatical rules or such (as e.g. MB’s guess case functionality does). But for a generic function, keeping uppercase words as-is seems rather useful to me.

If you don’t want that it’s easy to do a $lower($title(...)).

I would rather update the documentation to clarify how this function handles uppercase words.

I’m sure both can be addressed without changing the overall behavior.

outsidecontext · April 26, 2021, 2:37pm

What is really bad is that the example at https://picard-docs.musicbrainz.org/en/functions/func_title.html does not actually work the way it is documented.

Otherwise the function currently strictly does what the docs say: “first character in every word capitalized”.

The original implementation of this function also explicitly mentions the intention to not lowercase completely uppercased titles (e.g. as is common on many Japanese releases), so it’s not only limited to acronyms as in my examples.

Sophist · April 26, 2021, 3:05pm

Philipp - I guess we need to agree to differ. Whilst I agree that there is a place for the current functionality (so that it doesn’t lowercase acronyms that are all upper case), I disagree that such a function should be called $title, because it doesn’t do what the name suggests it does.

Which is why I would like to see the current function changed to a different name, and a genuine title-case function implemented.

P.S. I think that the whole current functionality should be possible with a regex, which would be an order of magnitude or more faster than the current code.
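A sketch of what such a regex could look like, under the same assumed word-boundary rule as a character loop (a letter after whitespace or punctuation starts a word, except after "letter + apostrophe"). The name `upper_first` is hypothetical, and no benchmark against Picard's code is implied.

```python
import re

# A letter at a word boundary, unless the two preceding characters are a letter
# and an apostrophe (so the "s" in "shamboni's" is left alone).
# [^\W\d_] matches any Unicode letter in Python 3's re module.
_WORD_START = re.compile(r"\b(?<![^\W\d_]['’])([^\W\d_])")


def upper_first(text: str) -> str:
    """Hypothetical regex version: capitalise word-initial letters only."""
    return _WORD_START.sub(lambda m: m.group(1).title(), text)


print(upper_first("live at the BBC"))   # Live At The BBC
print(upper_first("shamboni's album"))  # Shamboni's Album
```

Because `\b` only fires between a word and a non-word character, a letter directly after a digit (as in "4ever") is left unchanged.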

P.P.S. I think you mean $title($lower(...)) - which I agree is equivalent to what people think a title-case function should create, but I think that $title(...) = $upper1st($lower(...)).

rdswift · April 26, 2021, 3:37pm

I’ll update the documentation right away, and add some more examples to make it a bit clearer.

shamboni · April 26, 2021, 5:15pm

Yes, that example is what made me think this was a bug.