Suggestion: Automatic translation of ASCII to preferred Unicode

I don’t think it’s wise to have it as a standard option in the guess case button, because it’s so difficult to get it right with all of the edge cases and that’s just going to lead to incorrect entries all over the place. I’m not opposed to having some level of automation, but it shouldn’t just be done whenever that button is pressed, because I think a lot of people won’t check whether it’s actually correct. Right now it’s reasonably easy to spot ASCII apostrophes, which makes it easier to fix it, but if all of that is converted automatically, who’s going to spot the mistake?

I think it’s impossible to get right. When is it a left single quote (‘) rather than an apostrophe (’)? When it’s at the start of a word? No: ’em. When it’s at the start of a word and there’s a matching ASCII apostrophe at the end of the same word or a later one? No: ’n’. You could perhaps hardcode a couple of cases that’ll be correct 99.99999…% of the time (I’ll, he’s, won’t, etc.) but that’d be half-assing it.
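To make the "safe" subset concrete, here is a minimal sketch that only touches an ASCII apostrophe with a letter on both sides (I'll, he's, won't) and leaves everything else alone. The function name `curl_inner_apostrophes` is made up for illustration; this is not how guess case or any existing script works.

```python
import re

APOSTROPHE = "\u2019"  # right single quotation mark, used as the apostrophe

# A straight apostrophe with a letter directly before and after it.
# [^\W\d_] matches any Unicode letter (word character minus digits/underscore).
_INNER = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def curl_inner_apostrophes(title: str) -> str:
    """Curl only word-internal apostrophes; word-initial/final ones are kept."""
    return _INNER.sub(APOSTROPHE, title)


if __name__ == "__main__":
    for sample in ["I'll Be There", "Get 'em", "Rock 'n' Roll"]:
        print(sample, "->", curl_inner_apostrophes(sample))
```

Titles like "Get 'em" and "Rock 'n' Roll" come out unchanged, which is exactly the part that still needs a human.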

2 Likes

@kellnerd
Thanks, will give it a try.

@ROpdebee
It could be added as a tickbox that is unticked by default, like the one for Roman numerals?
Some code could also detect whether it’s directly after a number, for the prime and double prime.
Regarding the left single quote, it normally always comes after a space, so it could be detected too, no?
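A rough sketch of the "directly after a number" idea, assuming a straight quote right after a digit is a prime unless a letter follows it (so that 80's is left alone). `guess_primes` is a hypothetical name, not part of any existing tool.

```python
import re

PRIME = "\u2032"         # ′
DOUBLE_PRIME = "\u2033"  # ″

# A double quote (or two apostrophes) right after a digit, not followed by a letter.
_DOUBLE = re.compile(r"(?<=\d)(?:\"|'')(?![^\W\d_])")
# A single apostrophe right after a digit, not followed by a letter (skips 80's).
_SINGLE = re.compile(r"(?<=\d)'(?![^\W\d_])")


def guess_primes(title: str) -> str:
    """Replace straight quotes after digits with prime / double prime."""
    title = _DOUBLE.sub(DOUBLE_PRIME, title)
    return _SINGLE.sub(PRIME, title)


if __name__ == "__main__":
    for sample in ['12" Mix', "5'10\"", "80's Hits"]:
        print(sample, "->", guess_primes(sample))
```

Note that this rule misfires on a closing quotation mark that happens to follow a number (a quoted title ending in '69, say), so the result still has to be checked by the editor.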

Doing it manually also creates a risk of errors, and in any case editors are supposed to check before validating. Something should be done, as it takes time for a task with no added value, and random editors never look at it.

Yes, and I’m not opposed to that, or a userscript like @jesus2099 suggested. I’d only have issues with it if it was guess case’s default behaviour.

Fully agree. But I like to believe that the editors who use Unicode punctuation currently are also aware of the differences between the various types of quotes/primes/etc, and actually care about using correct punctuation. I’m afraid that by having guess case convert it automatically, the Unicode characters will be inserted by people who don’t care about it, and don’t bother checking it for correctness. I’d much rather they leave it as is than make it incorrect.

Again, fully agree. But some editors (especially beginners) still capitalise titles incorrectly, put featured artists in the title, or format ETI incorrectly.

The “good” part about that is that incorrect capitalisation, featured artists in the title, and incorrect ETI are relatively easy to spot and thus more likely to get fixed. The same thing goes for ASCII apostrophes (and the other characters too, although to a lesser extent). But if all titles would use Unicode punctuation by default, it’d be harder to spot the mistakes. I would probably just glance over it.

Well…

(I don’t think that’s actually correct though, right?)
You’re probably better off training a machine learning model than to try and code in all of the possibilities, it’ll have higher accuracy. Or having a non-default (see above) option to make a best-effort guess based on a small number of rules that work in most cases, and relying on an actual human to check for correctness and fix any mistakes. The only one that I think could reasonably be integrated into guess case by default is the horizontal ellipsis, I can’t think of any cases where ... should not be …, except when an artist is intentionally trying to mess with MB editors.
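For the ellipsis case, a conservative sketch could look like the following. It assumes that only a run of exactly three dots is converted and that longer or shorter runs are left for the editor to decide; `replace_ellipsis` is an illustrative name.

```python
import re

ELLIPSIS = "\u2026"  # horizontal ellipsis …

# Exactly three consecutive dots: no dot directly before or after the run.
_THREE_DOTS = re.compile(r"(?<!\.)\.{3}(?!\.)")


def replace_ellipsis(title: str) -> str:
    """Turn '...' into a single ellipsis character; leave other dot runs alone."""
    return _THREE_DOTS.sub(ELLIPSIS, title)


if __name__ == "__main__":
    for sample in ["Wait...", "And... Then...", "Four...."]:
        print(sample, "->", replace_ellipsis(sample))
```

Leaving runs of two or four dots untouched is a deliberate choice here: there is no obvious single right answer for them, so they are better flagged by eye than guessed.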

2 Likes

It is not correct, indeed.
It’s Discourse (forum) automatic prettify, I didn’t paste special characters (because Discourse would break them anyway). :slight_smile:

But then if it is ...., it should become …., .… or just …? :wink:

1 Like

We agree there will always be strange cases, but don’t forget that 95% of the time it is just 2 or 3 normal apostrophes. Multiply those 2 minutes by the number of editors who take care of punctuation and the number of releases, and there’s a good way to save time.

Based on the above, there is a consensus about doing it as a script, so that this feature is limited to experienced users. So @jesus2099, I can’t wait to beta test your code :slight_smile:

Then, at a later stage, using this script will provide information for trying to implement it as part of guess case. In the worst case, we could imagine a system that updates the easy examples but not the track titles with more than 2 apostrophes in them.
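The "only update the easy examples" fallback could be sketched like this. It assumes "easy" means at most 2 apostrophes, all of them between letters; anything else is returned unchanged and flagged for review. The function name, the threshold parameter and the return shape are all made up for illustration.

```python
import re

APOSTROPHE = "\u2019"
# A straight apostrophe with a letter directly before and after it.
_INNER = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def convert_if_simple(title: str, max_apostrophes: int = 2) -> tuple[str, bool]:
    """Return (title, needs_review).

    Titles with no apostrophes pass through. Titles with too many
    apostrophes, or with any apostrophe that is not between two letters,
    are left untouched and flagged for a human to look at.
    """
    count = title.count("'")
    if count == 0:
        return title, False
    inner = len(_INNER.findall(title))
    if count > max_apostrophes or inner != count:
        return title, True
    return _INNER.sub(APOSTROPHE, title), False


if __name__ == "__main__":
    for sample in ["Don't Stop", "Rock 'n' Roll", "I'll Say She's Where He's"]:
        print(sample, "->", convert_if_simple(sample))
```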

1 Like

I think the example of Discourse’s attempt at prettifying the text shows the problem rather well.

Seriously, I don’t have a clue as to which dash to use. I am here for the music, not the typography. Having a Guess Case button that makes inconsistent errors sounds bad to me as it will cause more work that is harder to spot.

6 Likes

I completely agree with IvanDobsky !!!

2 Likes

@ulugabi and others: I have written a bookmarklet to automate Unicode replacements and created a new topic for those who have interest and/or want to give feedback:


You need to think of a new automation-breaking title example now that my bookmarklet manages to guess the correct Unicode replacements for this one :stuck_out_tongue_winking_eye:
… just kidding, of course there is always something that will not be handled correctly and requires human attention and manual correction.

7 Likes

Really?! :open_mouth: Wow!!! :heart_eyes::+1:
:bowing_man:

2 Likes

Just tried it on a few releases and it’s working quite well. I especially like the yellow highlighting, which helps review the automatic changes.

2 Likes

But 80s should not have an apostrophe in it (or am I starting another separate debate :grin: )

1 Like

That depends on the variant of English you’re speaking, like 80s for Brits and 80’s for yanks.

4 Likes

I would love for a version of this to be included in ‘guess case’.

I don’t really care if it means that some people get it wrong 1% of the time - if that’s bad, then surely having the other 99% wrong all of the time is worse. Or am I missing something?

I think it’s less of a problem if people who don’t understand it and don’t care just leave the straight apostrophes and their ilk in. If they’re not touching it, they can’t do it wrong either. If it’s part of the guess case button, they would use it and not know if it guessed wrong or how to fix it. Maybe a separate ‘guess punctuation’ button would be a good compromise.

4 Likes

This bookmarklet could be turned into a userscript Guess Punctuation button, next to the MBS Guess Case button.

3 Likes

Something like this?

Disclaimer: Very quick and dirty port, only works on tracklists, the code probably sucks because it’s been nearly 10 years since I used JS on a regular basis.

4 Likes

I’m not familiar with the UI editing rules, so I would just ask: should it be a button or another tickbox?

Thanks for following up on this topic.

I translate this as:
Curly where a straight should be: very bad!
Straight where a curly should be: not very bad

Is that right? Because otherwise not touching it (all straight) is just as wrong as using the occasional incorrect curly. More so, because there are more situations where it should be curly.

1 Like

No. Straight where curly should be is OK, the right curly instead of straight is good. A curly apostrophe where there should be a prime is bad. Accidentally turning the correct typographical symbol into the wrong one because guess case gets it wrong and the editor doesn’t have a clue is very bad.

3 Likes

I have been using Kellnerd’s ‘Guess Punctuation’ button for a while now, and am finding it invaluable:
Link to thread + plugin

The immediate highlighting of the tracks changed makes it particularly easy and quick to check if anything has been changed incorrectly.

I’ve opened a ticket about adding this, or a similar, function to MB:

If you’re interested please try out @kellnerd’s script for a bit and then vote or comment for or against on the ticket :+1:

7 Likes