Suggestion: Automatic translation of ASCII to preferred Unicode

But we couldn’t easily decide this automatically for all texts:

I LOVE YOU (“I am 12” 12" version) → I LOVE YOU (“I am 12” 12″ version)

(open quotes, close quotes, double prime)

Not that easy, and prone to mistakes (only in rare cases, but it would still be a pity to auto‐add mistakes). :slight_smile:

Adding a very specific 7″ and 12″ substitution to guess case might be fine. Even if it might insert a few errors like @jesus2099 suggested, that’s less likely than English guess case messing with prepositions vs. adverbs and whatnot and we still have that :slight_smile:
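
A minimal sketch of what such a narrow substitution could look like, written in Python for illustration only. The function name `fix_vinyl_sizes` and the exact rule (a standalone 7 or 12 directly followed by an ASCII double quote) are assumptions, not anything guess case does today.

```python
import re

DOUBLE_PRIME = "\u2033"  # double prime, used as the inch mark

# Assumed rule: a standalone 7 or 12 (not part of a longer number)
# that is directly followed by an ASCII double quote.
_VINYL_SIZE = re.compile(r'(?<![\d.,])(7|12)"')


def fix_vinyl_sizes(title: str) -> str:
    """Replace the ASCII quote after a 7 or 12 with a double prime."""
    return _VINYL_SIZE.sub(lambda m: m.group(1) + DOUBLE_PRIME, title)
```

On `Blue Monday (12" version)` this does the right thing. On the all-ASCII form of the example above, `("I am 12" 12" version)`, it also converts the closing quote after the first 12, which is exactly the rare kind of error mentioned earlier.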

It’s pie-in-the-sky stuff, but I’m envisioning something like this:

When entering one of the usual „wrong“ ASCII characters, it is marked with a pastel red background, and a speech bubble opens above it giving the common Unicode alternatives. Clicking one of these will replace the offending character with it. The same can be achieved with a suitable keyboard shortcut (say, Alt-2 to select the second Unicode alternative).

If a replacement is done, the speech bubble and colouration are removed. If the user instead keeps on entering text, ignoring the suggestion, the bubble is also removed, but the red background under the problem characters is kept. One can later pop up the speech bubble again by mousing over these characters.
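
The data side of that idea could be sketched roughly as follows (Python, illustration only). The table `ALTERNATIVES` and both function names are hypothetical; the actual highlighting and the speech bubble are UI work that this sketch does not cover.

```python
# Hypothetical table of ASCII characters and their usual Unicode alternatives.
ALTERNATIVES = {
    '"': ["\u201c", "\u201d", "\u2033"],            # left/right double quote, double prime
    "'": ["\u2018", "\u2019", "\u2032"],            # left/right single quote, prime
    "-": ["\u2010", "\u2013", "\u2014", "\u2212"],  # hyphen, en dash, em dash, minus
}


def find_suspects(text):
    """Return (index, character, alternatives) for every flagged character."""
    return [(i, ch, ALTERNATIVES[ch]) for i, ch in enumerate(text) if ch in ALTERNATIVES]


def apply_choice(text, index, choice):
    """Replace the character at `index` with its `choice`-th alternative (1-based, like Alt-2)."""
    replacement = ALTERNATIVES[text[index]][choice - 1]
    return text[:index] + replacement + text[index + 1:]
```

For `7" mix`, `find_suspects` reports the quote at index 1, and `apply_choice(text, 1, 3)` would pick the third alternative, the double prime.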

Mock-up:

Perfect, can use the same kind of drop-down code that’s used when you type in an artist and you’re prompted to select which one, which also disappears once you’ve selected something else.

While it looks good as a mockup, it’s unfortunately not easily implemented because there is no way to style characters in an <input>. There are [some workarounds](https://stackoverflow.com/questions/22131214/how-to-highlight-text-inside-an-input-field), but the most promising way (contenteditable) has other disadvantages; e.g., you could no longer do copy and paste on the input field.

OK, but could we at least have an option in “guess case” to automatically replace the apostrophe and hyphen-minus with the correct Unicode characters? It would then be the editors’ responsibility to manually change hyphens to en or em dashes where needed.
This would save a lot of time, as these represent the majority of changes, not to mention that it is impossible to see visually whether a hyphen-minus was changed to a “real” hyphen.
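
To make the request concrete, here is a rough Python sketch of such a forced replacement. The name `force_replace` is made up for illustration, and the choice of U+2019 and U+2010 as targets is an assumption about which characters are meant.

```python
# Assumed targets: right single quotation mark and the "real" hyphen.
ASCII_TO_UNICODE = str.maketrans({"'": "\u2019", "-": "\u2010"})


def force_replace(title: str) -> str:
    """Blindly replace every ASCII apostrophe and hyphen-minus."""
    return title.translate(ASCII_TO_UNICODE)
```

Note that this is deliberately dumb: in `Don't Stop - Re-mix` the spaced hyphen is converted as well, so an editor would still have to turn it into a dash by hand.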

Thanks

The technical reason I have seen given before is that you can’t spot the difference between ‘quotes’ and apostrophes. So instead there is just an army of Correction Hamsters running around fixing things.

Hm, but you can detect them, can’t you?
So could we have an option to force the character change, regardless of whether it was a quote or an apostrophe?

I like that example. Basically automation would be rubbish, and potentially lead to more errors. Leaving it to the Unicode Hamsters seems sensible to me. Go beyond an apostrophe and I am lost. I didn’t even know that close quotes and 12" are different characters! (And don’t get me started on dashes :crazy_face:)

Based on my edits from the last few months, it’s basically 95% changing apostrophes and hyphen-minuses.
It basically ends up taking 5 minutes per release: CTRL + F for the hyphen, then copy and paste, then the same for the apostrophe. After that, track names are normally reviewed for capitalization and other characters (en dash, …).
Having a button would reduce that time and, on top of that, let brain and eyes focus on more complex topics.

If you get lost, you could rely on User:Jacobbrett/English Punctuation Guide - MusicBrainz Wiki
Personally, I copied it into a txt file that I keep open while editing, so I just need to copy/paste the required character when needed. You can also add all the accents and other generic comment phrases, e.g.:
É
é
È
è
Ê
ê

œ

part of “xxxx” DJ‐mix

Regards

I use super turbo search and replace. It can even use regex to manage open/close punctuation, and it marks changed tracks in yellow for error review before edit submission.
But it saves only one search.
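
For readers who have not seen that kind of tool, the general idea (one regex applied to every track title, with changed titles flagged for review) can be sketched like this in Python. This is not the userscript’s code; `preview_replace` is a hypothetical name.

```python
import re


def preview_replace(titles, pattern, replacement):
    """Apply one regex replacement to each title.

    Returns a list of (new_title, changed) pairs so that changed
    titles can be highlighted for review before submitting.
    """
    regex = re.compile(pattern)
    results = []
    for title in titles:
        new_title = regex.sub(replacement, title)
        results.append((new_title, new_title != title))
    return results
```

A UI on top of this would colour every row whose second element is `True`.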

I will see if/how I can enhance it with presets.
And if I can, and after some weeks/months of use, if it’s possible, I will see if it is appropriate for me to patch the Guess Case button, internally…

It sounds great and everything, but I am not fast, so don’t expect too much.

It’s just that I have the same need and I am often slowed down by my search/replace, it could be improved, as I said.


But the problem, as @IvanDobsky said, is that you can never know which are the double quotes and which are the double primes (even with even amounts):

“I am 12” 7" edit of 12" remix

The other problem is between single quotes and apostrophes (even with even amounts):

2’59" hardcore 'Master’s Crown Take ‘Em to the Limit’ 80’s 7" edit

Closing quote and apostrophe are the same Unicode character, but the minute mark (prime) is not… It seems tough too…
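
To illustrate why an even number of quotes does not help, here is a deliberately naive Python sketch that simply alternates opening and closing double quotes. The function name is made up; nobody in this thread proposed exactly this.

```python
def naive_pair_quotes(title: str) -> str:
    """Alternate left and right double quotes, ignoring context."""
    out = []
    opening = True
    for ch in title:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening
        else:
            out.append(ch)
    return "".join(out)
```

Fed the all-ASCII version of the first example (four double quotes), it gets the first pair right and then wraps `7" edit of 12"` in a second pair of quotes where two double primes were wanted.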

:thinking::exploding_head:

Maybe you will like that userscript:

I don’t think it’s wise to have it as a standard option in the guess case button, because it’s so difficult to get it right with all of the edge cases and that’s just going to lead to incorrect entries all over the place. I’m not opposed to having some level of automation, but it shouldn’t just be done whenever that button is pressed, because I think a lot of people won’t check whether it’s actually correct. Right now it’s reasonably easy to spot ASCII apostrophes, which makes it easier to fix it, but if all of that is converted automatically, who’s going to spot the mistake?

I think it’s impossible to get right. When is it a left single quote (‘) rather than an apostrophe (’)? When it’s at the start of a word? No: ’em. When it’s at the start of a word and there’s a matching ASCII apostrophe at the end of the same word or a later one? No: ’n’. You could perhaps hardcode a couple of cases that’ll be correct 99.99999…% of the time (I’ll, he’s, won’t, etc.) but that’d be half-assing it.
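
For what it’s worth, the “couple of hardcoded cases” could be approximated with a single rule instead of a word list: an ASCII apostrophe with a letter on both sides. A Python sketch, with `fix_inner_apostrophes` as a made-up name:

```python
import re

# An ASCII apostrophe with a letter directly before and after it.
# [^\W\d_] matches any Unicode letter.
_INNER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def fix_inner_apostrophes(title: str) -> str:
    """Convert only apostrophes inside words (I'll, he's, won't)."""
    return _INNER_APOSTROPHE.sub("\u2019", title)
```

It leaves everything at a word boundary alone, so 'em, 'n' and real quotes stay as ASCII for a human to sort out. That is still the half-solution described above, just a more compact one.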

@kellnerd
Thanks, I will give it a try.

@ROpdebee
It could be added as a tickbox that is unticked by default, like the one for Roman numerals?
Some code could also detect whether it is directly after a number, for the prime and double prime.
As for the left single quote, it normally always comes after a space, so that could be detected too, no?
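
A rough Python sketch of those two rules, to see how far they get. Everything here is an assumption for illustration: the name `guess_marks`, the order of the rules, and treating every remaining apostrophe as a right single quote.

```python
import re


def guess_marks(title: str) -> str:
    """Best-effort guess using only 'after a digit' and 'after a space'."""
    title = re.sub(r'(?<=\d)"', "\u2033", title)   # double prime after a digit
    title = re.sub(r"(?<=\d)'", "\u2032", title)   # prime after a digit
    title = re.sub(r"(?<!\S)'", "\u2018", title)   # left quote after a space or at the start
    return title.replace("'", "\u2019")            # everything else: apostrophe / right quote
```

On the all-ASCII version of the example above, `2'59" hardcore 'Master's Crown Take 'Em to the Limit' 80's 7" edit`, this gets the primes, the outer quotes and Master’s right, but it turns 'Em into an opening quote and puts a prime in 80's. So the rules help, but a human check is still needed.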

Doing it manually also creates a risk of errors, and either way editors are supposed to check before validating. Something should be done, as it takes time for a task with no added value, and random editors never look at it.

Yes, and I’m not opposed to that, or a userscript like @jesus2099 suggested. I’d only have issues with it if it was guess case’s default behaviour.

Fully agree. But I like to believe that the editors who use Unicode punctuation currently are also aware of the differences between the various types of quotes/primes/etc., and actually care about using correct punctuation. I’m afraid that by having guess case convert it automatically, the Unicode characters will be inserted by people who don’t care about it, and don’t bother checking it for correctness. I’d much rather they left it as is than made it incorrect.

Again, fully agree. But some editors (especially beginners) still capitalise titles incorrectly, put featured artists in the title, or format ETI incorrectly.

The “good” part about that is that incorrect capitalisation, featured artists in the title, or incorrect ETI are relatively easy to spot and are thus more likely to get fixed. The same thing goes for ASCII apostrophes (and the other characters too, although to a lesser extent). But if all titles used Unicode punctuation by default, it’d be harder to spot the mistakes. I would probably just glance over it.

Well…

(I don’t think that’s actually correct though, right?)
You’re probably better off training a machine learning model than trying to code in all of the possibilities; it’ll have higher accuracy. Or having a non-default (see above) option to make a best-effort guess based on a small number of rules that work in most cases, and relying on an actual human to check for correctness and fix any mistakes. The only one that I think could reasonably be integrated into guess case by default is the horizontal ellipsis: I can’t think of any cases where ... should not be …, except when an artist is intentionally trying to mess with MB editors.

It is not correct, indeed.
It’s Discourse’s (forum) automatic prettifying; I didn’t paste special characters (because Discourse would break them anyway). :slight_smile:

But then if it is ...., it should become …., .… or just ? :wink:
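
One cautious way around that question, sketched in Python: only touch runs of exactly three dots and leave any other run alone for a human to decide. `fix_ellipsis` is a hypothetical name, and leaving four dots untouched is my assumption, not an agreed rule.

```python
import re

# Exactly three dots: not preceded and not followed by another dot.
_THREE_DOTS = re.compile(r"(?<!\.)\.{3}(?!\.)")


def fix_ellipsis(title: str) -> str:
    """Replace '...' with the horizontal ellipsis, skipping longer runs."""
    return _THREE_DOTS.sub("\u2026", title)
```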

We agree there will always be strange cases, but don’t forget that 95% of the time it is just 2 or 3 normal apostrophes. Multiply those 2 minutes by the number of editors who take care of punctuation and by the number of releases, and there’s a good opportunity to save time.

Based on the above, there is a consensus about doing it in a script so that this feature is limited to experienced users, so @jesus2099, I can’t wait to beta test your code :slight_smile:

Then, at a later stage, using this script will provide information for trying to implement it as part of guess case. In the worst case, we could imagine a system that updates the easy examples but not the track titles with more than 2 apostrophes.
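
Such an “easy cases only” fallback might look roughly like this in Python. The name `convert_if_easy`, the default threshold, and the extra condition that every apostrophe must sit between two letters are all assumptions added for illustration.

```python
import re

_INNER_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def convert_if_easy(title: str, max_apostrophes: int = 2) -> str:
    """Convert apostrophes only when the title looks unambiguous."""
    total = title.count("'")
    if total == 0 or total > max_apostrophes:
        return title  # nothing to do, or too many to trust a guess
    if len(_INNER_APOSTROPHE.findall(title)) != total:
        return title  # at least one apostrophe is at a word boundary
    return title.replace("'", "\u2019")
```

Anything it skips stays in ASCII, which keeps those titles easy to spot for manual fixing.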

I think the example of Discourse’s attempt at prettifying the text shows the problem rather well.

Seriously, I don’t have a clue as to which dash to use. I am here for the music, not the typography. Having a Guess Case button that makes inconsistent errors sounds bad to me as it will cause more work that is harder to spot.

I completely agree with IvanDobsky !!!

@ulugabi and others: I have written a bookmarklet to automate Unicode replacements and created a new topic for those who have interest and/or want to give feedback:


You need to think of a new automation-breaking title example now that my bookmarklet manages to guess the correct Unicode replacements for this one :stuck_out_tongue_winking_eye:
… just kidding, of course there is always something that will not be handled correctly and requires human attention and manual correction.
