Suggestion: Automatic translation of ASCII to preferred Unicode

I’ve just been voted down on an edit for replacing Unicode quotes with ASCII quotes:

http://musicbrainz.org/edit/41018888

I was pointed to http://musicbrainz.org/doc/Style/Miscellaneous , which says ASCII is allowed, but Unicode is preferred.

I was correcting titles that were substantially incorrect, but which included three Unicode quote characters. I replaced them with sleeve-accurate titles, but included three ASCII quotes. (Technically, I don’t see where it’s forbidden to replace Unicode with ASCII as I was told, but if I were reviewing the edit I would also have had doubts about allowing it after reading the guidelines.)

While I am fine with re-doing the edit, it struck me that this is a detail that could be corrected by the web site, either automatically or with a button similar to the “suggested capitalization” buttons.

The spirit of the guideline is “we shouldn’t force people to enter Unicode because it’s harder, but if it’s already Unicode then it should always be kept that way, since it’s preferred”.

It’s definitely not as easy as that sounds - different languages do things like quotation marks completely differently, and even for English, things like a hyphen or an apostrophe can be multiple different symbols in Unicode.

I think the ‘no’ vote was a bit harsh, but since a voter/editor would have to re-add/fix information (i.e. it’s replacing more correct information with less correct information, if only slightly so), I don’t think it’s really out of line.

But annoying in any case.
If it makes you feel better, I almost certainly would make the same mistake; I don’t use Unicode because it’s “yet another thing to do”.

Does anybody have a userscript or something that helps out?

Well, sure, a hyphen can be multiple symbols in Unicode, but if the MusicBrainz guidelines prefer a certain symbol, then there’s only one preferred choice, right? And a button would allow the user to override the suggestions.

But I was referring mostly to the quote symbol. It pops up so often for me since I deal with mostly dance music. I don’t know how common the other symbols are but I am constantly typing 12" and 7".

(Also, this made me laugh on the English Punctuation Guide: “Usage of preferred characters is optional and sometimes unwarranted.” :slight_smile: )

It is annoying, but like I said, I would have probably done the same thing if the roles were reversed based on the guidelines. shrug

For each instance yes, but in general: No. Sometimes an ASCII hyphen-minus should be an em dash, sometimes an en dash, and sometimes a minus sign. Sometimes double-quote should be right, sometimes left, and sometimes double-prime. And so on.

Hyphen-minus

En dash for a small separation, em dash for a large one.
Hyphen for hyphenated words.
Minus sign for maths.

Typewriter double quotes

Left, right, top or bottom curly double quotes, depending on position (start, end) and language.
Double prime for inches.

Typewriter apostrophe

Right curly single quote for apostrophes, but left then right curly single quotes when they are used like double quotes in English.

Various

Multiplication sign, instead of the letter x, for maths and for cross collaborations.
Ellipsis instead of three dots.

Automatic

So the only conversion that could be automatic is Unicode (many) to ASCII (one), not the opposite. :smiling_face:
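
To make the many-to-one point concrete, here is a rough Python sketch. The table is my own illustration built from the list above, not anything taken from the MusicBrainz code, and the function names are made up.

```python
# Hypothetical table: each preferred Unicode character and the ASCII
# text it would collapse to. Illustrative only, not MusicBrainz code.
TO_ASCII = {
    "\u2010": "-",    # hyphen
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # double low-9 (bottom) quotation mark
    "\u2033": '"',    # double prime (inches)
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u00d7": "x",    # multiplication sign
    "\u2026": "...",  # horizontal ellipsis
}


def to_ascii(text: str) -> str:
    """Unicode -> ASCII: every listed character has exactly one target."""
    return "".join(TO_ASCII.get(ch, ch) for ch in text)


def candidates(ascii_text: str) -> list[str]:
    """ASCII -> Unicode: every character that could have been meant."""
    return [uni for uni, asc in TO_ASCII.items() if asc == ascii_text]
```

`to_ascii` never has to guess, whereas `candidates("-")` and `candidates('"')` each return four possibilities in this table, and picking one needs context that a simple lookup does not have.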

I’m pretty sure I’m being punked.

http://musicbrainz.org/edit/41058380

Come on, folks. I know what Unicode is. I know there’s a bunch of fancy characters that are helpful for all kinds of different situations and languages. I’m not saying make some multi-level fancy filter for all Unicode translations. I’m saying consider one specific common instance to make it easier for people trying to help populate common data on the site.

That’s the only character I care about. It’s the most common character that one would use, normally, when dealing with seven inch or twelve inch singles/mixes. I know it would only affect a certain number of users, namely people entering pop and dance music data, but then so do a lot of the controls on the site. And it was only a suggestion to make the site friendlier to less patient and/or less knowledgeable people.

I think having a toolbox with common Unicode characters in each editing window would be useful. Click the appropriate character and it’s inserted at the cursor position. Might even be possible with a user script.
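
The core of such a toolbox is tiny. A minimal sketch of the "insert at the cursor position" step, written in Python for brevity (a real user script would do the same with the input field's selection in the browser; the `TOOLBOX` contents and the function name are just my assumptions):

```python
# Hypothetical toolbox contents; any set of common characters would do.
TOOLBOX = ["\u2019", "\u201c", "\u201d", "\u2033", "\u2010", "\u2013", "\u2014", "\u2026"]


def insert_at_cursor(text: str, cursor: int, char: str) -> tuple[str, int]:
    """Insert char at the cursor and return the new text and new cursor."""
    new_text = text[:cursor] + char + text[cursor:]
    return new_text, cursor + len(char)
```

For example, with the cursor right after `12` in `12 Mix`, clicking the double prime gives `12″ Mix` and leaves the cursor just after the inserted character.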

But we couldn’t really easily automatically decide for all texts:

I LOVE YOU ("I am 12" 12" version) → I LOVE YOU (“I am 12” 12″ version)

(open quotes, close quotes, double prime)

Not that easy, and prone to mistakes (only in rare cases, but it would still be a pity to auto‐add mistakes). :slight_smile:
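
As an illustration, here is what the most obvious automatic rule does with that title when it is typed with ASCII quotes only. This is a deliberately naive Python sketch (the function name is mine), not a proposal:

```python
def naive_curl(text: str) -> str:
    """Turn ASCII double quotes into alternating opening/closing curly quotes."""
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
            opening = not opening  # assume quotes always come in pairs
        else:
            out.append(ch)
    return "".join(out)


title = 'I LOVE YOU ("I am 12" 12" version)'
result = naive_curl(title)
# result is: I LOVE YOU (“I am 12” 12“ version)
```

The first two quotes come out right, but the third one, which should be a double prime, is treated as the start of a new quotation.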

Adding a very specific 7″ and 12″ substitution to guess case might be fine. Even if it might insert a few errors like @jesus2099 suggested, that’s less likely than English guess case messing with prepositions vs. adverbs and whatnot and we still have that :slight_smile:
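
A narrow substitution like that could be sketched as follows, in Python rather than the site's own code. The pattern and function name are my guesses at what "very specific" would mean, not the actual guess case implementation:

```python
import re

# A 7 or 12 standing alone as a number, directly followed by an ASCII
# double quote. Hypothetical rule, not taken from guess case.
INCH_PATTERN = re.compile(r'\b(7|12)"')


def fix_inches(title: str) -> str:
    """Replace the quote after a standalone 7 or 12 with a double prime."""
    return INCH_PATTERN.sub(lambda m: m.group(1) + "\u2033", title)
```

`fix_inches('Song (12" version)')` gives `Song (12″ version)`, and `112"` is left alone. The rare error is still there, though: in `("I am 12" 12" version)` both quotes after `12` become double primes, including the one that closes the quotation.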

It’s pie-in-the-sky stuff, but I’m envisioning something like this:

When entering one of the usual „wrong“ ASCII characters, it is marked with a pastel red background, and a speech bubble opens above it giving the common Unicode alternatives. Clicking one of these will replace the offending character with it. The same can be achieved with a suitable keyboard shortcut (say, Alt-2 to select the second Unicode alternative).

If a replacement is done, the speech bubble and colouration are removed. If the user instead keeps on entering text, ignoring the suggestion, the bubble is also removed, but the red background under the problem characters is kept. One can later pop up the speech bubble again by mousing over these characters.
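
Leaving the display side out, the logic behind this could look roughly like the Python sketch below. The `ALTERNATIVES` table, its ordering and both function names are assumptions made for illustration only.

```python
# Hypothetical table of "wrong" ASCII characters and the Unicode
# alternatives the speech bubble would offer, in display order.
ALTERNATIVES = {
    '"': ["\u201c", "\u201d", "\u2033"],           # left, right, double prime
    "'": ["\u2019", "\u2018"],                     # apostrophe/right, left
    "-": ["\u2010", "\u2013", "\u2014", "\u2212"], # hyphen, en, em, minus
}


def find_suspects(text: str) -> list[tuple[int, str, list[str]]]:
    """Return (index, character, alternatives) for every flagged character."""
    return [(i, ch, ALTERNATIVES[ch]) for i, ch in enumerate(text) if ch in ALTERNATIVES]


def apply_choice(text: str, index: int, choice: int) -> str:
    """Replace the flagged character at index with alternative number
    `choice`, counted from 1 like the Alt-2 shortcut."""
    alts = ALTERNATIVES[text[index]]
    return text[:index] + alts[choice - 1] + text[index + 1:]
```

Here `find_suspects('12" Mix')` flags the character at index 2, and `apply_choice('12" Mix', 2, 3)` returns `12″ Mix`. Characters that the user never resolves simply stay in the list returned by `find_suspects`, which is what would keep them highlighted.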

Mock-up:

Perfect, can use the same kind of drop-down code that’s used when you type in an artist and you’re prompted to select which one, which also disappears once you’ve selected something else.

While it looks good as a mockup, it’s unfortunately not easily implemented because there is no way to style characters in an <input>. There are [some workarounds](https://stackoverflow.com/questions/22131214/how-to-highlight-text-inside-an-input-field), but the most promising way (contenteditable) has other disadvantages; e.g., you could no longer do copy and paste on the input field.

OK, but could we at least have an option in “guess case” to automatically replace the apostrophe and the hyphen-minus with the correct Unicode characters? It would then be the editors’ responsibility to manually change a hyphen to an en or em dash where needed.
This would save a lot of time, as these two represent the majority of changes, not to mention that it is impossible to see visually whether a hyphen-minus was changed to a “real” hyphen.
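
What I have in mind is no more than this, sketched in Python (the function name is made up, and the choice of U+2019 and U+2010 as targets is my reading of the guidelines):

```python
# Blind replacement table: ASCII apostrophe -> right single quotation
# mark (U+2019), hyphen-minus -> hyphen (U+2010). Assumed targets.
BLIND = str.maketrans({"'": "\u2019", "-": "\u2010"})


def force_punctuation(title: str) -> str:
    """Replace every apostrophe and hyphen-minus without looking at context."""
    return title.translate(BLIND)
```

`force_punctuation("Don't Stop (Re-Edit)")` gives `Don’t Stop (Re‐Edit)`. It is blind on purpose: a title wrapped in single quotes would get a right curly quote at the start as well, and a hyphen-minus that should be a dash still has to be fixed by hand.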

Thanks

The technical reason I have seen given before is that you can’t spot the difference between ‘quotes’ and apostrophes. So instead there is just an army of Correction Hamsters running around fixing things.

Hm, but you can detect them, can’t you?
So could we have an option to force the change of character, whether it was a quote or an apostrophe?

I like that example. Basically automation would be rubbish, and potentially lead to more errors. Leaving it to the Unicode Hamsters seems sensible to me. Go beyond an apostrophe and I am lost. I didn’t even know that close quotes and 12" are different characters! (And don’t get me started on dashes :crazy_face:)

Based on my edits from recent months, it’s basically 95% changing apostrophes and hyphen-minuses.
It ends up taking about 5 minutes per release: CTRL + F for the hyphen-minus, copy and paste, then the same for the apostrophe. After that, track names are reviewed as usual for capitalization and other characters (en dash, …).
Having a button would reduce that time and, on top of that, let brain and eyes focus on more complex topics.

If you get lost, you could rely on User:Jacobbrett/English Punctuation Guide - MusicBrainz Wiki
Personally, I copied it into a text file that I keep open while editing, so I just need to copy and paste the required character when needed. You can also add all the accents and other generic comment sentences, e.g.:
É
é
È
è
Ê
ê

œ

part of “xxxx” DJ‐mix

Regards
