Unicode apostrophe standardization

It is trivial to turn typographically correct characters into their ASCII equivalents, but the other way around is in some cases impossible (the software would have to choose between “ and ” for example). That is why we prefer to store the former. You could take a look at the plugin Non-ASCII Equivalents (which is similar to the built-in option) and try to remove bits of code you don’t want (like Japanese to Latin). You can download that plugin here: https://picard.musicbrainz.org/plugins/
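
To make the "one way is trivial, the other is impossible" point concrete, here is a minimal Python sketch. The `TO_ASCII` table and both function names are hypothetical and hand-picked for illustration; this is not the table Picard or any plugin actually uses.

```python
# Hypothetical, hand-picked mapping; not the table Picard or any plugin uses.
TO_ASCII = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also used as apostrophe)
    "\u2010": "-",  # HYPHEN
}


def to_ascii(text):
    """Replace each mapped character with its ASCII equivalent."""
    return text.translate(str.maketrans(TO_ASCII))


def candidates(ascii_char):
    """List every typographic character that collapses to ascii_char."""
    return sorted(k for k, v in TO_ASCII.items() if v == ascii_char)


print(to_ascii("\u201cDon\u2019t\u201d"))
print(candidates('"'))
```

Going to ASCII is a plain lookup. Going back, `candidates('"')` returns two possibilities, and nothing in the ASCII text says which one was meant.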

@mfmeulenbelt I am trying to make sense of the Picard plugins and options, but there is so little written down about them.

I have seen the “Convert Unicode punctuation characters to ASCII” setting in the options, which seems the best fit, but there is no documentation on what it swaps.

I see the Non-ASCII Equivalents plugin, but again I find no details on what it swaps. (And I don’t really want to start editing source code unless I really, really have to.)

I am pretty sure it is just that first one I need to get right. I want to see my multilingual characters as I have Turkish and Japanese artists in my collection. My modern software in the modern media centre happily displays those okay.

It is just when getting to the level of filenames it gets awkward for me.

I have already realised that part of my problem is also coming from EAC when ripping. Because it now points at MusicBrainz instead of freedb, I am getting these oddities in my filenames, which is a bigger headache when mixed in with previously ripped folders and files.

TBH - the main scream from me has been because I have only spotted this now, after re-ripping 350+ disks in the past few months. It now means I need to go back through everything and normalise things for my system. HAHAHA - this really is a never ending mission. This was the fourth time ripping my music collection… at least cleaning the tags with Picard should be quicker this time.

In the New Year I’ll make sure I have a clearer picture of this mess that has now happened in my files. And then write it up in some way for other people who will also come across it.

I would suggest you try a few different options with just one troublesome release and see if it comes out right. The source code for that plugin isn’t very complex. I think it will speak for itself if you open the .py file in notepad (you won’t need any programming knowledge).

I’ll take a calm clean look at this all again in the new year. I know I have specific requirements that don’t fit other people’s needs. I only need the punctuation cleaned up - mainly because I can’t SEE the difference between one hyphen dash and another, but my computer file system can see it. Leading to confuddlement.

I will look at the source code deeper - not a problem with the comprehension of it as I have written Python addons for elsewhere. (C \ C++ background). The initial look into that plugin shows it is far too wide for what I want. It is removing far too many characters for me. I still want to see Ayşedeniz Gökçin and Björk but I also want to know that Easy Star All-Stars and Ed Alleyne-Johnson are not getting confused with Easy Star All‐Stars and Ed Alleyne‐Johnson!
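
A narrower swap along those lines could look like the sketch below. It is only an illustration: the `PUNCTUATION_ONLY` table and `clean_punctuation` are made-up names, and the list of characters is an assumption about which look-alikes matter, not what the plugin does.

```python
# Illustrative table only; extend it with whatever look-alikes bother you.
PUNCTUATION_ONLY = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})


def clean_punctuation(name):
    """Swap look-alike punctuation for ASCII; leave all letters untouched."""
    return name.translate(PUNCTUATION_ONLY)


for name in ["Ayşedeniz Gökçin", "Björk", "Easy Star All\u2010Stars"]:
    print(clean_punctuation(name))
```

Because only the listed code points are translated, accented and non-Latin letters pass through unchanged.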

I think my bigger issue is going to be with EAC ripping using MusicBrainz metadata as I need to manually add a list of substitutions into that program now to catch these dashes and oddities that a small number of people are entering into the database, even though the official line is clearly that they are optional.

They are optional, but correct punctuation etc is preferred. That small number of people is improving the database.
That it makes tagging life difficult for you is unfortunate, but not a good reason to water down how we store data (MB aims to be a database first, a resource for Picard to use/for you to tag your files second).

Although I’m not sure what the exact issues are? I haven’t really heard of KODI or playlists having trouble reading or displaying MB tracks.

I can see this would be confuddling. Unfortunately the two hyphens used on MusicBrainz (Unicode HYPHEN and HYPHEN-MINUS) are supposed to look identical :man_facepalming:
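
Since the eye can't tell them apart, one option is to let Python name them. This is a small sketch using the standard `unicodedata` module; the helper name `describe` is made up for the example.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


print(describe("All-Stars"))       # ASCII HYPHEN-MINUS only: nothing reported
print(describe("All\u2010Stars"))  # the Unicode HYPHEN is reported
```

Pasting a suspicious title or folder name into `describe` lists exactly which characters are not plain ASCII.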

Seconded.
Without any further explanation (or any knowledge about ASCII or unicode) this sentence:

Use of basic ASCII punctuation characters such as ' and " is allowed, but typographically-correct punctuation is preferred.

sounds to me like 'em is allowed, but them is preferred.

I for one always use the apostrophe that I can use without hurting my fingers.
To write ' I only need to hit one key once, to write ´ I have to hit Alt Gr + ' twice.
At least I didn’t find any other option. This is my keyboard layout:

I’m using German Switzerland, because that’s where my computer is from and I’d like the signs on the keys to actually represent the output, but if e.g. I switched to English layout:

the correct apostrophe seems to be gone completely.

Do you guys all use copy paste to write apostrophes or do you have different keyboard layouts?

I tried to find a guide to create a custom layout on ubuntu, but that’s way over my head.

That’s a nice policy, but doesn’t always work well.
E.g.: I recently made about 1000 edits in which I moved the “feat. XY” from the title to the artist credits, then used Guess case, Reuse previous recordings and Copy all … to associated recordings(*). I assumed that if a recording’s title changed from e.g. “Can´t touch this” to “Can’t touch this”, it was because Guess case had switched it to the correct punctuation. In fact the titles were simply different on the album and on the recording to begin with, and Guess case did nothing.
I probably changed a lot of recordings incorrectly lately until I finally got called out here.

(*) I changed obvious mistakes of the Guess case function back (like I'm from BK vs I'm from Bk) and unticked Copy all … to associated recordings if the track was e.g. called “Song (album version)” on the single and “Song” as a recording.

What I’d do on Ubuntu is:

  1. Set one of the keys on your keyboard as the ‘compose key’ from system settings (I use right-Ctrl but it’s up to you).
  2. Use the table here to find the key combinations you need (they’re pretty logical and you’ll soon learn the most common ones). Just hit the compose key followed by the two-or-three-key sequence from that table.

The English have little use for special characters. :slight_smile: I have a German T2 keyboard (although I have switched the Z and Y back to their proper places), so ’ is a simple Alt-Gr + 1.

I don’t often see correct punctuation being turned back into its ASCII equivalents, so it doesn’t go wrong that often. But do be careful with Copy all … to recordings; it can introduce errors. It would be nice if you could see whether recordings are shared among multiple tracks in the recordings tab of the release editor.

I also use EAC to rip. If you find a solution to the above, I would like to see your resolution!

Ok, now I’m completely confused. I tried to find a way to change the compose key, but that setting doesn’t seem to exist in Ubuntu 17.10. Anyways a websearch suggested that Shift+Alt Gr is the compose key by default and with that I found two additional apostrophes. So there are 5! ´‘’’` So which one is the correct one?

The official recommendation is to use right-single-quotation-mark for apostrophe and this is what most people do. (I think this was a mistake, but that’s another story.)

@Llama_lover EAC can do character swaps in the settings for the filenames. It already swaps out the obvious ones that upset file systems. So I have tagged a few more to that list.

From the EAC menu select EAC Options. Now look for the Character Replacements tab.

There are already swaps in here for slashes, colons and question marks, and a few empty boxes at the end. I have added my hyphen and apostrophe swaps in here to swap from these prettified ones to the standard ASCII ones.

I see the idea that is being attempted, but for my filenames I need to change them to avoid confusion.

I do still want to see every umlaut and Japanese character correctly. It is just the “hard to see by eye” items that I like to swap back in my filenames. (Happy to leave them in my tags)

FWIW, I have a normal French AZERTY keyboard (missing lots of French characters) but I have many changes made to it by my crappy AutoHotKey permanent script, with which this U+2019 apostrophe replaces the typewriter apostrophe, so it’s a single key stroke (I use SHIFT+ if I want the typewriter apostrophe).

But I am waiting for nice keyboards once the new French BÉPO standard applies (in 2018 or 2019, I don’t remember). I hope more manufacturers will provide one; they have that single keystroke for the apostrophe and all other useful characters.

OMFG
I followed your link, there is talk about two different apostrophes (U+02BC and U+2019).
U+2019 is the same as Shift + Alt Gr + B on my keyboard. U+02BC is not even one of the 5 different apostrophes I found so far on my keyboard. :roll_eyes:
I get that different regions with different languages developed different signs for similar things in the past, but why do we still have 6 (or more?) versions left today?
It’s not like anyone can tell the difference between ʼ and ’ without a magnifying glass anyway.

Anyways, Shift + Alt Gr + B “right-single-quotation-mark” = U+2019 = should be the “correct” one (for now) right?

Typography is complicated :slight_smile:

Yes. (But you can get the other one on Ubuntu by typing Shift-Ctrl-U 0 2 B C Return.)
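
For anyone curious what separates the look-alikes, a short Python sketch can turn the same hex codes into characters and ask Unicode how it classifies them. `from_hex` is a made-up helper name; `unicodedata` is in the standard library.

```python
import unicodedata


def from_hex(code):
    """Turn a hex code point such as '02BC' into the character itself."""
    return chr(int(code, 16))


for code in ("0027", "02BC", "2019"):
    ch = from_hex(code)
    print(code, unicodedata.name(ch), unicodedata.category(ch))
```

U+02BC is classed as a modifier letter (`Lm`), while U+2019 is punctuation (`Pf`), which is why software can treat them differently even though they look the same.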

Oh it’s that easy, huh? :rofl:

Fortunately you should almost never need to use the U+02BC.

“used as a tone marker in Bodo, Dogri, and Maithili; U+2019 is the preferred character for a punctuation apostrophe”

@IvanDobsky That does help me!! I’m messing with it now. Thank you.

Glad to help. This one first caught me out when I had two copies of the same album. One I had tagged years ago using EAC and freedb.org and then when I re-ripped it I tagged it with EAC and MusicBrainz. It caused quite a puzzle when I saw two folders side by side with the exact same name of “Easy Star All-Stars”. :smiley:
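
A quick way to hunt for such twin folders is to group names by what they look like after the look-alikes are swapped out. The sketch below is illustrative only: the `LOOKALIKES` table and the function name are made up, and the folder list is test data rather than a real directory listing.

```python
from collections import defaultdict

# Illustrative table; the folder names below are made-up test data.
LOOKALIKES = str.maketrans({"\u2010": "-", "\u2019": "'"})


def find_lookalike_groups(names):
    """Group names that become identical once look-alikes are normalised."""
    groups = defaultdict(list)
    for name in names:
        groups[name.translate(LOOKALIKES)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


folders = ["Easy Star All-Stars", "Easy Star All\u2010Stars", "Björk"]
print(find_lookalike_groups(folders))
```

Feeding it the output of `os.listdir()` on a music folder would report every pair of folders that differ only in those characters.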

Only problem with that EAC settings page is I don’t think there are really enough boxes for all possible substitutions.

Meanwhile - the question to the rest of the thread stays open. How do I get these characters on a Windows UK keyboard? Even if it is the ALT+[NumPad] combination. I find it puzzling that everyone is either using Linux or copying and pasting.
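
As a rough pointer on the ALT+[NumPad] idea: on Windows, Alt plus a zero-prefixed number on the numeric keypad normally enters the character at that position in the system's ANSI code page. Assuming that code page is Windows-1252 (an assumption; it depends on the regional settings), the numbers can be worked out in Python. `alt_code` is a made-up helper name.

```python
def alt_code(char):
    """Return the Alt+0nnn digits for char, or None if cp1252 lacks it."""
    try:
        return f"{char.encode('cp1252')[0]:04d}"
    except UnicodeEncodeError:
        return None


for ch in ("\u2019", "\u2018", "\u2013", "\u2010"):
    print(f"U+{ord(ch):04X}", alt_code(ch))
```

Under that assumption the right single quotation mark comes out as Alt+0146, while the Unicode HYPHEN (U+2010) has no slot in Windows-1252 and so has no code of this kind.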